{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Entrada e saída de dados\n",
    "\n",
    "Este notebook conterá nossas referências sobre entrada e saída de dados. O pandas pode ler uma variedade de tipos de arquivos usando seus métodos pd.read_. Vejamos os tipos de dados mais comuns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33mDEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621\u001b[0m\u001b[33m\n",
      "\u001b[0mRequirement already satisfied: numpy in /usr/local/lib/python3.8/site-packages (1.24.2)\n",
      "\u001b[33mDEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621\u001b[0m\u001b[33m\n",
      "\u001b[0m"
     ]
    }
   ],
   "source": [
    "# Instale um pacote pip no kernel atual do Jupyter \n",
    "import  sys \n",
    "!{ sys.executable } -m pip install numpy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## CSV\n",
    "\n",
    "### CSV Input"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12</td>\n",
       "      <td>13</td>\n",
       "      <td>14</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    a   b   c   d\n",
       "0   0   1   2   3\n",
       "1   4   5   6   7\n",
       "2   8   9  10  11\n",
       "3  12  13  14  15"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('exemplo.csv')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Saída de dados tipo CSV "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_csv('exemplo_csv_saida.csv',index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Excel\n",
    "\n",
    "Pandas podem ler e escrever arquivos do Excel, tenha em mente, isso só importa dados. Não fórmulas nem imagens, lembrando que imagens ou macros podem bugar o método."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Entrada via Excel"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>index</th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>12</td>\n",
       "      <td>13</td>\n",
       "      <td>14</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   index   a   b   c   d\n",
       "0      0   0   1   2   3\n",
       "1      1   4   5   6   7\n",
       "2      2   8   9  10  11\n",
       "3      3  12  13  14  15"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_excel('Exemplo_Excel.xlsx',sheet_name='Sheet1')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Saída via Excel"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_excel('Exemplo_Excel_Saida.xlsx', sheet_name='Sheet1')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## HTML\n",
    "\n",
    "Você pode precisar instalar htmllib5, lxml e BeautifulSoup4. No seu terminal / prompt de comando, execute:\n",
    "\n",
    "    conda install lxml\n",
    "    conda install html5lib\n",
    "    conda install BeautifulSoup4\n",
    "    \n",
    "Em seguida, reinicie o Jupyter Notebook. \n",
    "(Ou use instalação de pip se não estiver usando a Distribuição de Anaconda)\n",
    "Pandas podem ler guias de tabelas fora de html. Por exemplo:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Entrada HTML\n",
    "\n",
    "A função Pandas read_html irá ler tabelas fora de uma página da Web e retornar uma lista de objetos DataFrame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "#df = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from unicodedata import normalize\n",
    "#table_MN = pd.read_html('https://en.wikipedia.org/wiki/Minnesota', match=\"Country of origin of first and second-generation immigrants (2023)\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "tabs = pd.read_html(\"https://www.runnersworld.com/races-places/a20823734/these-are-the-worlds-fastest-marathoners-and-marathon-courses/\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "8"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(tabs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Runner</td>\n",
       "      <td>Finish Time</td>\n",
       "      <td>Pace/Mile</td>\n",
       "      <td>Marathon</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Kelvin Kiptum (Kenya)</td>\n",
       "      <td>2:00:35</td>\n",
       "      <td>4:36.0</td>\n",
       "      <td>Chicago, 2023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Eliud Kipchoge (Kenya)</td>\n",
       "      <td>2:01:09</td>\n",
       "      <td>4:37.2</td>\n",
       "      <td>Berlin, 2022</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Kenenisa Bekele (Ethiopia)</td>\n",
       "      <td>2:01:41</td>\n",
       "      <td>4:38.5</td>\n",
       "      <td>Berlin, 2019</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Sisay Lemma (Ethiopia)</td>\n",
       "      <td>2:01:48</td>\n",
       "      <td>4:38.7</td>\n",
       "      <td>Valencia, 2023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Birhanu Legese (Ethiopia)</td>\n",
       "      <td>2:02:48</td>\n",
       "      <td>4:41.0</td>\n",
       "      <td>Berlin, 2019</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Mosinet Geremew (Ethiopia)</td>\n",
       "      <td>2:02:55</td>\n",
       "      <td>4:41.3</td>\n",
       "      <td>London, 2019</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Dennis Kimetto (Kenya)</td>\n",
       "      <td>2:02:57</td>\n",
       "      <td>4:41.4</td>\n",
       "      <td>Berlin, 2014</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Evans Chebet (Kenya)</td>\n",
       "      <td>2:03:00</td>\n",
       "      <td>4:41.5</td>\n",
       "      <td>Valencia, 2020</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Gabriel Geay (Tanzania)</td>\n",
       "      <td>2:03:00</td>\n",
       "      <td>4:41.5</td>\n",
       "      <td>Valencia, 2022</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Lawrence Cherono (Kenya)</td>\n",
       "      <td>2:03:04</td>\n",
       "      <td>4:41.6</td>\n",
       "      <td>Valencia, 2022</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                             0            1          2               3\n",
       "0                       Runner  Finish Time  Pace/Mile        Marathon\n",
       "1        Kelvin Kiptum (Kenya)      2:00:35     4:36.0   Chicago, 2023\n",
       "2       Eliud Kipchoge (Kenya)      2:01:09     4:37.2    Berlin, 2022\n",
       "3   Kenenisa Bekele (Ethiopia)      2:01:41     4:38.5    Berlin, 2019\n",
       "4       Sisay Lemma (Ethiopia)      2:01:48     4:38.7  Valencia, 2023\n",
       "5    Birhanu Legese (Ethiopia)      2:02:48     4:41.0    Berlin, 2019\n",
       "6   Mosinet Geremew (Ethiopia)      2:02:55     4:41.3    London, 2019\n",
       "7       Dennis Kimetto (Kenya)      2:02:57     4:41.4    Berlin, 2014\n",
       "8         Evans Chebet (Kenya)      2:03:00     4:41.5  Valencia, 2020\n",
       "9      Gabriel Geay (Tanzania)      2:03:00     4:41.5  Valencia, 2022\n",
       "10    Lawrence Cherono (Kenya)      2:03:04     4:41.6  Valencia, 2022"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tabs[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "____"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}