lepangdan/gist:26d0e622f0e0b9b8b822428520a340f3

## gistfile1.txt
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "sns.set(style='darkgrid')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv('/home/pangdan/Desktop/fortune500.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Show the first and last few rows of the DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Year</th>\n",
       "      <th>Rank</th>\n",
       "      <th>Company</th>\n",
       "      <th>Revenue (in millions)</th>\n",
       "      <th>Profit (in millions)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1955</td>\n",
       "      <td>1</td>\n",
       "      <td>General Motors</td>\n",
       "      <td>9823.5</td>\n",
       "      <td>806</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1955</td>\n",
       "      <td>2</td>\n",
       "      <td>Exxon Mobil</td>\n",
       "      <td>5661.4</td>\n",
       "      <td>584.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1955</td>\n",
       "      <td>3</td>\n",
       "      <td>U.S. Steel</td>\n",
       "      <td>3250.4</td>\n",
       "      <td>195.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1955</td>\n",
       "      <td>4</td>\n",
       "      <td>General Electric</td>\n",
       "      <td>2959.1</td>\n",
       "      <td>212.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1955</td>\n",
       "      <td>5</td>\n",
       "      <td>Esmark</td>\n",
       "      <td>2510.8</td>\n",
       "      <td>19.1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Year  Rank           Company  Revenue (in millions) Profit (in millions)\n",
       "0  1955     1    General Motors                 9823.5                  806\n",
       "1  1955     2       Exxon Mobil                 5661.4                584.8\n",
       "2  1955     3        U.S. Steel                 3250.4                195.4\n",
       "3  1955     4  General Electric                 2959.1                212.6\n",
       "4  1955     5            Esmark                 2510.8                 19.1"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Year</th>\n",
       "      <th>Rank</th>\n",
       "      <th>Company</th>\n",
       "      <th>Revenue (in millions)</th>\n",
       "      <th>Profit (in millions)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>25495</th>\n",
       "      <td>2005</td>\n",
       "      <td>496</td>\n",
       "      <td>Wm. Wrigley Jr.</td>\n",
       "      <td>3648.6</td>\n",
       "      <td>493</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25496</th>\n",
       "      <td>2005</td>\n",
       "      <td>497</td>\n",
       "      <td>Peabody Energy</td>\n",
       "      <td>3631.6</td>\n",
       "      <td>175.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25497</th>\n",
       "      <td>2005</td>\n",
       "      <td>498</td>\n",
       "      <td>Wendy's International</td>\n",
       "      <td>3630.4</td>\n",
       "      <td>57.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25498</th>\n",
       "      <td>2005</td>\n",
       "      <td>499</td>\n",
       "      <td>Kindred Healthcare</td>\n",
       "      <td>3616.6</td>\n",
       "      <td>70.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25499</th>\n",
       "      <td>2005</td>\n",
       "      <td>500</td>\n",
       "      <td>Cincinnati Financial</td>\n",
       "      <td>3614.0</td>\n",
       "      <td>584</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Year  Rank                Company  Revenue (in millions)  \\\n",
       "25495  2005   496        Wm. Wrigley Jr.                 3648.6   \n",
       "25496  2005   497         Peabody Energy                 3631.6   \n",
       "25497  2005   498  Wendy's International                 3630.4   \n",
       "25498  2005   499     Kindred Healthcare                 3616.6   \n",
       "25499  2005   500   Cincinnati Financial                 3614.0   \n",
       "\n",
       "      Profit (in millions)  \n",
       "25495                  493  \n",
       "25496                175.4  \n",
       "25497                 57.8  \n",
       "25498                 70.6  \n",
       "25499                  584  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Rename the columns so we can refer to them more easily later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.columns = ['year', 'rank', 'company', 'revenue', 'profit']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Explore our data set. Is it complete? Did pandas read it as expected? Are any values missing?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "25500"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check whether our data set has been imported as we would expect. A simple check is to see whether the data types (or dtypes) have been correctly interpreted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "year         int64\n",
       "rank         int64\n",
       "company     object\n",
       "revenue    float64\n",
       "profit      object\n",
       "dtype: object"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Uh-oh. It looks like there's something wrong with the profit column; we would expect it to be a float64 like the revenue column. The object dtype indicates that it probably contains some non-numeric values, so let's take a look."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>rank</th>\n",
       "      <th>company</th>\n",
       "      <th>revenue</th>\n",
       "      <th>profit</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>228</th>\n",
       "      <td>1955</td>\n",
       "      <td>229</td>\n",
       "      <td>Norton</td>\n",
       "      <td>135.0</td>\n",
       "      <td>N.A.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>290</th>\n",
       "      <td>1955</td>\n",
       "      <td>291</td>\n",
       "      <td>Schlitz Brewing</td>\n",
       "      <td>100.0</td>\n",
       "      <td>N.A.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>294</th>\n",
       "      <td>1955</td>\n",
       "      <td>295</td>\n",
       "      <td>Pacific Vegetable Oil</td>\n",
       "      <td>97.9</td>\n",
       "      <td>N.A.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>296</th>\n",
       "      <td>1955</td>\n",
       "      <td>297</td>\n",
       "      <td>Liebmann Breweries</td>\n",
       "      <td>96.0</td>\n",
       "      <td>N.A.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>352</th>\n",
       "      <td>1955</td>\n",
       "      <td>353</td>\n",
       "      <td>Minneapolis-Moline</td>\n",
       "      <td>77.4</td>\n",
       "      <td>N.A.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     year  rank                company  revenue profit\n",
       "228  1955   229                 Norton    135.0   N.A.\n",
       "290  1955   291        Schlitz Brewing    100.0   N.A.\n",
       "294  1955   295  Pacific Vegetable Oil     97.9   N.A.\n",
       "296  1955   297     Liebmann Breweries     96.0   N.A.\n",
       "352  1955   353     Minneapolis-Moline     77.4   N.A."
      ]
     },
     "execution_count": 112,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "non_numberic_profits = df.profit.str.contains('[^0-9.-]')\n",
    "df.loc[non_numberic_profits].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That makes it easy to interpret, but what should we do? Well, that depends on how many values are missing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "369"
      ]
     },
     "execution_count": 115,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df.profit[non_numberic_profits])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAD7CAYAAABzGc+QAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAADzBJREFUeJzt3X+M5PVdx/Hn9RZNgW1cwh6cSHI1oe8UkUJBSoJRhBTRggcxIJTQUwiggaQ1F82FmNCIxvuDQ0lsWrFc70gRIQLhh9iTHGlJrRKFUDiFd2zIhZ5cuEVWOb2Y9mD9Y75blmNnd2Z2dmbfO89HcrnZ78x85/Oe7+xrPvv5fr7f75qZmRkkSfV8aNgNkCT1xgCXpKIMcEkqygCXpKIMcEkqygCXpKLGFntARJwM3AucCLwL3J2Zd0XEF4EbgKnmobdm5pPL1VBJ0vutWWweeESsB9Zn5vMRMQ48B1wGXAn8T2besfzNlCQdadEeeGbuB/Y3tw9GxMvASb282NTUwZJHDU1MHM309KFhN2NgRq1esOZRUbXmycnxNfMt72oMPCI2AGcCzzaLbomIFyNie0RMLK2JK9fY2NphN2GgRq1esOZRsdpqXnQIZVZEHAt8C/jjzHw4Ik4A3gRmgNtpDbNct9A6Dh9+Z2a1vYGSNADz9sA7CvCIOAp4AtiVmXfOc/8G4InMPG2h9VQdQpmcHGdq6uCwmzEwo1YvWPOoqFpzz0MoEbEGuAd4eW54Nzs3Z10O7FlqIyVJnVt0JyZwHnAt8FJEvNAsuxW4OiLOoDWEshe4aVlaKEmaVyezUL7N/OMvzvmWpCHySExJKsoAl6SiDHBJKsoAl6SiOpmFIqmw67Y+Pe/yx7dtHHBL1G/2wCWpKANckooywCWpKANckooywCWpKANckooywCWpKOeBSytUu/nb7WzfcsEytUQrlT1wSSrKAJekogxwSSrKAJekogxwSSrKAJekogxwSSrKAJekogxwSSrKAJekogxwSSrKAJekogxwSSrKAJekogxwSSrKAJekogxwSSrKK/JI6ki7KwR5JaDhsQcuSUUZ4JJUlAEuSUUtOgYeEScD9wInAu8Cd2fmXRFxHPAAsAHYC1yZmdPL11RJ0lyd9MAPA5sz8+PAucDNEXEqsAXYnZmnALubnyVJA7JogGfm/sx8vrl9EHgZOAnYCOxsHrYTuGy5GilJ+qCuxsAjYgNwJvAscEJm7odWyAPr+t46SVJbHc8Dj4hjgYeAL2Tm2xHR9YtNTBzN2Njarp+3EkxOjg+7CQM1avVC/ZrbzdNeyHw1X7r50SWvYyWr1t6FdBTgEXEUrfC+LzMfbha/ERHrM3N/RKwHDiy2nunpQ723dIgmJ8eZmjo47GYMzKjVC6NZM9CXmiu9b1W3c7svnUWHUCJiDXAP8HJm3jnnrseATc3tTUB3X9uSpCXppAd+HnAt8FJEvNAsuxXYCjwYEdcDrwFXLE8TJUnzWTTAM/PbwJo2d1/Y3+ZIkjrlkZiSVJQBLklFGeCSVJQBLklFeUEHaUR1e8COVh574JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJU1NiwGyCptuu2Pt31c7ZvuWAZWjJ67IFLUlEGuCQVZYBLUlGLjoFHxHbgEuBAZp7WLPsicAMw1Tzs1sx8crkaKUn6oE52Yu4A/hy494jlf5qZd/S9RZKkjiw6hJKZzwBvDaAtkqQuLGUM/JaIeDEitkfERN9aJEnqSK/zwL8M3A7MNP9vA65b7EkTE0czNra2x5ccrsnJ8WE3YaBGrV4YzZqHZZjv9Wrazj0FeGa+MXs7Iv4SeKKT501PH+rl5YZucnKc
qamDw27GwIxavTCaNQ/TsN7rqtu53ZdOT0MoEbF+zo+XA3t6WY8kqXedTCO8HzgfOD4i9gG3AedHxBm0hlD2AjctYxslSfNYNMAz8+p5Ft+zDG2RJHXBIzElqSgDXJKKMsAlqSgDXJKKMsAlqSgDXJKKMsAlqSgDXJKKMsAlqSgDXJKKMsAlqSgDXJKKMsAlqaher8gjqU+u2/r0sJugouyBS1JRBrgkFWWAS1JRBrgkFWWAS1JRBrgkFWWAS1JRBrgkFWWAS1JRBrgkFWWAS1JRBrgkFWWAS1JRBrgkFWWAS1JRBrgkFeUFHaQB8cIN6jd74JJUlAEuSUUZ4JJUlAEuSUUtuhMzIrYDlwAHMvO0ZtlxwAPABmAvcGVmTi9fMyVJR+qkB74DuPiIZVuA3Zl5CrC7+VmSNECLBnhmPgO8dcTijcDO5vZO4LI+t0uStIhex8BPyMz9AM3/6/rXJElSJwZ6IM/ExNGMja0d5Ev2zeTk+LCbMFCjVi90X/Olmx+dd/nj2zb2ozmr2jA/X6vps91rgL8REeszc39ErAcOdPKk6elDPb7ccE1OjjM1dXDYzRiYUasX+lvzqL13vRjWe1T1s93uS6fXIZTHgE3N7U3A/F0RSdKy6WQa4f3A+cDxEbEPuA3YCjwYEdcDrwFXLGcjJUkftGiAZ+bVbe66sM9tkSR1wSMxJakoA1ySijLAJakoL+igVanbiyf0c+62F27QoNgDl6SiDHBJKsoAl6SiDHBJKsoAl6SiDHBJKsoAl6SiDHBJKsoDedQX7Q5e2b7lggG3pL88KEcrmT1wSSrKAJekogxwSSrKAJekogxwSSrKAJekogxwSSrKeeDqSrfzort9fLt548s9H/vSzY8u6/ql5WAPXJKKMsAlqSgDXJKKMsAlqSgDXJKKMsAlqSgDXJKKMsAlqagyB/Ks1gsG6P28gMJo6NcBXqPOHrgkFWWAS1JRBrgkFWWAS1JRS9qJGRF7gYPAO8DhzDy7D22SJHWgH7NQfikz3+zDeiRJXXAIRZKKWmqAzwB/HxHPRcSN/WiQJKkzSx1COS8zX4+IdcBTEfFKZj7T7sETE0czNrZ2iS/5fpOT431d37BfZ6UYtXq1svXz89iPdbW7gtPj2zYued3dWFKAZ+brzf8HIuIR4BygbYBPTx9aysvNa2rqYN/XeaTJyfGBvM5KMWr1auXr1+dxuT/by7Xudl86PQ+hRMQxETE+exu4CNjT6/okSd1ZSg/8BOCRiJhdz19l5jf60ipJ0qJ6DvDMfBX4RB/bIknqgtMIJakoA1ySijLAJamoMhd0kDS6+nVBl3bzt9utZ6VfYMQeuCQVZYBLUlEGuCQVZYBLUlEGuCQVZYBLUlEGuCQVZYBLUlGr9kCefk7AH/RJ2iV1pl+/5yv9gJ127IFLUlEGuCQVZYBLUlEGuCQVZYBLUlEGuCQVZYBLUlEGuCQVVf5AnmFOwO/2tbu9eogkLcQeuCQVZYBLUlEGuCQVZYBLUlEGuCQVZYBLUlEGuCQVVX4e+CBcuvnRobxuu3nmzieXVqaFjg1Zjt9be+CSVJQBLklFGeCSVJQBLklFLWknZkRcDNwFrAW+mplb+9IqSdKieu6BR8Ra4EvArwCnAldHxKn9apgkaWFLGUI5B/heZr6amT8A/hrY2J9mSZIWs5QAPwn4/pyf9zXLJEkDsJQx8DXzLJtZ6AmTk+PzPacjj28bvc79MGuenByfd/kobgdppVpKD3wfcPKcn38KeH1pzZEkdWopPfB/Bk6JiI8C/wFcBXy2L62SJC2q5x54Zh4GbgF2AS8DD2bmv/arYZKkha2ZmVlw2FqStEJ5JKYkFWWAS1JRI3k+8IjYDlwCHMjM05plnwC+AhwL7AWuycy3m/tOB/4C+AjwLvBzmfl/EXEWsAP4MPAk8PnMXJFjUt3UHBHXAL835+mnA5/MzBdWcc1HAV8FPknr9+LezPyT5jllThnRZc0/RutzfTatz/Xn
M/ObzXNKbOeIOBm4FziRVg13Z+ZdEXEc8ACwgVbNV2bmdESsobUtfxU4BPxmZj7frGsT8AfNqv8oM3cOspZejGoPfAdw8RHLvgpsycyfBR6hCbCIGAO+Dvx2Zv4McD7ww+Y5XwZuBE5p/h25zpVkBx3WnJn3ZeYZmXkGcC2wNzNfaJ6zKmsGrgB+vFl+FnBTRGwoeMqIHXRe8w0AzfJPA9siYjYTqmznw8DmzPw4cC5wc7N9tgC7M/MUYHfzM7S242xNN9KqkybwbwM+Reso89siYmKQhfRiJAM8M58B3jpicQDPNLefAn69uX0R8GJmfrd57n9m5jsRsR74SGb+Y9MzuRe4bPlb35sua57rauB+gFVe8wxwTPOF/WHgB8DbFDtlRJc1n0or3MjMA8B/AWdX2s6ZuX+2B52ZB2nNiDuJ1jaa7UHv5L32b6T119VMZv4T8BNNvb8MPJWZb2XmNK33aaV+af3ISAZ4G3uAX2tuX8F7Byl9DJiJiF0R8XxE/H6z/CRaBzPNqngqgXY1z/UbNAHO6q75b4D/BfYDrwF3ZOZbrI5TRrSr+bvAxogYa47nOKu5r+R2jogNwJnAs8AJmbkfWiEPrGse1m57ltzOBvh7rqP159dzwDitHhi0xkN/Hrim+f/yiLiQHk4lsAK1qxmAiPgUcCgz9zSLVnPN5wDvAD8JfBTYHBE/zequeTutoPoX4M+A79AakihXc0QcCzwEfGF231Ub7WorVzOM6E7M+WTmK7SGS4iIjwGfae7aB3wrM99s7nuS1o6ur9M6fcCscqcSWKDmWVfxXu8bWu/Faq35s8A3MvOHwIGI+AdaO/e+T/FTRrSruTkY73dnHxcR3wH+HZim0HZudkA/BNyXmQ83i9+IiPWZub8ZIjnQLG93CpB9tPZvzV3+zeVsdz/YA29ExLrm/w/R2hP9leauXcDpEXF0Mz76i8C/NX+WHYyIc5s9258DhnP5+h4tUPPssitojfkCP/pTdLXW/BpwQUSsiYhjaO0Qe4U5p4xoZm1cBTw2+Jb3rl3NzWf6mOb2p4HDmVnqs9207x7g5cy8c85djwGbmtubeK/9jwGfa7bzucB/N/XuAi6KiIlm5+VFzbIVbSR74BFxP61v2+MjYh+tvc/HRsTNzUMeBr4G0Ew9upPWL/IM8GRm/m3zuN/hvalWf9f8W5G6qbnxC8C+zHz1iFWt1pq/1NzeQ+vP6a9l5ovNemZPGbEW2L6STxnRZc3rgF0R8S6t8xldO2dVVbbzebTa/VJEzM6UuhXYCjwYEdfT+nK+ornvSVpTCL9HaxrhbwFk5lsRcTut33OAP2z2gaxoHkovSUU5hCJJRRngklSUAS5JRRngklSUAS5JRRngklSUAS5JRRngklTU/wO4VGl5i1pengAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "bin_sizes, _, _ = plt.hist(df.year[non_numberic_profits], bins=range(1955, 2006))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Keep only rows with a numeric profit, then convert the column to floats.\n",
    "df = df.loc[~non_numberic_profits].copy()\n",
    "df['profit'] = pd.to_numeric(df['profit'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We should check that worked."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "25131"
      ]
     },
     "execution_count": 118,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "year         int64\n",
       "rank         int64\n",
       "company     object\n",
       "revenue    float64\n",
       "profit     float64\n",
       "dtype: object"
      ]
     },
     "execution_count": 120,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
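  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an aside, the same cleanup can be sketched with `pd.to_numeric(..., errors='coerce')`, which turns unparseable entries into `NaN` rather than relying on a regular expression. This is an alternative to the approach used above, not what this notebook ran; the tiny `sample` frame below is made up for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Made-up miniature of the profit column, for illustration only.\n",
    "sample = pd.DataFrame({'profit': ['806', '584.8', 'N.A.', '-12.5']})\n",
    "\n",
    "# errors='coerce' converts anything that cannot be parsed into NaN instead of raising.\n",
    "parsed = pd.to_numeric(sample['profit'], errors='coerce')\n",
    "\n",
    "# Rows that failed to parse are the ones that are now NaN.\n",
    "bad_rows = sample[parsed.isna()]\n",
    "\n",
    "# Keep only the rows that parsed, with profit stored as floats.\n",
    "clean = sample.assign(profit=parsed).dropna(subset=['profit'])"
   ]
  },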
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Great! We have finished our data set setup.\n",
    "\n",
    "If you were going to present your notebook as a report, you could get rid of the investigatory cells we created, which are included here as a demonstration of the flow of working with notebooks, and merge relevant cells (see the Advanced Functionality section below for more on this) to create a single data set setup cell. This would mean that if we ever mess up our data set elsewhere, we can just rerun the setup cell to restore it."
   ]
  },
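  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A merged setup cell might look roughly like the sketch below. The function name `load_fortune500` and its `path` parameter are illustrative assumptions, not part of the original notebook; the body simply repeats the load, rename, filter, and convert steps from the cells above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "\n",
    "def load_fortune500(path):\n",
    "    # Hypothetical helper: the name and signature are assumptions for illustration.\n",
    "    data = pd.read_csv(path)\n",
    "    data.columns = ['year', 'rank', 'company', 'revenue', 'profit']\n",
    "    # Compare as strings so the check also works if pandas already parsed the column as numbers.\n",
    "    non_numeric = data['profit'].astype(str).str.contains('[^0-9.-]')\n",
    "    # Drop the non-numeric rows and convert what remains to floats.\n",
    "    data = data.loc[~non_numeric].copy()\n",
    "    data['profit'] = pd.to_numeric(data['profit'])\n",
    "    return data\n",
    "\n",
    "\n",
    "# Rerun this line whenever the data set needs to be restored, for example:\n",
    "# df = load_fortune500('/home/pangdan/Desktop/fortune500.csv')"
   ]
  },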
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Sharing your notebooks\n",
    "When people talk about sharing their notebooks, there are generally two paradigms they may be considering. Most often, individuals share the end result of their work, much like this article itself, which means sharing non-interactive, pre-rendered versions of their notebooks; however, it is also possible to collaborate on notebooks with the aid of version control systems such as Git.\n",
    "\n",
    "That said, there are some nascent companies popping up on the web offering the ability to run interactive Jupyter notebooks in the cloud.\n",
    "\n",
    "\n",
    "# Before you share\n",
    "A shared notebook will appear exactly in the state it was in when you exported or saved it, including the output of any code cells. Therefore, to ensure that your notebook is share-ready, so to speak, there are a few steps you should take before sharing:\n",
    "\n",
    "\n",
    "\n",
    "1. Click \"Cell \\> All Output \\> Clear\"\n",
    "\n",
    "\n",
    "2. Click \"Kernel \\> Restart & Run All\"\n",
    "\n",
    "\n",
    "3. Wait for your code cells to finish executing and check that they did so as expected\n",
    "\n",
    "\n",
    "\n",
    "This will ensure your notebooks don't contain intermediary output or have a stale state, and that they executed in order at the time of sharing."
   ]
  },
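  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The three steps above can also be scripted. The sketch below is one possible way to do it with `nbformat` and `nbconvert`, assuming both are installed; `prepare_for_sharing`, its `timeout` default, and the file name in the commented-out call are hypothetical and not part of the original notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import nbformat\n",
    "from nbconvert.preprocessors import ClearOutputPreprocessor, ExecutePreprocessor\n",
    "\n",
    "\n",
    "def prepare_for_sharing(path, timeout=600):\n",
    "    # Hypothetical helper: clear outputs, rerun everything in a fresh kernel, save in place.\n",
    "    with open(path, encoding='utf-8') as f:\n",
    "        nb = nbformat.read(f, as_version=4)\n",
    "\n",
    "    # Step 1: remove all stored outputs and execution counts.\n",
    "    nb, _ = ClearOutputPreprocessor().preprocess(nb, {})\n",
    "\n",
    "    # Steps 2 and 3: run every cell in order; an exception is raised if any cell fails.\n",
    "    executor = ExecutePreprocessor(timeout=timeout)\n",
    "    nb, _ = executor.preprocess(nb, {'metadata': {'path': '.'}})\n",
    "\n",
    "    with open(path, 'w', encoding='utf-8') as f:\n",
    "        nbformat.write(nb, f)\n",
    "\n",
    "\n",
    "# Example call (hypothetical file name):\n",
    "# prepare_for_sharing('my_notebook.ipynb')"
   ]
  },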
  {
   "cell_type": "raw",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
XNKbOeIOBm4FziRVg13Z+ZdEXEc8ACwgVbNV2bmdESsobUtfxU4BPxmZj7frGsT8AfNqv8oM3cOspZejGoPfAdw8RHLvgpsycyfBR6hCbCIGAO+Dvx2Zv4McD7ww+Y5XwZuBE5p/h25zpVkBx3WnJn3ZeYZmXkGcC2wNzNfaJ6zKmsGrgB+vFl+FnBTRGwoeMqIHXRe8w0AzfJPA9siYjYTqmznw8DmzPw4cC5wc7N9tgC7M/MUYHfzM7S242xNN9KqkybwbwM+Reso89siYmKQhfRiJAM8M58B3jpicQDPNLefAn69uX0R8GJmfrd57n9m5jsRsR74SGb+Y9MzuRe4bPlb35sua57rauB+gFVe8wxwTPOF/WHgB8DbFDtlRJc1n0or3MjMA8B/AWdX2s6ZuX+2B52ZB2nNiDuJ1jaa7UHv5L32b6T119VMZv4T8BNNvb8MPJWZb2XmNK33aaV+af3ISAZ4G3uAX2tuX8F7Byl9DJiJiF0R8XxE/H6z/CRaBzPNqngqgXY1z/UbNAHO6q75b4D/BfYDrwF3ZOZbrI5TRrSr+bvAxogYa47nOKu5r+R2jogNwJnAs8AJmbkfWiEPrGse1m57ltzOBvh7rqP159dzwDitHhi0xkN/Hrim+f/yiLiQHk4lsAK1qxmAiPgUcCgz9zSLVnPN5wDvAD8JfBTYHBE/zequeTutoPoX4M+A79AakihXc0QcCzwEfGF231Ub7WorVzOM6E7M+WTmK7SGS4iIjwGfae7aB3wrM99s7nuS1o6ur9M6fcCscqcSWKDmWVfxXu8bWu/Faq35s8A3MvOHwIGI+AdaO/e+T/FTRrSruTkY73dnHxcR3wH+HZim0HZudkA/BNyXmQ83i9+IiPWZub8ZIjnQLG93CpB9tPZvzV3+zeVsdz/YA29ExLrm/w/R2hP9leauXcDpEXF0Mz76i8C/NX+WHYyIc5s9258DhnP5+h4tUPPssitojfkCP/pTdLXW/BpwQUSsiYhjaO0Qe4U5p4xoZm1cBTw2+Jb3rl3NzWf6mOb2p4HDmVnqs9207x7g5cy8c85djwGbmtubeK/9jwGfa7bzucB/N/XuAi6KiIlm5+VFzbIVbSR74BFxP61v2+MjYh+tvc/HRsTNzUMeBr4G0Ew9upPWL/IM8GRm/m3zuN/hvalWf9f8W5G6qbnxC8C+zHz1iFWt1pq/1NzeQ+vP6a9l5ovNemZPGbEW2L6STxnRZc3rgF0R8S6t8xldO2dVVbbzebTa/VJEzM6UuhXYCjwYEdfT+nK+ornvSVpTCL9HaxrhbwFk5lsRcTut33OAP2z2gaxoHkovSUU5hCJJRRngklSUAS5JRRngklSUAS5JRRngklSUAS5JRRngklTU/wO4VGl5i1pengAAAABJRU5ErkJggg==\n",
	"text/plain": [
	"<Figure size 432x288 with 1 Axes>"
	]
	},
	"metadata": {},
	"output_type": "display_data"
	}
	],
	"source": [
	"bin_sizes, _, _ = plt.hist(df.year[non_numberic_profits], bins=range(1955, 2006))"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 117,
	"metadata": {},
	"outputs": [],
	"source": [
	"df = df.loc[~non_numberic_profits]\n",
	"df.profit = df.profit.apply(pd.to_numeric)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"We should check that this worked."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 118,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"25131"
	]
	},
	"execution_count": 118,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"len(df)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 120,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"year int64\n",
	"rank int64\n",
	"company object\n",
	"revenue float64\n",
	"profit float64\n",
	"dtype: object"
	]
	},
	"execution_count": 120,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"df.dtypes"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Great! We have finished our data set setup.\n",
	"\n",
	"If you were going to present your notebook as a report, you could get rid of the investigatory cells we created, which are included here as a demonstration of the flow of working with notebooks, and merge relevant cells (see the Advanced Functionality section below for more on this) to create a single data set setup cell. This would mean that if we ever mess up our data set elsewhere, we can just rerun the setup cell to restore it."
	]
	},
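	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"As a rough sketch of what such a merged setup cell might look like, the snippet below gathers the loading and cleaning steps into one place. The file path is a placeholder, the column names are assumed from the `df.dtypes` output above, and `errors='coerce'` followed by `dropna` is swapped in for the boolean mask used earlier, so treat it as an illustration rather than the exact code from this notebook.\n",
	"\n",
	"```python\n",
	"import pandas as pd\n",
	"\n",
	"# Placeholder path (an assumption): point this at your own copy of the CSV.\n",
	"df = pd.read_csv('fortune500.csv')\n",
	"\n",
	"# Short column names, assumed to match the ones used in the cells above.\n",
	"df.columns = ['year', 'rank', 'company', 'revenue', 'profit']\n",
	"\n",
	"# Turn non-numeric profit entries into NaN, then drop those rows.\n",
	"df['profit'] = pd.to_numeric(df['profit'], errors='coerce')\n",
	"df = df.dropna(subset=['profit'])\n",
	"```\n",
	"\n",
	"It is shown as text rather than as a code cell so that running the whole notebook does not overwrite `df` from a placeholder path."
	]
	},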
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# Sharing your notebooks\n",
	"When people talk of sharing their notebooks, there are generally two paradigms they may be considering. Most often, individuals share the end result of their work, much like this article itself, which means sharing non-interactive, pre-rendered versions of their notebooks; however, it is also possible to collaborate on notebooks with the aid of version control systems such as Git.\n",
	"\n",
	"That said, there are some nascent companies popping up on the web offering the ability to run interactive Jupyter notebooks in the cloud.\n",
	"\n",
	"\n",
	"# Before you share\n",
	"A shared notebook will appear exactly in the state it was in when you exported or saved it, including the output of any code cells. Therefore, to ensure that your notebook is share-ready, so to speak, there are a few steps you should take before sharing:\n",
	"\n",
	"\n",
	"\n",
	"1. Click \"Cell \\> All Output \\> Clear\"\n",
	"\n",
	"\n",
	"2. Click \"Kernel \\> Restart & Run All\"\n",
	"\n",
	"\n",
	"3. Wait for your code cells to finish executing and check they did so as expected\n",
	"\n",
	"\n",
	"\n",
	"This will ensure your notebooks don't contain intermediary output, don't have a stale state, and are executed in order at the time of sharing."
	]
	},
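	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"The first step above can also be scripted. The following is a minimal sketch, using only Python's standard library, that empties the outputs of every code cell in a notebook file. The function name `clear_outputs` and the path you pass to it are illustrative assumptions, and it does not replace restarting the kernel and re-running the cells.\n",
	"\n",
	"```python\n",
	"import json\n",
	"\n",
	"\n",
	"def clear_outputs(path):\n",
	"    # Load the notebook, which is stored on disk as a JSON document.\n",
	"    with open(path, encoding='utf-8') as f:\n",
	"        notebook = json.load(f)\n",
	"\n",
	"    # Reset the outputs and execution count of every code cell.\n",
	"    for cell in notebook.get('cells', []):\n",
	"        if cell.get('cell_type') == 'code':\n",
	"            cell['outputs'] = []\n",
	"            cell['execution_count'] = None\n",
	"\n",
	"    # Write the cleaned notebook back to the same file.\n",
	"    with open(path, 'w', encoding='utf-8') as f:\n",
	"        json.dump(notebook, f, indent=1)\n",
	"```\n",
	"\n",
	"Run it on a saved copy that is not open in Jupyter, for example `clear_outputs('my_notebook.ipynb')`, so that unsaved changes in the browser do not overwrite the result."
	]
	},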
	{
	"cell_type": "raw",
	"metadata": {},
	"source": []
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python [default]",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.6.5"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}