@lepangdan
Created July 14, 2018 07:11
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"sns.set(style='darkgrid')"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('/home/pangdan/Desktop/fortune500.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Show the first and last few rows of the DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Year</th>\n",
" <th>Rank</th>\n",
" <th>Company</th>\n",
" <th>Revenue (in millions)</th>\n",
" <th>Profit (in millions)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1955</td>\n",
" <td>1</td>\n",
" <td>General Motors</td>\n",
" <td>9823.5</td>\n",
" <td>806</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1955</td>\n",
" <td>2</td>\n",
" <td>Exxon Mobil</td>\n",
" <td>5661.4</td>\n",
" <td>584.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1955</td>\n",
" <td>3</td>\n",
" <td>U.S. Steel</td>\n",
" <td>3250.4</td>\n",
" <td>195.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1955</td>\n",
" <td>4</td>\n",
" <td>General Electric</td>\n",
" <td>2959.1</td>\n",
" <td>212.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1955</td>\n",
" <td>5</td>\n",
" <td>Esmark</td>\n",
" <td>2510.8</td>\n",
" <td>19.1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Year Rank Company Revenue (in millions) Profit (in millions)\n",
"0 1955 1 General Motors 9823.5 806\n",
"1 1955 2 Exxon Mobil 5661.4 584.8\n",
"2 1955 3 U.S. Steel 3250.4 195.4\n",
"3 1955 4 General Electric 2959.1 212.6\n",
"4 1955 5 Esmark 2510.8 19.1"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Year</th>\n",
" <th>Rank</th>\n",
" <th>Company</th>\n",
" <th>Revenue (in millions)</th>\n",
" <th>Profit (in millions)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>25495</th>\n",
" <td>2005</td>\n",
" <td>496</td>\n",
" <td>Wm. Wrigley Jr.</td>\n",
" <td>3648.6</td>\n",
" <td>493</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25496</th>\n",
" <td>2005</td>\n",
" <td>497</td>\n",
" <td>Peabody Energy</td>\n",
" <td>3631.6</td>\n",
" <td>175.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25497</th>\n",
" <td>2005</td>\n",
" <td>498</td>\n",
" <td>Wendy's International</td>\n",
" <td>3630.4</td>\n",
" <td>57.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25498</th>\n",
" <td>2005</td>\n",
" <td>499</td>\n",
" <td>Kindred Healthcare</td>\n",
" <td>3616.6</td>\n",
" <td>70.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25499</th>\n",
" <td>2005</td>\n",
" <td>500</td>\n",
" <td>Cincinnati Financial</td>\n",
" <td>3614.0</td>\n",
" <td>584</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Year Rank Company Revenue (in millions) \\\n",
"25495 2005 496 Wm. Wrigley Jr. 3648.6 \n",
"25496 2005 497 Peabody Energy 3631.6 \n",
"25497 2005 498 Wendy's International 3630.4 \n",
"25498 2005 499 Kindred Healthcare 3616.6 \n",
"25499 2005 500 Cincinnati Financial 3614.0 \n",
"\n",
" Profit (in millions) \n",
"25495 493 \n",
"25496 175.4 \n",
"25497 57.8 \n",
"25498 70.6 \n",
"25499 584 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.tail()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rename those columns so we can refer to them later."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"df.columns = ['year', 'rank', 'company', 'revenue', 'profit']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Explore our data set. Is it complete? Did pandas read it as expected? Are any values missing?"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"25500"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check whether our data set has been imported as we would expect. A simple check is to see if the data types (or dtypes) have been correctly interpreted."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"year int64\n",
"rank int64\n",
"company object\n",
"revenue float64\n",
"profit object\n",
"dtype: object"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Uh oh. It looks like there's something wrong with the profit column -- we would expect it to be a float64 like the revenue column. This indicates that it probably contains some non-numeric values, so let's take a look."
]
},
{
"cell_type": "code",
"execution_count": 112,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>rank</th>\n",
" <th>company</th>\n",
" <th>revenue</th>\n",
" <th>profit</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>228</th>\n",
" <td>1955</td>\n",
" <td>229</td>\n",
" <td>Norton</td>\n",
" <td>135.0</td>\n",
" <td>N.A.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>290</th>\n",
" <td>1955</td>\n",
" <td>291</td>\n",
" <td>Schlitz Brewing</td>\n",
" <td>100.0</td>\n",
" <td>N.A.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>294</th>\n",
" <td>1955</td>\n",
" <td>295</td>\n",
" <td>Pacific Vegetable Oil</td>\n",
" <td>97.9</td>\n",
" <td>N.A.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>296</th>\n",
" <td>1955</td>\n",
" <td>297</td>\n",
" <td>Liebmann Breweries</td>\n",
" <td>96.0</td>\n",
" <td>N.A.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>352</th>\n",
" <td>1955</td>\n",
" <td>353</td>\n",
" <td>Minneapolis-Moline</td>\n",
" <td>77.4</td>\n",
" <td>N.A.</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year rank company revenue profit\n",
"228 1955 229 Norton 135.0 N.A.\n",
"290 1955 291 Schlitz Brewing 100.0 N.A.\n",
"294 1955 295 Pacific Vegetable Oil 97.9 N.A.\n",
"296 1955 297 Liebmann Breweries 96.0 N.A.\n",
"352 1955 353 Minneapolis-Moline 77.4 N.A."
]
},
"execution_count": 112,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"non_numberic_profits = df.profit.str.contains('[^0-9.-]')\n",
"df.loc[non_numberic_profits].head()"
]
},
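{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, here is a minimal sketch of another way to flag non-numeric entries: `pd.to_numeric` with `errors='coerce'` turns anything it cannot parse into NaN, so the NaN positions mark the problem rows. The small `sample` frame is invented for illustration and is not the Fortune 500 data.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in for the profit column (values invented for illustration)\n",
"sample = pd.DataFrame({'profit': ['806', '584.8', 'N.A.', '-12.5']})\n",
"\n",
"# Values that cannot be parsed as numbers become NaN\n",
"parsed = pd.to_numeric(sample['profit'], errors='coerce')\n",
"\n",
"# Rows where parsing failed are the non-numeric ones\n",
"non_numeric = parsed.isna()\n",
"print(sample.loc[non_numeric])\n",
"```\n",
"\n",
"Unlike the character-class regex used in the cell before this one, this approach would also catch malformed strings built only from digits, dots, and dashes, such as `1.2.3`."
]
},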
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That makes it easy to interpret, but what should we do? Well, that depends on how many values are missing."
]
},
{
"cell_type": "code",
"execution_count": 115,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"369"
]
},
"execution_count": 115,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(df.profit[non_numberic_profits])"
]
},
{
"cell_type": "code",
"execution_count": 116,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAD7CAYAAABzGc+QAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAADzBJREFUeJzt3X+M5PVdx/Hn9RZNgW1cwh6cSHI1oe8UkUJBSoJRhBTRggcxIJTQUwiggaQ1F82FmNCIxvuDQ0lsWrFc70gRIQLhh9iTHGlJrRKFUDiFd2zIhZ5cuEVWOb2Y9mD9Y75blmNnd2Z2dmbfO89HcrnZ78x85/Oe7+xrPvv5fr7f75qZmRkkSfV8aNgNkCT1xgCXpKIMcEkqygCXpKIMcEkqygCXpKLGFntARJwM3AucCLwL3J2Zd0XEF4EbgKnmobdm5pPL1VBJ0vutWWweeESsB9Zn5vMRMQ48B1wGXAn8T2besfzNlCQdadEeeGbuB/Y3tw9GxMvASb282NTUwZJHDU1MHM309KFhN2NgRq1esOZRUbXmycnxNfMt72oMPCI2AGcCzzaLbomIFyNie0RMLK2JK9fY2NphN2GgRq1esOZRsdpqXnQIZVZEHAt8C/jjzHw4Ik4A3gRmgNtpDbNct9A6Dh9+Z2a1vYGSNADz9sA7CvCIOAp4AtiVmXfOc/8G4InMPG2h9VQdQpmcHGdq6uCwmzEwo1YvWPOoqFpzz0MoEbEGuAd4eW54Nzs3Z10O7FlqIyVJnVt0JyZwHnAt8FJEvNAsuxW4OiLOoDWEshe4aVlaKEmaVyezUL7N/OMvzvmWpCHySExJKsoAl6SiDHBJKsoAl6SiOpmFIqmw67Y+Pe/yx7dtHHBL1G/2wCWpKANckooywCWpKANckooywCWpKANckooywCWpKOeBSytUu/nb7WzfcsEytUQrlT1wSSrKAJekogxwSSrKAJekogxwSSrKAJekogxwSSrKAJekogxwSSrKAJekogxwSSrKAJekogxwSSrKAJekogxwSSrKAJekogxwSSrKK/JI6ki7KwR5JaDhsQcuSUUZ4JJUlAEuSUUtOgYeEScD9wInAu8Cd2fmXRFxHPAAsAHYC1yZmdPL11RJ0lyd9MAPA5sz8+PAucDNEXEqsAXYnZmnALubnyVJA7JogGfm/sx8vrl9EHgZOAnYCOxsHrYTuGy5GilJ+qCuxsAjYgNwJvAscEJm7odWyAPr+t46SVJbHc8Dj4hjgYeAL2Tm2xHR9YtNTBzN2Njarp+3EkxOjg+7CQM1avVC/ZrbzdNeyHw1X7r50SWvYyWr1t6FdBTgEXEUrfC+LzMfbha/ERHrM3N/RKwHDiy2nunpQ723dIgmJ8eZmjo47GYMzKjVC6NZM9CXmiu9b1W3c7svnUWHUCJiDXAP8HJm3jnnrseATc3tTUB3X9uSpCXppAd+HnAt8FJEvNAsuxXYCjwYEdcDrwFXLE8TJUnzWTTAM/PbwJo2d1/Y3+ZIkjrlkZiSVJQBLklFGeCSVJQBLklFeUEHaUR1e8COVh574JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJUlAEuSUUZ4JJU1NiwGyCptuu2Pt31c7ZvuWAZWjJ67IFLUlEGuCQVZYBLUlGLjoFHxHbgEuBAZp7WLPsicAMw1Tzs1sx8crkaKUn6oE52Yu4A/hy494jlf5qZd/S9RZKkjiw6hJKZzwBvDaAtkqQuLGUM/JaIeDEitkfERN9aJEnqSK/zwL8M3A7MNP9vA65b7EkTE0czNra2x5ccrsnJ8WE3YaBGrV4YzZqHZZjv9Wrazj0FeGa+MXs7Iv4SeKKT501PH+rl5YZucnKcqamDw2
7GwIxavTCaNQ/TsN7rqtu53ZdOT0MoEbF+zo+XA3t6WY8kqXedTCO8HzgfOD4i9gG3AedHxBm0hlD2AjctYxslSfNYNMAz8+p5Ft+zDG2RJHXBIzElqSgDXJKKMsAlqSgDXJKKMsAlqSgDXJKKMsAlqSgDXJKKMsAlqSgDXJKKMsAlqSgDXJKKMsAlqaher8gjqU+u2/r0sJugouyBS1JRBrgkFWWAS1JRBrgkFWWAS1JRBrgkFWWAS1JRBrgkFWWAS1JRBrgkFWWAS1JRBrgkFWWAS1JRBrgkFWWAS1JRBrgkFeUFHaQB8cIN6jd74JJUlAEuSUUZ4JJUlAEuSUUtuhMzIrYDlwAHMvO0ZtlxwAPABmAvcGVmTi9fMyVJR+qkB74DuPiIZVuA3Zl5CrC7+VmSNECLBnhmPgO8dcTijcDO5vZO4LI+t0uStIhex8BPyMz9AM3/6/rXJElSJwZ6IM/ExNGMja0d5Ev2zeTk+LCbMFCjVi90X/Olmx+dd/nj2zb2ozmr2jA/X6vps91rgL8REeszc39ErAcOdPKk6elDPb7ccE1OjjM1dXDYzRiYUasX+lvzqL13vRjWe1T1s93uS6fXIZTHgE3N7U3A/F0RSdKy6WQa4f3A+cDxEbEPuA3YCjwYEdcDrwFXLGcjJUkftGiAZ+bVbe66sM9tkSR1wSMxJakoA1ySijLAJakoL+igVanbiyf0c+62F27QoNgDl6SiDHBJKsoAl6SiDHBJKsoAl6SiDHBJKsoAl6SiDHBJKsoDedQX7Q5e2b7lggG3pL88KEcrmT1wSSrKAJekogxwSSrKAJekogxwSSrKAJekogxwSSrKeeDqSrfzort9fLt548s9H/vSzY8u6/ql5WAPXJKKMsAlqSgDXJKKMsAlqSgDXJKKMsAlqSgDXJKKMsAlqagyB/Ks1gsG6P28gMJo6NcBXqPOHrgkFWWAS1JRBrgkFWWAS1JRS9qJGRF7gYPAO8DhzDy7D22SJHWgH7NQfikz3+zDeiRJXXAIRZKKWmqAzwB/HxHPRcSN/WiQJKkzSx1COS8zX4+IdcBTEfFKZj7T7sETE0czNrZ2iS/5fpOT431d37BfZ6UYtXq1svXz89iPdbW7gtPj2zYued3dWFKAZ+brzf8HIuIR4BygbYBPTx9aysvNa2rqYN/XeaTJyfGBvM5KMWr1auXr1+dxuT/by7Xudl86PQ+hRMQxETE+exu4CNjT6/okSd1ZSg/8BOCRiJhdz19l5jf60ipJ0qJ6DvDMfBX4RB/bIknqgtMIJakoA1ySijLAJamoMhd0kDS6+nVBl3bzt9utZ6VfYMQeuCQVZYBLUlEGuCQVZYBLUlEGuCQVZYBLUlEGuCQVZYBLUlGr9kCefk7AH/RJ2iV1pl+/5yv9gJ127IFLUlEGuCQVZYBLUlEGuCQVZYBLUlEGuCQVZYBLUlEGuCQVVf5AnmFOwO/2tbu9eogkLcQeuCQVZYBLUlEGuCQVZYBLUlEGuCQVZYBLUlEGuCQVVX4e+CBcuvnRobxuu3nmzieXVqaFjg1Zjt9be+CSVJQBLklFGeCSVJQBLklFLWknZkRcDNwFrAW+mplb+9IqSdKieu6BR8Ra4EvArwCnAldHxKn9apgkaWFLGUI5B/heZr6amT8A/hrY2J9mSZIWs5QAPwn4/pyf9zXLJEkDsJQx8DXzLJtZ6AmTk+PzPacjj28bvc79MGuenByfd/kobgdppVpKD3wfcPKcn38KeH1pzZEkdWopPfB/Bk6JiI8C/wFcBXy2L62SJC2q5x54Zh4GbgF2AS8DD2bmv/arYZKkha2ZmVlw2FqStEJ5JKYkFWWAS1JRI3k+8IjYDlwCHMjM05plnwC+AhwL7AWuycy3m/tOB/4C+AjwLvBzmfl/EXEWsAP4MPAk8PnMXJFjUt3UHBHXAL835+mnA5/MzBdWcc1HAV8FPknr9+LezPyT5jllThnRZc0/RutzfTatz/XnM/ObzX
NKbOeIOBm4FziRVg13Z+ZdEXEc8ACwgVbNV2bmdESsobUtfxU4BPxmZj7frGsT8AfNqv8oM3cOspZejGoPfAdw8RHLvgpsycyfBR6hCbCIGAO+Dvx2Zv4McD7ww+Y5XwZuBE5p/h25zpVkBx3WnJn3ZeYZmXkGcC2wNzNfaJ6zKmsGrgB+vFl+FnBTRGwoeMqIHXRe8w0AzfJPA9siYjYTqmznw8DmzPw4cC5wc7N9tgC7M/MUYHfzM7S242xNN9KqkybwbwM+Reso89siYmKQhfRiJAM8M58B3jpicQDPNLefAn69uX0R8GJmfrd57n9m5jsRsR74SGb+Y9MzuRe4bPlb35sua57rauB+gFVe8wxwTPOF/WHgB8DbFDtlRJc1n0or3MjMA8B/AWdX2s6ZuX+2B52ZB2nNiDuJ1jaa7UHv5L32b6T119VMZv4T8BNNvb8MPJWZb2XmNK33aaV+af3ISAZ4G3uAX2tuX8F7Byl9DJiJiF0R8XxE/H6z/CRaBzPNqngqgXY1z/UbNAHO6q75b4D/BfYDrwF3ZOZbrI5TRrSr+bvAxogYa47nOKu5r+R2jogNwJnAs8AJmbkfWiEPrGse1m57ltzOBvh7rqP159dzwDitHhi0xkN/Hrim+f/yiLiQHk4lsAK1qxmAiPgUcCgz9zSLVnPN5wDvAD8JfBTYHBE/zequeTutoPoX4M+A79AakihXc0QcCzwEfGF231Ub7WorVzOM6E7M+WTmK7SGS4iIjwGfae7aB3wrM99s7nuS1o6ur9M6fcCscqcSWKDmWVfxXu8bWu/Faq35s8A3MvOHwIGI+AdaO/e+T/FTRrSruTkY73dnHxcR3wH+HZim0HZudkA/BNyXmQ83i9+IiPWZub8ZIjnQLG93CpB9tPZvzV3+zeVsdz/YA29ExLrm/w/R2hP9leauXcDpEXF0Mz76i8C/NX+WHYyIc5s9258DhnP5+h4tUPPssitojfkCP/pTdLXW/BpwQUSsiYhjaO0Qe4U5p4xoZm1cBTw2+Jb3rl3NzWf6mOb2p4HDmVnqs9207x7g5cy8c85djwGbmtubeK/9jwGfa7bzucB/N/XuAi6KiIlm5+VFzbIVbSR74BFxP61v2+MjYh+tvc/HRsTNzUMeBr4G0Ew9upPWL/IM8GRm/m3zuN/hvalWf9f8W5G6qbnxC8C+zHz1iFWt1pq/1NzeQ+vP6a9l5ovNemZPGbEW2L6STxnRZc3rgF0R8S6t8xldO2dVVbbzebTa/VJEzM6UuhXYCjwYEdfT+nK+ornvSVpTCL9HaxrhbwFk5lsRcTut33OAP2z2gaxoHkovSUU5hCJJRRngklSUAS5JRRngklSUAS5JRRngklSUAS5JRRngklTU/wO4VGl5i1pengAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"bin_sizes, _, _ = plt.hist(df.year[non_numberic_profits], bins=range(1955, 2006))"
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {},
"outputs": [],
"source": [
"df = df.loc[~non_numberic_profits].copy()\n",
"df['profit'] = pd.to_numeric(df['profit'])"
]
},
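{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible alternative, sketched under the assumption that `N.A.` is the only placeholder in the file: `pd.read_csv` accepts an `na_values` argument, so the placeholder can be treated as missing at load time and the affected rows dropped afterwards. The CSV text below is invented for illustration.\n",
"\n",
"```python\n",
"import io\n",
"import pandas as pd\n",
"\n",
"# Invented two-row CSV standing in for the real file\n",
"csv_text = '''year,profit\n",
"1955,806\n",
"1955,N.A.\n",
"'''\n",
"\n",
"# Treat the placeholder string as a missing value while parsing\n",
"df_sample = pd.read_csv(io.StringIO(csv_text), na_values=['N.A.'])\n",
"\n",
"# Drop rows with a missing profit; the column is already numeric\n",
"df_sample = df_sample.dropna(subset=['profit'])\n",
"print(df_sample.dtypes)\n",
"```\n",
"\n",
"This skips the separate conversion step, at the cost of hiding which placeholder strings were present in the raw data."
]
},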
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We should check that worked."
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"25131"
]
},
"execution_count": 118,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(df)"
]
},
{
"cell_type": "code",
"execution_count": 120,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"year int64\n",
"rank int64\n",
"company object\n",
"revenue float64\n",
"profit float64\n",
"dtype: object"
]
},
"execution_count": 120,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great! We have finished our data set setup.\n",
"\n",
"If you were going to present your notebook as a report, you could get rid of the investigatory cells we created, which are included here as a demonstration of the flow of working with notebooks, and merge relevant cells (see the Advanced Functionality section below for more on this) to create a single data set setup cell. This would mean that if we ever mess up our data set elsewhere, we can just rerun the setup cell to restore it."
]
},
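{
"cell_type": "markdown",
"metadata": {},
"source": [
"A single setup cell along those lines might look like the sketch below. The function name `load_fortune500` is hypothetical, and the body simply repeats the steps taken above, assuming the profit column is read as strings because it contains `N.A.` entries.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical helper; the name is an assumption, not part of the original notebook\n",
"def load_fortune500(path):\n",
"    df = pd.read_csv(path)\n",
"    df.columns = ['year', 'rank', 'company', 'revenue', 'profit']\n",
"    # Keep only rows whose profit looks numeric, then convert the column\n",
"    non_numeric = df['profit'].str.contains('[^0-9.-]')\n",
"    df = df.loc[~non_numeric].copy()\n",
"    df['profit'] = pd.to_numeric(df['profit'])\n",
"    return df\n",
"```\n",
"\n",
"Calling `load_fortune500` with the CSV path used earlier in this notebook would rebuild `df` in one step."
]
},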
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sharing your notebooks\n",
"When people talk of sharing their notebooks, there are generally two paradigms they may be considering. Most often, individuals share the end result of their work, much like this article itself, which means sharing non-interactive, pre-rendered versions of their notebooks; however, it is also possible to collaborate on notebooks with the aid of version control systems such as Git.\n",
"\n",
"That said, there are some nascent companies popping up on the web offering the ability to run interactive Jupyter notebooks in the cloud.\n",
"\n",
"\n",
"# Before you share\n",
"A shared notebook will appear exactly in the state it was in when you export or save it, including the output of any code cells. Therefore, to ensure that your notebook is share-ready, so to speak, there are a few steps you should take before sharing:\n",
"\n",
"\n",
"\n",
"1. Click \"Cell \\> All Output \\> Clear\"\n",
"\n",
"\n",
"2. Click \"Kernel \\> Restart & Run All\"\n",
"\n",
"\n",
"3. Wait for your code cells to finish executing and check they did so as expected\n",
"\n",
"\n",
"\n",
"This will ensure your notebooks don't contain intermediary output, don't have a stale state, and were executed in order at the time of sharing."
]
},
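{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first step above can also be approximated outside the notebook interface. The sketch below works on the notebook's JSON structure using only the standard library; `clear_outputs` is a hypothetical helper and the tiny `nb` dictionary is invented for illustration. It does not restart the kernel or re-run any cells.\n",
"\n",
"```python\n",
"import json\n",
"\n",
"# Hypothetical helper: strip outputs from a notebook's JSON structure in place\n",
"def clear_outputs(notebook):\n",
"    for cell in notebook.get('cells', []):\n",
"        if cell.get('cell_type') == 'code':\n",
"            cell['outputs'] = []\n",
"            cell['execution_count'] = None\n",
"    return notebook\n",
"\n",
"# Tiny invented notebook used only to demonstrate the helper\n",
"nb = {'cells': [{'cell_type': 'code', 'execution_count': 3,\n",
"                 'metadata': {}, 'outputs': [{'output_type': 'stream'}],\n",
"                 'source': ['1 + 1']}],\n",
"      'metadata': {}, 'nbformat': 4, 'nbformat_minor': 2}\n",
"\n",
"print(json.dumps(clear_outputs(nb), indent=1))\n",
"```\n",
"\n",
"For a real file, the same function could be applied to the result of `json.load` and written back with `json.dump`."
]
},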
{
"cell_type": "raw",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}