Created
April 24, 2014 16:28
{ | |
"metadata": { | |
"name": "", | |
"signature": "sha256:9172d64435de94c4a27d90016e6b1e8a1a05f65a6f34fc226525e79ac31c7c90" | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Introduction to Pandas \n", | |
"\n", | |
"* Piotr Migda\u0142, http://migdal.wikidot.com\n", | |
"* The Barcelona Python Meetup Group: [Python & Sciences](http://www.meetup.com/python-185/events/169870182/) (24 April 2014)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# [Pandas](http://pandas.pydata.org/)\n", | |
"\n", | |
"* 0.13.1 released (February 3, 2014)\n", | |
"\n", | |
" pandas is an open source, BSD-licensed library\n", | |
" providing high-performance, easy-to-use data structures\n", | |
" and data analysis tools for the Python programming language\n", | |
" \n", | |
"In practice, Pandas brings **R**-like data structures, great for working with **tabular data**. Tabular?\n", | |
"\n", | |
"* Everything that goes into **SQL**, **Excel** or **CSV**.\n", | |
"* Not all data, but data you typically work with.\n", | |
"\n", | |
"Both for data exploration and production.\n", | |
"\n", | |
"If you can do it with SQL or Excel, you can do it with Pandas!\n", | |
"\n", | |
"(Counterexamples?)\n", | |
"\n", | |
"So:\n", | |
"\n", | |
" $ pip install pandas" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"import pandas as pd" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 141 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Series\n", | |
"\n", | |
"* Creating\n", | |
"* Adding\n", | |
"* Concatenating\n", | |
"* Filtering\n", | |
"* Applying" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser = pd.Series(['a','b','c', 'd', 'e'])" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 142 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 143, | |
"text": [ | |
"0 a\n", | |
"1 b\n", | |
"2 c\n", | |
"3 d\n", | |
"4 e\n", | |
"dtype: object" | |
] | |
} | |
], | |
"prompt_number": 143 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser[1]" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 144, | |
"text": [ | |
"'b'" | |
] | |
} | |
], | |
"prompt_number": 144 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser[9] = 'qqq'" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 145 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 146, | |
"text": [ | |
"0 a\n", | |
"1 b\n", | |
"2 c\n", | |
"3 d\n", | |
"4 e\n", | |
"9 qqq\n", | |
"dtype: object" | |
] | |
} | |
], | |
"prompt_number": 146 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser['ten'] = 'abc'" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 147 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 148, | |
"text": [ | |
"0 a\n", | |
"1 b\n", | |
"2 c\n", | |
"3 d\n", | |
"4 e\n", | |
"9 qqq\n", | |
"ten abc\n", | |
"dtype: object" | |
] | |
} | |
], | |
"prompt_number": 148 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# ser.loc[k] accesses object with index k (can be integer, string or timestamp)\n", | |
"ser.loc['ten']" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 149, | |
"text": [ | |
"'abc'" | |
] | |
} | |
], | |
"prompt_number": 149 | |
}, | |
{
"cell_type": "code",
"collapsed": false,
"input": [
"# ser.iloc[i] accesses the i-th element by position; i in range(0, len(ser))\n",
"ser.iloc[4]"
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 150, | |
"text": [ | |
"'e'" | |
] | |
} | |
], | |
"prompt_number": 150 | |
}, | |
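{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, not part of the original talk, of why `loc` and `iloc` can disagree: `loc` looks up index labels, while `iloc` counts positions from 0. The Series `demo` and its labels are made up for illustration."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# hypothetical Series whose labels are not 0, 1, 2\n",
"demo = pd.Series(['x', 'y', 'z'], index=[10, 20, 30])\n",
"print(demo.loc[20])   # label 20\n",
"print(demo.iloc[1])   # second position, the same element here\n",
"print(demo.iloc[0])   # first position; demo.loc[0] would raise KeyError"
],
"language": "python",
"metadata": {},
"outputs": []
},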
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser != 'b'" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 151, | |
"text": [ | |
"0 True\n", | |
"1 False\n", | |
"2 True\n", | |
"3 True\n", | |
"4 True\n", | |
"9 True\n", | |
"ten True\n", | |
"dtype: bool" | |
] | |
} | |
], | |
"prompt_number": 151 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser[ser != 'b']" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 152, | |
"text": [ | |
"0 a\n", | |
"2 c\n", | |
"3 d\n", | |
"4 e\n", | |
"9 qqq\n", | |
"ten abc\n", | |
"dtype: object" | |
] | |
} | |
], | |
"prompt_number": 152 | |
}, | |
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Series.append was removed in pandas 2.0; pd.concat gives the same result\n",
"ser = pd.concat([ser, pd.Series([8,2,1],\n",
"                               index=[10,100,1000])])"
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 153 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 154, | |
"text": [ | |
"0 a\n", | |
"1 b\n", | |
"2 c\n", | |
"3 d\n", | |
"4 e\n", | |
"9 qqq\n", | |
"ten abc\n", | |
"10 8\n", | |
"100 2\n", | |
"1000 1\n", | |
"dtype: object" | |
] | |
} | |
], | |
"prompt_number": 154 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser.describe()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 155, | |
"text": [ | |
"count 10\n", | |
"unique 10\n", | |
"top a\n", | |
"freq 1\n", | |
"dtype: object" | |
] | |
} | |
], | |
"prompt_number": 155 | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data frame\n",
"\n",
"* Creating\n",
"* Relations to Series; columns and index\n",
"* columns, loc, iloc, ix\n",
"* Iterating\n",
"* Filtering\n",
"* Joins\n",
"* Groupby\n",
"* Apply, axes\n",
"* Renaming things\n",
"* Dealing with NaNs\n",
"* Data types: (numpy) float, int, object"
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"import numpy as np" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 156 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df = pd.DataFrame(np.random.randn(6,4),\n", | |
" columns=[\"random\", \"guess\", \"chance\", \"luck\"])" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 157 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>random</th>\n", | |
" <th>guess</th>\n", | |
" <th>chance</th>\n", | |
" <th>luck</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>-0.692656</td>\n", | |
" <td> 0.437082</td>\n", | |
" <td>-0.651568</td>\n", | |
" <td>-0.380875</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>-0.070536</td>\n", | |
" <td>-1.224517</td>\n", | |
" <td>-0.250073</td>\n", | |
" <td> 0.848358</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>-0.711452</td>\n", | |
" <td> 0.510971</td>\n", | |
" <td>-0.224961</td>\n", | |
" <td> 1.875583</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>-0.513751</td>\n", | |
" <td>-2.484004</td>\n", | |
" <td>-1.501159</td>\n", | |
" <td>-2.113281</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td> 0.001650</td>\n", | |
" <td> 1.963386</td>\n", | |
" <td>-0.028636</td>\n", | |
" <td>-1.440029</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td> 0.296459</td>\n", | |
" <td>-0.391263</td>\n", | |
" <td> 0.969736</td>\n", | |
" <td> 1.595289</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>6 rows \u00d7 4 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 158, | |
"text": [ | |
" random guess chance luck\n", | |
"0 -0.692656 0.437082 -0.651568 -0.380875\n", | |
"1 -0.070536 -1.224517 -0.250073 0.848358\n", | |
"2 -0.711452 0.510971 -0.224961 1.875583\n", | |
"3 -0.513751 -2.484004 -1.501159 -2.113281\n", | |
"4 0.001650 1.963386 -0.028636 -1.440029\n", | |
"5 0.296459 -0.391263 0.969736 1.595289\n", | |
"\n", | |
"[6 rows x 4 columns]" | |
] | |
} | |
], | |
"prompt_number": 158 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[\"luck\"]" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 159, | |
"text": [ | |
"0 -0.380875\n", | |
"1 0.848358\n", | |
"2 1.875583\n", | |
"3 -2.113281\n", | |
"4 -1.440029\n", | |
"5 1.595289\n", | |
"Name: luck, dtype: float64" | |
] | |
} | |
], | |
"prompt_number": 159 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[\"luck\"] + 1" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 160, | |
"text": [ | |
"0 0.619125\n", | |
"1 1.848358\n", | |
"2 2.875583\n", | |
"3 -1.113281\n", | |
"4 -0.440029\n", | |
"5 2.595289\n", | |
"Name: luck, dtype: float64" | |
] | |
} | |
], | |
"prompt_number": 160 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[\"fate\"] = 2 * df[\"luck\"] - df[\"random\"]" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 161 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>random</th>\n", | |
" <th>guess</th>\n", | |
" <th>chance</th>\n", | |
" <th>luck</th>\n", | |
" <th>fate</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>-0.692656</td>\n", | |
" <td> 0.437082</td>\n", | |
" <td>-0.651568</td>\n", | |
" <td>-0.380875</td>\n", | |
" <td>-0.069093</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>-0.070536</td>\n", | |
" <td>-1.224517</td>\n", | |
" <td>-0.250073</td>\n", | |
" <td> 0.848358</td>\n", | |
" <td> 1.767253</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>-0.711452</td>\n", | |
" <td> 0.510971</td>\n", | |
" <td>-0.224961</td>\n", | |
" <td> 1.875583</td>\n", | |
" <td> 4.462618</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>-0.513751</td>\n", | |
" <td>-2.484004</td>\n", | |
" <td>-1.501159</td>\n", | |
" <td>-2.113281</td>\n", | |
" <td>-3.712812</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td> 0.001650</td>\n", | |
" <td> 1.963386</td>\n", | |
" <td>-0.028636</td>\n", | |
" <td>-1.440029</td>\n", | |
" <td>-2.881709</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td> 0.296459</td>\n", | |
" <td>-0.391263</td>\n", | |
" <td> 0.969736</td>\n", | |
" <td> 1.595289</td>\n", | |
" <td> 2.894120</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>6 rows \u00d7 5 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 162, | |
"text": [ | |
" random guess chance luck fate\n", | |
"0 -0.692656 0.437082 -0.651568 -0.380875 -0.069093\n", | |
"1 -0.070536 -1.224517 -0.250073 0.848358 1.767253\n", | |
"2 -0.711452 0.510971 -0.224961 1.875583 4.462618\n", | |
"3 -0.513751 -2.484004 -1.501159 -2.113281 -3.712812\n", | |
"4 0.001650 1.963386 -0.028636 -1.440029 -2.881709\n", | |
"5 0.296459 -0.391263 0.969736 1.595289 2.894120\n", | |
"\n", | |
"[6 rows x 5 columns]" | |
] | |
} | |
], | |
"prompt_number": 162 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[\"random\"] = df[\"random\"].apply(lambda x: 'a' if x > 0 else 'b')" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 163 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>random</th>\n", | |
" <th>guess</th>\n", | |
" <th>chance</th>\n", | |
" <th>luck</th>\n", | |
" <th>fate</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td> b</td>\n", | |
" <td> 0.437082</td>\n", | |
" <td>-0.651568</td>\n", | |
" <td>-0.380875</td>\n", | |
" <td>-0.069093</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> b</td>\n", | |
" <td>-1.224517</td>\n", | |
" <td>-0.250073</td>\n", | |
" <td> 0.848358</td>\n", | |
" <td> 1.767253</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> b</td>\n", | |
" <td> 0.510971</td>\n", | |
" <td>-0.224961</td>\n", | |
" <td> 1.875583</td>\n", | |
" <td> 4.462618</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td> b</td>\n", | |
" <td>-2.484004</td>\n", | |
" <td>-1.501159</td>\n", | |
" <td>-2.113281</td>\n", | |
" <td>-3.712812</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td> a</td>\n", | |
" <td> 1.963386</td>\n", | |
" <td>-0.028636</td>\n", | |
" <td>-1.440029</td>\n", | |
" <td>-2.881709</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td> a</td>\n", | |
" <td>-0.391263</td>\n", | |
" <td> 0.969736</td>\n", | |
" <td> 1.595289</td>\n", | |
" <td> 2.894120</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>6 rows \u00d7 5 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 164, | |
"text": [ | |
" random guess chance luck fate\n", | |
"0 b 0.437082 -0.651568 -0.380875 -0.069093\n", | |
"1 b -1.224517 -0.250073 0.848358 1.767253\n", | |
"2 b 0.510971 -0.224961 1.875583 4.462618\n", | |
"3 b -2.484004 -1.501159 -2.113281 -3.712812\n", | |
"4 a 1.963386 -0.028636 -1.440029 -2.881709\n", | |
"5 a -0.391263 0.969736 1.595289 2.894120\n", | |
"\n", | |
"[6 rows x 5 columns]" | |
] | |
} | |
], | |
"prompt_number": 164 | |
}, | |
{
"cell_type": "code",
"collapsed": false,
"input": [
"# .ix was removed in pandas 1.0; .loc slices by label, endpoints included\n",
"df.loc[1:3, \"guess\":\"luck\"]"
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>guess</th>\n", | |
" <th>chance</th>\n", | |
" <th>luck</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>-1.224517</td>\n", | |
" <td>-0.250073</td>\n", | |
" <td> 0.848358</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> 0.510971</td>\n", | |
" <td>-0.224961</td>\n", | |
" <td> 1.875583</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>-2.484004</td>\n", | |
" <td>-1.501159</td>\n", | |
" <td>-2.113281</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>3 rows \u00d7 3 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 165, | |
"text": [ | |
" guess chance luck\n", | |
"1 -1.224517 -0.250073 0.848358\n", | |
"2 0.510971 -0.224961 1.875583\n", | |
"3 -2.484004 -1.501159 -2.113281\n", | |
"\n", | |
"[3 rows x 3 columns]" | |
] | |
} | |
], | |
"prompt_number": 165 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"a_column = pd.DataFrame(np.random.randn(3,1),\n", | |
" index=[1,4,9],\n", | |
" columns=[\"new\"])" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 166 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"a_column" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>new</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> 0.235703</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>-1.211161</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td> 0.334221</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>3 rows \u00d7 1 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 167, | |
"text": [ | |
" new\n", | |
"1 0.235703\n", | |
"4 -1.211161\n", | |
"9 0.334221\n", | |
"\n", | |
"[3 rows x 1 columns]" | |
] | |
} | |
], | |
"prompt_number": 167 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df = pd.concat([df, a_column], axis=1)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 168 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>random</th>\n", | |
" <th>guess</th>\n", | |
" <th>chance</th>\n", | |
" <th>luck</th>\n", | |
" <th>fate</th>\n", | |
" <th>new</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td> b</td>\n", | |
" <td> 0.437082</td>\n", | |
" <td>-0.651568</td>\n", | |
" <td>-0.380875</td>\n", | |
" <td>-0.069093</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> b</td>\n", | |
" <td>-1.224517</td>\n", | |
" <td>-0.250073</td>\n", | |
" <td> 0.848358</td>\n", | |
" <td> 1.767253</td>\n", | |
" <td> 0.235703</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> b</td>\n", | |
" <td> 0.510971</td>\n", | |
" <td>-0.224961</td>\n", | |
" <td> 1.875583</td>\n", | |
" <td> 4.462618</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td> b</td>\n", | |
" <td>-2.484004</td>\n", | |
" <td>-1.501159</td>\n", | |
" <td>-2.113281</td>\n", | |
" <td>-3.712812</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td> a</td>\n", | |
" <td> 1.963386</td>\n", | |
" <td>-0.028636</td>\n", | |
" <td>-1.440029</td>\n", | |
" <td>-2.881709</td>\n", | |
" <td>-1.211161</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td> a</td>\n", | |
" <td>-0.391263</td>\n", | |
" <td> 0.969736</td>\n", | |
" <td> 1.595289</td>\n", | |
" <td> 2.894120</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td> NaN</td>\n", | |
" <td> NaN</td>\n", | |
" <td> NaN</td>\n", | |
" <td> NaN</td>\n", | |
" <td> NaN</td>\n", | |
" <td> 0.334221</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>7 rows \u00d7 6 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 169, | |
"text": [ | |
" random guess chance luck fate new\n", | |
"0 b 0.437082 -0.651568 -0.380875 -0.069093 NaN\n", | |
"1 b -1.224517 -0.250073 0.848358 1.767253 0.235703\n", | |
"2 b 0.510971 -0.224961 1.875583 4.462618 NaN\n", | |
"3 b -2.484004 -1.501159 -2.113281 -3.712812 NaN\n", | |
"4 a 1.963386 -0.028636 -1.440029 -2.881709 -1.211161\n", | |
"5 a -0.391263 0.969736 1.595289 2.894120 NaN\n", | |
"9 NaN NaN NaN NaN NaN 0.334221\n", | |
"\n", | |
"[7 rows x 6 columns]" | |
] | |
} | |
], | |
"prompt_number": 169 | |
}, | |
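{
"cell_type": "markdown",
"metadata": {},
"source": [
"A rough sketch, added for illustration, of what `pd.concat(..., axis=1)` did above: rows are matched by index label, and by default every label from either frame is kept, so cells with no counterpart become NaN. The frames `left` and `right` are hypothetical; `join=\"inner\"` is the standard option for keeping only the shared labels."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# two made-up frames that share only the index label 1\n",
"left = pd.DataFrame({\"x\": [1, 2, 3]}, index=[0, 1, 2])\n",
"right = pd.DataFrame({\"y\": [10, 20]}, index=[1, 5])\n",
"print(pd.concat([left, right], axis=1))                # labels 0, 1, 2, 5 with NaN gaps\n",
"print(pd.concat([left, right], axis=1, join=\"inner\"))  # only label 1"
],
"language": "python",
"metadata": {},
"outputs": []
},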
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[df[\"chance\"] < 0]" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>random</th>\n", | |
" <th>guess</th>\n", | |
" <th>chance</th>\n", | |
" <th>luck</th>\n", | |
" <th>fate</th>\n", | |
" <th>new</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td> b</td>\n", | |
" <td> 0.437082</td>\n", | |
" <td>-0.651568</td>\n", | |
" <td>-0.380875</td>\n", | |
" <td>-0.069093</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> b</td>\n", | |
" <td>-1.224517</td>\n", | |
" <td>-0.250073</td>\n", | |
" <td> 0.848358</td>\n", | |
" <td> 1.767253</td>\n", | |
" <td> 0.235703</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> b</td>\n", | |
" <td> 0.510971</td>\n", | |
" <td>-0.224961</td>\n", | |
" <td> 1.875583</td>\n", | |
" <td> 4.462618</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td> b</td>\n", | |
" <td>-2.484004</td>\n", | |
" <td>-1.501159</td>\n", | |
" <td>-2.113281</td>\n", | |
" <td>-3.712812</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td> a</td>\n", | |
" <td> 1.963386</td>\n", | |
" <td>-0.028636</td>\n", | |
" <td>-1.440029</td>\n", | |
" <td>-2.881709</td>\n", | |
" <td>-1.211161</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>5 rows \u00d7 6 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 170, | |
"text": [ | |
" random guess chance luck fate new\n", | |
"0 b 0.437082 -0.651568 -0.380875 -0.069093 NaN\n", | |
"1 b -1.224517 -0.250073 0.848358 1.767253 0.235703\n", | |
"2 b 0.510971 -0.224961 1.875583 4.462618 NaN\n", | |
"3 b -2.484004 -1.501159 -2.113281 -3.712812 NaN\n", | |
"4 a 1.963386 -0.028636 -1.440029 -2.881709 -1.211161\n", | |
"\n", | |
"[5 rows x 6 columns]" | |
] | |
} | |
], | |
"prompt_number": 170 | |
}, | |
{
"cell_type": "code",
"collapsed": false,
"input": [
"# one .loc call instead of chained indexing, which may assign to a copy\n",
"df.loc[df[\"chance\"] < 0, \"guess\"] = 42"
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 171 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>random</th>\n", | |
" <th>guess</th>\n", | |
" <th>chance</th>\n", | |
" <th>luck</th>\n", | |
" <th>fate</th>\n", | |
" <th>new</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td> b</td>\n", | |
" <td> 42.000000</td>\n", | |
" <td>-0.651568</td>\n", | |
" <td>-0.380875</td>\n", | |
" <td>-0.069093</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> b</td>\n", | |
" <td> 42.000000</td>\n", | |
" <td>-0.250073</td>\n", | |
" <td> 0.848358</td>\n", | |
" <td> 1.767253</td>\n", | |
" <td> 0.235703</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> b</td>\n", | |
" <td> 42.000000</td>\n", | |
" <td>-0.224961</td>\n", | |
" <td> 1.875583</td>\n", | |
" <td> 4.462618</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td> b</td>\n", | |
" <td> 42.000000</td>\n", | |
" <td>-1.501159</td>\n", | |
" <td>-2.113281</td>\n", | |
" <td>-3.712812</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td> a</td>\n", | |
" <td> 42.000000</td>\n", | |
" <td>-0.028636</td>\n", | |
" <td>-1.440029</td>\n", | |
" <td>-2.881709</td>\n", | |
" <td>-1.211161</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td> a</td>\n", | |
" <td> -0.391263</td>\n", | |
" <td> 0.969736</td>\n", | |
" <td> 1.595289</td>\n", | |
" <td> 2.894120</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td> NaN</td>\n", | |
" <td> NaN</td>\n", | |
" <td> NaN</td>\n", | |
" <td> NaN</td>\n", | |
" <td> NaN</td>\n", | |
" <td> 0.334221</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>7 rows \u00d7 6 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 172, | |
"text": [ | |
" random guess chance luck fate new\n", | |
"0 b 42.000000 -0.651568 -0.380875 -0.069093 NaN\n", | |
"1 b 42.000000 -0.250073 0.848358 1.767253 0.235703\n", | |
"2 b 42.000000 -0.224961 1.875583 4.462618 NaN\n", | |
"3 b 42.000000 -1.501159 -2.113281 -3.712812 NaN\n", | |
"4 a 42.000000 -0.028636 -1.440029 -2.881709 -1.211161\n", | |
"5 a -0.391263 0.969736 1.595289 2.894120 NaN\n", | |
"9 NaN NaN NaN NaN NaN 0.334221\n", | |
"\n", | |
"[7 rows x 6 columns]" | |
] | |
} | |
], | |
"prompt_number": 172 | |
}, | |
{
"cell_type": "code",
"collapsed": false,
"input": [
"# numeric_only=True skips the string column \"random\", which has no mean\n",
"df.fillna(df.mean(numeric_only=True))"
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>random</th>\n", | |
" <th>guess</th>\n", | |
" <th>chance</th>\n", | |
" <th>luck</th>\n", | |
" <th>fate</th>\n", | |
" <th>new</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td> b</td>\n", | |
" <td> 42.000000</td>\n", | |
" <td>-0.651568</td>\n", | |
" <td>-0.380875</td>\n", | |
" <td>-0.069093</td>\n", | |
" <td>-0.213746</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> b</td>\n", | |
" <td> 42.000000</td>\n", | |
" <td>-0.250073</td>\n", | |
" <td> 0.848358</td>\n", | |
" <td> 1.767253</td>\n", | |
" <td> 0.235703</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> b</td>\n", | |
" <td> 42.000000</td>\n", | |
" <td>-0.224961</td>\n", | |
" <td> 1.875583</td>\n", | |
" <td> 4.462618</td>\n", | |
" <td>-0.213746</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td> b</td>\n", | |
" <td> 42.000000</td>\n", | |
" <td>-1.501159</td>\n", | |
" <td>-2.113281</td>\n", | |
" <td>-3.712812</td>\n", | |
" <td>-0.213746</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td> a</td>\n", | |
" <td> 42.000000</td>\n", | |
" <td>-0.028636</td>\n", | |
" <td>-1.440029</td>\n", | |
" <td>-2.881709</td>\n", | |
" <td>-1.211161</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td> a</td>\n", | |
" <td> -0.391263</td>\n", | |
" <td> 0.969736</td>\n", | |
" <td> 1.595289</td>\n", | |
" <td> 2.894120</td>\n", | |
" <td>-0.213746</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td> NaN</td>\n", | |
" <td> 34.934790</td>\n", | |
" <td>-0.281110</td>\n", | |
" <td> 0.064174</td>\n", | |
" <td> 0.410063</td>\n", | |
" <td> 0.334221</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>7 rows \u00d7 6 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 173, | |
"text": [ | |
" random guess chance luck fate new\n", | |
"0 b 42.000000 -0.651568 -0.380875 -0.069093 -0.213746\n", | |
"1 b 42.000000 -0.250073 0.848358 1.767253 0.235703\n", | |
"2 b 42.000000 -0.224961 1.875583 4.462618 -0.213746\n", | |
"3 b 42.000000 -1.501159 -2.113281 -3.712812 -0.213746\n", | |
"4 a 42.000000 -0.028636 -1.440029 -2.881709 -1.211161\n", | |
"5 a -0.391263 0.969736 1.595289 2.894120 -0.213746\n", | |
"9 NaN 34.934790 -0.281110 0.064174 0.410063 0.334221\n", | |
"\n", | |
"[7 rows x 6 columns]" | |
] | |
} | |
], | |
"prompt_number": 173 | |
}, | |
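{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch under stated assumptions (the frame `toy` is invented for this note): `fillna` accepts a Series of per-column fill values, so passing the column means fills each numeric column with its own mean and leaves columns without a mean untouched."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# hypothetical frame with one numeric and one string column, each with a gap\n",
"toy = pd.DataFrame({\"num\": [1.0, np.nan, 3.0], \"txt\": [\"a\", None, \"b\"]})\n",
"means = toy.mean(numeric_only=True)   # Series with a single entry, for \"num\"\n",
"print(toy.fillna(means))              # the num gap becomes 2.0, the txt gap stays missing"
],
"language": "python",
"metadata": {},
"outputs": []
},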
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Data summary\n", | |
"\n", | |
"* head, describe, hist, value_counts" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[\"random\"].value_counts()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 175, | |
"text": [ | |
"b 4\n", | |
"a 2\n", | |
"dtype: int64" | |
] | |
} | |
], | |
"prompt_number": 175 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[\"guess\"].describe()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 176, | |
"text": [ | |
"count 6.000000\n", | |
"mean 34.934790\n", | |
"std 17.306161\n", | |
"min -0.391263\n", | |
"25% 42.000000\n", | |
"50% 42.000000\n", | |
"75% 42.000000\n", | |
"max 42.000000\n", | |
"Name: guess, dtype: float64" | |
] | |
} | |
], | |
"prompt_number": 176 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[\"guess\"].hist()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 177, | |
"text": [ | |
"<matplotlib.axes.AxesSubplot at 0x1095997d0>" | |
] | |
}, | |
{ | |
"metadata": {}, | |
"output_type": "display_data", | |
"png": "iVBORw0KGgoAAAANSUhEUgAAAWwAAAEACAYAAACXqUyYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEvpJREFUeJzt3VtsVHW/xvFnoE0IQenGQKmUpKZAoVA6jQ1kG4ThUEqC\nnFIvMEoo4JXxQi71hitplQuEyBWJFkleICYaFaGBKCuRUxqRBgWyQaU7hQIbPHAuhXbtC6WlLmXW\nlFnzX/+/30/Siynj9Peg72N92vImfN/3BQCIvUGmDwAAhENhA4AlKGwAsASFDQCWoLABwBIUNgBY\nIi/Mk0pKSvTkk09q8ODBys/PV0tLS9R3AQD+IlRhJxIJeZ6nESNGRH0PAOAfhJ5E+PkaADArVGEn\nEgnNmzdP1dXV2rp1a9Q3AQD+RqhJ5NChQyoqKtKVK1dUU1OjiRMn6vnnn4/6NgDAQ0IVdlFRkSRp\n5MiRWrZsmVpaWvoV9pgxY9TR0RHNhQDgqNLSUv3444+hn592Erl9+7Zu3LghSbp165b27dunioqK\nfs/p6OiQ7/vOvq1bt874DWQjn+35/uBn+LYug+fKeMZM33766afQZS2F+Az78uXLWrZsmSTp/v37\nevnllzV//vyMPojt2traTJ8QGZezSeSzX5vpA2IlbWE/88wzam1tzcUtAIBH4CcdQ6ivrzd9QmRc\nziaRz371pg+IlYTfNy4N/EUSCWXhZQA4LJFI6MHWHNFHsK6HMu1OPsMOwfM80ydExuVsEvns55k+\nIFYobACwBJMIgJxgEgliEgEAR1HYIbi8E7qcTSKf/TzTB8QKhQ0AlmDDBpATbNhBbNgA4CgKOwSX\nd0KXs0nks59n+oBYobABwBJs2ABygg07iA0bABxFYYfg8k7ocjaJfPbzTB8QKxQ2AFiCDRtATrBh\nB7FhA4CjKOwQXN4JXc4mkc9+nukDYoXCBgBLsGEDyAk27CA2bABwFIUdgss7ocvZJPLZzzN9QKxQ\n2ABgCTZsADnBhh3Ehg0AjqKwQ3B5J3Q5m0Q++3mmD4gVChsALMGGDSAn2LCD2LABwFEUdggu74Qu\nZ5PIZz/P9AGxQmEDgCXYsAHkBBt2EBs2ADiKwg7B5Z3Q5WwS+eznmT4gVihsALBE6A27u7tb1dXV\nKi4u1hdffNH/RdiwAaTBhh0U2Ya9adMmlZeX//mbDgDItVCFff78ee3Zs0evvvqqdf8GywaXd0KX\ns0nks59n+oBYCVXYa9eu1YYNGzRoEJM3AJiSl+4Ju3fv1qhRo1RVVfXIf5vX19erpKREklRQUKBk\nMqlUKiWp77MAWx8/eF9c7snm41QqFat7yOduvj4PHqdCPE5l8HxFen82Hnuep6amJknq7ctMpP2i\n41tvvaXt27crLy9PnZ2dun79uurq6vTRRx/1vQhfdASQBl90DMr6Fx3Xr1+v9vZ2nTt3Tjt37tSc\nOXP6lfW/QfAzBHe4nE0in/080wfESsajNN8lAgBm8GeJAMgJJpEg/iwRAHAUhR2Cyzuhy9kk8tnP\nM31ArFDYAGAJNmwAOcGGHcSGDQCOorBDcHkndDmbRD77eaYPiBUKGwAswYYNICfYsIPYsAHAURR2\nCC7vhC5nk8hnP8/0AbFCYQOAJdiwAeQEG3YQGzYAOIrCDsHlndDlbBL57OeZPiBWKGwAsAQbNoCc\nYMMOYsMGAEdR2CG4vBO6nE0in/080wfECoUNAJZgwwaQE2zYQWzYAOAoCjsEl3dCl7NJ5LOfZ/qA\nWKGwAcASbNgAcoINO4gNGwAcRWGH4PJO6HI2iXz280wfECsUNgBYgg0bQE6wYQexYQOAoyjsEFze\nCV3OJpHPfp7pA2KFwgYAS7BhA8gJNuwgNmwAcBSFHYLLO6HL2STy2c8zfUCsUNgAYIm0G3ZnZ6dm\nzZqlu3fvqqurS0uWLFFDQ0P/F2HDBpAGG3ZQpt2Zl+
4JQ4YM0YEDBzR06FDdv39fM2bM0MGDBzVj\nxozHOhQAkJlQk8jQoUMlSV1dXeru7taIESMiPSpuXN4JXc4mkc9+nukDYiVUYff09CiZTKqwsFCz\nZ89WeXl51HcBAP4io+/Dvnbtmmpra9XY2KhUKtX3ImzYANJgww7K+ob9sOHDh2vhwoX69ttv+xW2\nJNXX16ukpESSVFBQoGQy2fucB//ZxmMe8/jf+7jPg8epLD9WpPdn47HneWpqapKk3r7MRNrPsK9e\nvaq8vDwVFBTozp07qq2t1bp16zR37ty+F3H8M2zP83p/813jcjaJfHEysM+wPfUVc9qPYF0PZf0z\n7IsXL2rlypXq6elRT0+PVqxY0a+sAQC5wZ8lAiAn2LCD+LNEAMBRFHYIwS+auMPlbBL57OeZPiBW\nKGwAsAQbNoCcYMMOYsMGAEdR2CG4vBO6nE0in/080wfECoUNAJZgwwaQE2zYQWzYAOAoCjsEl3dC\nl7NJ5LOfZ/qAWKGwAcASbNgAcoINO4gNGwAcRWGH4PJO6HI2iXz280wfECsUNgBYgg0bQE6wYQex\nYQOAoyjsEFzeCV3OJpHPfp7pA2KFwgYAS7BhA8gJNuwgNmwAcBSFHYLLO6HL2STy2c8zfUCsUNgA\nYAk2bAA5wYYdxIYNAI6isENweSd0OZtEPvt5pg+IFQobACzBhg0gJ9iwg9iwAcBRFHYILu+ELmeT\nyGc/z/QBsUJhA4Al2LAB5AQbdhAbNgA4isIOweWd0OVsEvns55k+IFYobACwBBs2gJxgww7K+obd\n3t6u2bNna/LkyZoyZYo2b978WAcCAAYmbWHn5+dr48aNOnnypI4ePaotW7bo9OnTubgtNlzeCV3O\nJpHPfp7pA2IlbWGPHj1ayWRSkjRs2DBNmjRJHR0dkR8GAOgvow27ra1Ns2bN0smTJzVs2LC+F2HD\nBpAGG3ZQZN+HffPmTb344ovatGlTv7IGAORGXpgn3bt3T3V1dXrllVe0dOnSv31OfX29SkpKJEkF\nBQVKJpNKpVKS+nY2Wx+/9957TuV5+PHDG2gc7iGfu/n6PHicCvH44b823fMV6f3Z+vvV1NQkSb19\nmYm0k4jv+1q5cqWeeuopbdy48e9fxPFJxPO83t9817icTSJfnAxsEvHUV8xpP4J1PZRpd6Yt7IMH\nD2rmzJmaOnXqn7/hUkNDgxYsWDDgDwrg34cNOyjrhR3FBwXw70NhB/GHP0UguMG5w+VsEvns55k+\nIFYobACwBJMIgJxgEgliEgEAR1HYIbi8E7qcTSKf/TzTB8QKhQ0AlmDDBpATbNhBbNgA4CgKOwSX\nd0KXs0nks59n+oBYobABwBJs2ABygg07iA0bABxFYYfg8k7ocjaJfPbzTB8QKxQ2AFiCDRtATrBh\nB7FhA4CjKOwQXN4JXc4mkc9+nukDYoXCBgBLsGEDyAk27CA2bABwFIUdgss7ocvZJPLZzzN9QKxQ\n2ABgCTZsADnBhh3Ehg0AjqKwQ3B5J3Q5m0Q++3mmD4gVChsALMGGDSAn2LCD2LABwFEUdggu74Qu\nZ5PIZz/P9AGxQmEDgCXYsAHkBBt2EBs2ADiKwg7B5Z3Q5WwS+eznmT4gVihsALAEGzaAnGDDDmLD\nBgBHhSrs1atXq7CwUBUVFVHfE0su74QuZ5PIZz/P9AGxEqqwV61apebm5qhvAQA8QugNu62tTYsW\nLdL3338ffBE2bABpsGEHZdqdeRHeYrUvv/xSN2/ejOz1n332WY0bNy6y1wfgnqwVdn19vUpKSiRJ\nBQUFSiaTSqVSkvp2Nlse7927V4sWLdawYS9Kku7e/R8NHvxfyssbJUm6f///JGnAjzs7WzVr1njt\n37/beN6HN9C4/P6Tz818fR48ToV4/PBfm+75ivT+bP39ampqkqTevswEk8jfuHHjhp566mndu3fj\nz/d46vuHIxu26q
WXWvSf/2zN4msOjOd5vf9guYh88TGwScRT+P/t2ddDfFtfJFKmD4iMLf9jHyjy\n2S5l+oBYCVXYL730kp577jmdOXNGY8eO1Ycffhj1XQCAvwhV2Dt27FBHR4fu3r2r9vZ2rVq1Kuq7\nYsYzfUBkgvuiW8hnO8/0AbHCJAIAlqCwQ0mZPiAyrm+g5LNdyvQBsUJhA4AlKOxQPNMHRMb1DZR8\ntvNMHxArFDYAWILCDiVl+oDIuL6Bks92KdMHxAqFDQCWoLBD8UwfEBnXN1Dy2c4zfUCsUNgAYAkK\nO5SU6QMi4/oGSj7bpUwfECsUNgBYgsIOxTN9QGRc30DJZzvP9AGxQmEDgCUo7FBSpg+IjOsbKPls\nlzJ9QKxQ2ABgCQo7FM/0AZFxfQMln+080wfECoUNAJagsENJmT4gMq5voOSzXcr0AbFCYQOAJSjs\nUDzTB0TG9Q2UfLbzTB8QKxQ2AFiCwg4lZfqAyLi+gZLPdinTB8QKhQ0AlqCwQ/FMHxAZ1zdQ8tnO\nM31ArFDYAGAJCjuUlOkDIuP6Bko+26VMHxArFDYAWILCDsUzfUBkXN9AyWc7z/QBsUJhA4AlKOxQ\nUqYPiIzrGyj5bJcyfUCsUNgAYAkKOxTP9AGRcX0DJZ/tPNMHxAqFDQCWoLBDSZk+IDKub6Dks13K\n9AGxQmEDgCVCFXZzc7MmTpyo8ePH65133on6phjyTB8QGdc3UPLZzjN9QKykLezu7m69/vrram5u\n1qlTp7Rjxw6dPn06F7fFSKvpAyLT2upuNol89nM9X2bSFnZLS4vGjRunkpIS5efna/ny5frss89y\ncVuM/G76gMj8/ru72STy2c/1fJlJW9gXLlzQ2LFjex8XFxfrwoULkR4FAAjKS/eERCKRiztiJZFI\nqLv7jp58cpEk6fbt4xo69FjWXr+r63+Vl/ffWXu9x9HW1mb6hEiRz3Ztpg+IFz+NI0eO+LW1tb2P\n169f7zc2NvZ7TmlpqS+JN9544423DN5KS0vTVXA/Cd/3fT3C/fv3VVZWpq+++kpPP/20pk2bph07\ndmjSpEmP+ssAAFmWdhLJy8vT+++/r9raWnV3d2vNmjWUNQAYkPYzbABAPAz4Jx0//vhjTZ48WYMH\nD9Z3333X79caGho0fvx4TZw4Ufv27XvsI01x7QeGVq9ercLCQlVUVPS+79dff1VNTY0mTJig+fPn\nW/1tYu3t7Zo9e7YmT56sKVOmaPPmzZLcyNjZ2anp06crmUyqvLxcb775piQ3sj2su7tbVVVVWrTo\njy/4u5SvpKREU6dOVVVVlaZNmyYp83wDLuyKigp9+umnmjlzZr/3nzp1Srt27dKpU6fU3Nys1157\nTT09PQP9MMa4+ANDq1atUnNzc7/3NTY2qqamRmfOnNHcuXPV2Nho6LrHl5+fr40bN+rkyZM6evSo\ntmzZotOnTzuRcciQITpw4IBaW1t14sQJHThwQAcPHnQi28M2bdqk8vLy3u9OcylfIpGQ53k6fvy4\nWlpaJA0gX0ZfovwbqVTKP3bs2D9+F0ltba1/5MiRx/0wOXf48OF+3x3T0NDgNzQ0GLwoO86dO+dP\nmTKl93FZWZl/6dIl3/d9/+LFi35ZWZmp07JuyZIl/v79+53LeOvWLb+6utr/4YcfnMrW3t7uz507\n1//666/9F154wfd9t/75LCkp8a9evdrvfZnmy/of/tTR0aHi4uLex7b+oM2/5QeGLl++rMLCQklS\nYWGhLl++bPii7Ghra9Px48c1ffp0ZzL29PQomUyqsLCwd/pxJZskrV27Vhs2bNCgQX215FK+RCKh\nefPmqbq6Wlu3bpWUeb5HfpdITU2NLl26FHj/+vXrezemsIfaxsabH1cikXAi982bN1VXV6dNmzbp\niSee6PdrNmccNGiQWltbde3aNdXW1urAgQP9ft3mbLt379aoUaNUVVX1j3+glc35
JOnQoUMqKirS\nlStXVFNTo4kTJ/b79TD5HlnY+/fvz/ioMWPGqL29vffx+fPnNWbMmIxfx7S/5mhvb+/3Xw6uKCws\n1KVLlzR69GhdvHhRo0aNMn3SY7l3757q6uq0YsUKLV26VJJ7GYcPH66FCxfq2LFjzmQ7fPiwPv/8\nc+3Zs0ednZ26fv26VqxY4Uw+SSoqKpIkjRw5UsuWLVNLS0vG+bIyifgPfWfg4sWLtXPnTnV1denc\nuXM6e/Zs71dEbVJdXa2zZ8+qra1NXV1d2rVrlxYvXmz6rKxbvHixtm3bJknatm1bb8nZyPd9rVmz\nRuXl5XrjjTd63+9CxqtXr/Z+B8GdO3e0f/9+VVVVOZFN+uO/2tvb23Xu3Dnt3LlTc+bM0fbt253J\nd/v2bd24cUOSdOvWLe3bt08VFRWZ5xvogP7JJ5/4xcXF/pAhQ/zCwkJ/wYIFvb/29ttv+6WlpX5Z\nWZnf3Nw80A9h3J49e/wJEyb4paWl/vr1602f89iWL1/uFxUV+fn5+X5xcbH/wQcf+L/88os/d+5c\nf/z48X5NTY3/22+/mT5zwL755hs/kUj4lZWVfjKZ9JPJpL93714nMp44ccKvqqryKysr/YqKCv/d\nd9/1fd93IttfeZ7nL1q0yPd9d/L9/PPPfmVlpV9ZWelPnjy5t08yzccPzgCAJfi/CAMAS1DYAGAJ\nChsALEFhA4AlKGwAsASFDQCWoLABwBIUNgBY4v8BKJfJfpH1rU8AAAAASUVORK5CYII=\n", | |
"text": [ | |
"<matplotlib.figure.Figure at 0x109599e50>" | |
] | |
} | |
], | |
"prompt_number": 177 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Reading and writing\n", | |
"\n", | |
"`csv`, `excel`, `hdf`, `sql`, `json`, `html`, `stata`, `clipboard`, `pickle`,\n", | |
"and experimental (as of Pandas 0.13.1): `msgpack`, `gbq`.\n", | |
"\n", | |
"And of course typical Python objects, like `list`, `dict` or `numpy.array`.\n", | |
"\n", | |
"http://pandas.pydata.org/pandas-docs/stable/io.html" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df = pd.read_csv(\"file.csv\",\n", | |
" encoding='utf8')" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |
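{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Most `read_*` functions have a matching `to_*` method on the data frame. A minimal sketch of a CSV round trip, assuming a recent Pandas on Python 3; the frame `people` and its columns are made up for illustration, and an in-memory buffer stands in for a real file." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"import io\n", | |
"import pandas as pd\n", | |
"\n", | |
"# Hypothetical example data, not from the talk\n", | |
"people = pd.DataFrame({'name': ['Ann', 'Bob'], 'age': [31, 27]})\n", | |
"\n", | |
"# With no path given, to_csv returns the CSV text as a string\n", | |
"csv_text = people.to_csv(index=False)\n", | |
"\n", | |
"# read_csv accepts any file-like object, here an in-memory buffer\n", | |
"people_again = pd.read_csv(io.StringIO(csv_text))\n", | |
"people_again.equals(people)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |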
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Comments\n", | |
"\n", | |
"When importing data, take care to distinguish `str`/`unicode` from `int`.\n", | |
"\n", | |
"E.g. the `JSON` format allows only strings as keys, so it is easy to get a nasty bug.\n", | |
"\n", | |
"Big data warning: if it fits in your memory, you are better off doing it locally than doing" | |
] | |
}, | |
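{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"A minimal sketch of the JSON key issue, assuming Python 3 and the standard `json` module; the dictionary `counts` is a made-up example. One possible fix, shown at the end, is to cast the index back to integers." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"import json\n", | |
"import pandas as pd\n", | |
"\n", | |
"counts = {1: 10, 2: 20}  # integer keys (hypothetical data)\n", | |
"\n", | |
"# JSON object keys are always strings, so a round trip changes the key type\n", | |
"restored = json.loads(json.dumps(counts))\n", | |
"print(1 in restored, '1' in restored)  # False True\n", | |
"\n", | |
"# Cast the index back to integers before looking things up\n", | |
"s = pd.Series(restored)\n", | |
"s.index = s.index.astype(int)\n", | |
"print(s.loc[1])" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
}, | |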
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## by Piotr Migda\u0142\n", | |
"\n", | |
"pmigdal@gmail.com, http://migdal.wikidot.com\n", | |
"\n", | |
"I freelance in data analysis and interactive visualization." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [] | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |