@stared
Created April 24, 2014 16:28
{
"metadata": {
"name": "",
"signature": "sha256:9172d64435de94c4a27d90016e6b1e8a1a05f65a6f34fc226525e79ac31c7c90"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to Pandas \n",
"\n",
"* Piotr Migda\u0142, http://migdal.wikidot.com\n",
"* The Barcelona Python Meetup Group: [Python & Sciences](http://www.meetup.com/python-185/events/169870182/) (24 April 2014)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# [Pandas](http://pandas.pydata.org/)\n",
"\n",
"* 0.13.1 released (February 3, 2014)\n",
"\n",
" pandas is an open source, BSD-licensed library\n",
" providing high-performance, easy-to-use data structures\n",
" and data analysis tools for the Python programming language\n",
" \n",
"In practice, Pandas brings **R**-like data structures, great for working with **tabular data**. Tabular?\n",
"\n",
"* Everything that goes into **SQL**, **Excel** or **CSV**.\n",
"* Not all data, but data you typically work with.\n",
"\n",
"Both for data exploration and production.\n",
"\n",
"If you can do it with SQL or Excel, you can do it with Pandas!\n",
"\n",
"(Counterexamples?)\n",
"\n",
"So:\n",
"\n",
" $ pip install pandas"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 141
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Series\n",
"\n",
"* Creating\n",
"* Adding\n",
"* Concatenating\n",
"* Filtering\n",
"* Applying"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser = pd.Series(['a','b','c', 'd', 'e'])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 142
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 143,
"text": [
"0 a\n",
"1 b\n",
"2 c\n",
"3 d\n",
"4 e\n",
"dtype: object"
]
}
],
"prompt_number": 143
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser[1]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 144,
"text": [
"'b'"
]
}
],
"prompt_number": 144
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser[9] = 'qqq'"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 145
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 146,
"text": [
"0 a\n",
"1 b\n",
"2 c\n",
"3 d\n",
"4 e\n",
"9 qqq\n",
"dtype: object"
]
}
],
"prompt_number": 146
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser['ten'] = 'abc'"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 147
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 148,
"text": [
"0 a\n",
"1 b\n",
"2 c\n",
"3 d\n",
"4 e\n",
"9 qqq\n",
"ten abc\n",
"dtype: object"
]
}
],
"prompt_number": 148
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# ser.loc[k] accesses the object with index label k (can be an integer, string or timestamp)\n",
"ser.loc['ten']"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 149,
"text": [
"'abc'"
]
}
],
"prompt_number": 149
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# ser.iloc[i] accesses the i-th object by position; i in range(0, len(ser))\n",
"ser.iloc[4]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 150,
"text": [
"'e'"
]
}
],
"prompt_number": 150
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser != 'b'"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 151,
"text": [
"0 True\n",
"1 False\n",
"2 True\n",
"3 True\n",
"4 True\n",
"9 True\n",
"ten True\n",
"dtype: bool"
]
}
],
"prompt_number": 151
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser[ser != 'b']"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 152,
"text": [
"0 a\n",
"2 c\n",
"3 d\n",
"4 e\n",
"9 qqq\n",
"ten abc\n",
"dtype: object"
]
}
],
"prompt_number": 152
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser = ser.append(pd.Series([8,2,1],\n",
" index=[10,100,1000]))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 153
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 154,
"text": [
"0 a\n",
"1 b\n",
"2 c\n",
"3 d\n",
"4 e\n",
"9 qqq\n",
"ten abc\n",
"10 8\n",
"100 2\n",
"1000 1\n",
"dtype: object"
]
}
],
"prompt_number": 154
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser.describe()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 155,
"text": [
"count 10\n",
"unique 10\n",
"top a\n",
"freq 1\n",
"dtype: object"
]
}
],
"prompt_number": 155
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data frame\n",
"\n",
"* Creating\n",
"* Relations to Series; columns and index\n",
"* columns, loc, iloc, ix\n",
"* Iterating\n",
"* Filtering\n",
"* Joins\n",
"* Groupby\n",
"* Apply, axes\n",
"* Renaming things\n",
"* Dealing with NaNs\n",
"* Data types: (numpy) float, int, obj"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 156
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df = pd.DataFrame(np.random.randn(6,4),\n",
" columns=[\"random\", \"guess\", \"chance\", \"luck\"])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 157
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>random</th>\n",
" <th>guess</th>\n",
" <th>chance</th>\n",
" <th>luck</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>-0.692656</td>\n",
" <td> 0.437082</td>\n",
" <td>-0.651568</td>\n",
" <td>-0.380875</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>-0.070536</td>\n",
" <td>-1.224517</td>\n",
" <td>-0.250073</td>\n",
" <td> 0.848358</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>-0.711452</td>\n",
" <td> 0.510971</td>\n",
" <td>-0.224961</td>\n",
" <td> 1.875583</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>-0.513751</td>\n",
" <td>-2.484004</td>\n",
" <td>-1.501159</td>\n",
" <td>-2.113281</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 0.001650</td>\n",
" <td> 1.963386</td>\n",
" <td>-0.028636</td>\n",
" <td>-1.440029</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> 0.296459</td>\n",
" <td>-0.391263</td>\n",
" <td> 0.969736</td>\n",
" <td> 1.595289</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6 rows \u00d7 4 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 158,
"text": [
" random guess chance luck\n",
"0 -0.692656 0.437082 -0.651568 -0.380875\n",
"1 -0.070536 -1.224517 -0.250073 0.848358\n",
"2 -0.711452 0.510971 -0.224961 1.875583\n",
"3 -0.513751 -2.484004 -1.501159 -2.113281\n",
"4 0.001650 1.963386 -0.028636 -1.440029\n",
"5 0.296459 -0.391263 0.969736 1.595289\n",
"\n",
"[6 rows x 4 columns]"
]
}
],
"prompt_number": 158
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[\"luck\"]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 159,
"text": [
"0 -0.380875\n",
"1 0.848358\n",
"2 1.875583\n",
"3 -2.113281\n",
"4 -1.440029\n",
"5 1.595289\n",
"Name: luck, dtype: float64"
]
}
],
"prompt_number": 159
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[\"luck\"] + 1"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 160,
"text": [
"0 0.619125\n",
"1 1.848358\n",
"2 2.875583\n",
"3 -1.113281\n",
"4 -0.440029\n",
"5 2.595289\n",
"Name: luck, dtype: float64"
]
}
],
"prompt_number": 160
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[\"fate\"] = 2 * df[\"luck\"] - df[\"random\"]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 161
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>random</th>\n",
" <th>guess</th>\n",
" <th>chance</th>\n",
" <th>luck</th>\n",
" <th>fate</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>-0.692656</td>\n",
" <td> 0.437082</td>\n",
" <td>-0.651568</td>\n",
" <td>-0.380875</td>\n",
" <td>-0.069093</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>-0.070536</td>\n",
" <td>-1.224517</td>\n",
" <td>-0.250073</td>\n",
" <td> 0.848358</td>\n",
" <td> 1.767253</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>-0.711452</td>\n",
" <td> 0.510971</td>\n",
" <td>-0.224961</td>\n",
" <td> 1.875583</td>\n",
" <td> 4.462618</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>-0.513751</td>\n",
" <td>-2.484004</td>\n",
" <td>-1.501159</td>\n",
" <td>-2.113281</td>\n",
" <td>-3.712812</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 0.001650</td>\n",
" <td> 1.963386</td>\n",
" <td>-0.028636</td>\n",
" <td>-1.440029</td>\n",
" <td>-2.881709</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> 0.296459</td>\n",
" <td>-0.391263</td>\n",
" <td> 0.969736</td>\n",
" <td> 1.595289</td>\n",
" <td> 2.894120</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6 rows \u00d7 5 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 162,
"text": [
" random guess chance luck fate\n",
"0 -0.692656 0.437082 -0.651568 -0.380875 -0.069093\n",
"1 -0.070536 -1.224517 -0.250073 0.848358 1.767253\n",
"2 -0.711452 0.510971 -0.224961 1.875583 4.462618\n",
"3 -0.513751 -2.484004 -1.501159 -2.113281 -3.712812\n",
"4 0.001650 1.963386 -0.028636 -1.440029 -2.881709\n",
"5 0.296459 -0.391263 0.969736 1.595289 2.894120\n",
"\n",
"[6 rows x 5 columns]"
]
}
],
"prompt_number": 162
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[\"random\"] = df[\"random\"].apply(lambda x: 'a' if x > 0 else 'b')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 163
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>random</th>\n",
" <th>guess</th>\n",
" <th>chance</th>\n",
" <th>luck</th>\n",
" <th>fate</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> b</td>\n",
" <td> 0.437082</td>\n",
" <td>-0.651568</td>\n",
" <td>-0.380875</td>\n",
" <td>-0.069093</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> b</td>\n",
" <td>-1.224517</td>\n",
" <td>-0.250073</td>\n",
" <td> 0.848358</td>\n",
" <td> 1.767253</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> b</td>\n",
" <td> 0.510971</td>\n",
" <td>-0.224961</td>\n",
" <td> 1.875583</td>\n",
" <td> 4.462618</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> b</td>\n",
" <td>-2.484004</td>\n",
" <td>-1.501159</td>\n",
" <td>-2.113281</td>\n",
" <td>-3.712812</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> a</td>\n",
" <td> 1.963386</td>\n",
" <td>-0.028636</td>\n",
" <td>-1.440029</td>\n",
" <td>-2.881709</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> a</td>\n",
" <td>-0.391263</td>\n",
" <td> 0.969736</td>\n",
" <td> 1.595289</td>\n",
" <td> 2.894120</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6 rows \u00d7 5 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 164,
"text": [
" random guess chance luck fate\n",
"0 b 0.437082 -0.651568 -0.380875 -0.069093\n",
"1 b -1.224517 -0.250073 0.848358 1.767253\n",
"2 b 0.510971 -0.224961 1.875583 4.462618\n",
"3 b -2.484004 -1.501159 -2.113281 -3.712812\n",
"4 a 1.963386 -0.028636 -1.440029 -2.881709\n",
"5 a -0.391263 0.969736 1.595289 2.894120\n",
"\n",
"[6 rows x 5 columns]"
]
}
],
"prompt_number": 164
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df.ix[1:3, \"guess\":\"luck\"]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>guess</th>\n",
" <th>chance</th>\n",
" <th>luck</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>-1.224517</td>\n",
" <td>-0.250073</td>\n",
" <td> 0.848358</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 0.510971</td>\n",
" <td>-0.224961</td>\n",
" <td> 1.875583</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>-2.484004</td>\n",
" <td>-1.501159</td>\n",
" <td>-2.113281</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>3 rows \u00d7 3 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 165,
"text": [
" guess chance luck\n",
"1 -1.224517 -0.250073 0.848358\n",
"2 0.510971 -0.224961 1.875583\n",
"3 -2.484004 -1.501159 -2.113281\n",
"\n",
"[3 rows x 3 columns]"
]
}
],
"prompt_number": 165
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a_column = pd.DataFrame(np.random.randn(3,1),\n",
" index=[1,4,9],\n",
" columns=[\"new\"])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 166
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a_column"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>new</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 0.235703</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>-1.211161</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> 0.334221</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>3 rows \u00d7 1 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 167,
"text": [
" new\n",
"1 0.235703\n",
"4 -1.211161\n",
"9 0.334221\n",
"\n",
"[3 rows x 1 columns]"
]
}
],
"prompt_number": 167
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df = pd.concat([df, a_column], axis=1)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 168
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>random</th>\n",
" <th>guess</th>\n",
" <th>chance</th>\n",
" <th>luck</th>\n",
" <th>fate</th>\n",
" <th>new</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> b</td>\n",
" <td> 0.437082</td>\n",
" <td>-0.651568</td>\n",
" <td>-0.380875</td>\n",
" <td>-0.069093</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> b</td>\n",
" <td>-1.224517</td>\n",
" <td>-0.250073</td>\n",
" <td> 0.848358</td>\n",
" <td> 1.767253</td>\n",
" <td> 0.235703</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> b</td>\n",
" <td> 0.510971</td>\n",
" <td>-0.224961</td>\n",
" <td> 1.875583</td>\n",
" <td> 4.462618</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> b</td>\n",
" <td>-2.484004</td>\n",
" <td>-1.501159</td>\n",
" <td>-2.113281</td>\n",
" <td>-3.712812</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> a</td>\n",
" <td> 1.963386</td>\n",
" <td>-0.028636</td>\n",
" <td>-1.440029</td>\n",
" <td>-2.881709</td>\n",
" <td>-1.211161</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> a</td>\n",
" <td>-0.391263</td>\n",
" <td> 0.969736</td>\n",
" <td> 1.595289</td>\n",
" <td> 2.894120</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 0.334221</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>7 rows \u00d7 6 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 169,
"text": [
" random guess chance luck fate new\n",
"0 b 0.437082 -0.651568 -0.380875 -0.069093 NaN\n",
"1 b -1.224517 -0.250073 0.848358 1.767253 0.235703\n",
"2 b 0.510971 -0.224961 1.875583 4.462618 NaN\n",
"3 b -2.484004 -1.501159 -2.113281 -3.712812 NaN\n",
"4 a 1.963386 -0.028636 -1.440029 -2.881709 -1.211161\n",
"5 a -0.391263 0.969736 1.595289 2.894120 NaN\n",
"9 NaN NaN NaN NaN NaN 0.334221\n",
"\n",
"[7 rows x 6 columns]"
]
}
],
"prompt_number": 169
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[df[\"chance\"] < 0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>random</th>\n",
" <th>guess</th>\n",
" <th>chance</th>\n",
" <th>luck</th>\n",
" <th>fate</th>\n",
" <th>new</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> b</td>\n",
" <td> 0.437082</td>\n",
" <td>-0.651568</td>\n",
" <td>-0.380875</td>\n",
" <td>-0.069093</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> b</td>\n",
" <td>-1.224517</td>\n",
" <td>-0.250073</td>\n",
" <td> 0.848358</td>\n",
" <td> 1.767253</td>\n",
" <td> 0.235703</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> b</td>\n",
" <td> 0.510971</td>\n",
" <td>-0.224961</td>\n",
" <td> 1.875583</td>\n",
" <td> 4.462618</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> b</td>\n",
" <td>-2.484004</td>\n",
" <td>-1.501159</td>\n",
" <td>-2.113281</td>\n",
" <td>-3.712812</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> a</td>\n",
" <td> 1.963386</td>\n",
" <td>-0.028636</td>\n",
" <td>-1.440029</td>\n",
" <td>-2.881709</td>\n",
" <td>-1.211161</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 6 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 170,
"text": [
" random guess chance luck fate new\n",
"0 b 0.437082 -0.651568 -0.380875 -0.069093 NaN\n",
"1 b -1.224517 -0.250073 0.848358 1.767253 0.235703\n",
"2 b 0.510971 -0.224961 1.875583 4.462618 NaN\n",
"3 b -2.484004 -1.501159 -2.113281 -3.712812 NaN\n",
"4 a 1.963386 -0.028636 -1.440029 -2.881709 -1.211161\n",
"\n",
"[5 rows x 6 columns]"
]
}
],
"prompt_number": 170
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df.loc[df[\"chance\"] < 0, \"guess\"] = 42"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 171
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>random</th>\n",
" <th>guess</th>\n",
" <th>chance</th>\n",
" <th>luck</th>\n",
" <th>fate</th>\n",
" <th>new</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> b</td>\n",
" <td> 42.000000</td>\n",
" <td>-0.651568</td>\n",
" <td>-0.380875</td>\n",
" <td>-0.069093</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> b</td>\n",
" <td> 42.000000</td>\n",
" <td>-0.250073</td>\n",
" <td> 0.848358</td>\n",
" <td> 1.767253</td>\n",
" <td> 0.235703</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> b</td>\n",
" <td> 42.000000</td>\n",
" <td>-0.224961</td>\n",
" <td> 1.875583</td>\n",
" <td> 4.462618</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> b</td>\n",
" <td> 42.000000</td>\n",
" <td>-1.501159</td>\n",
" <td>-2.113281</td>\n",
" <td>-3.712812</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> a</td>\n",
" <td> 42.000000</td>\n",
" <td>-0.028636</td>\n",
" <td>-1.440029</td>\n",
" <td>-2.881709</td>\n",
" <td>-1.211161</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> a</td>\n",
" <td> -0.391263</td>\n",
" <td> 0.969736</td>\n",
" <td> 1.595289</td>\n",
" <td> 2.894120</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 0.334221</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>7 rows \u00d7 6 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 172,
"text": [
" random guess chance luck fate new\n",
"0 b 42.000000 -0.651568 -0.380875 -0.069093 NaN\n",
"1 b 42.000000 -0.250073 0.848358 1.767253 0.235703\n",
"2 b 42.000000 -0.224961 1.875583 4.462618 NaN\n",
"3 b 42.000000 -1.501159 -2.113281 -3.712812 NaN\n",
"4 a 42.000000 -0.028636 -1.440029 -2.881709 -1.211161\n",
"5 a -0.391263 0.969736 1.595289 2.894120 NaN\n",
"9 NaN NaN NaN NaN NaN 0.334221\n",
"\n",
"[7 rows x 6 columns]"
]
}
],
"prompt_number": 172
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df.fillna(df.mean())"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>random</th>\n",
" <th>guess</th>\n",
" <th>chance</th>\n",
" <th>luck</th>\n",
" <th>fate</th>\n",
" <th>new</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> b</td>\n",
" <td> 42.000000</td>\n",
" <td>-0.651568</td>\n",
" <td>-0.380875</td>\n",
" <td>-0.069093</td>\n",
" <td>-0.213746</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> b</td>\n",
" <td> 42.000000</td>\n",
" <td>-0.250073</td>\n",
" <td> 0.848358</td>\n",
" <td> 1.767253</td>\n",
" <td> 0.235703</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> b</td>\n",
" <td> 42.000000</td>\n",
" <td>-0.224961</td>\n",
" <td> 1.875583</td>\n",
" <td> 4.462618</td>\n",
" <td>-0.213746</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> b</td>\n",
" <td> 42.000000</td>\n",
" <td>-1.501159</td>\n",
" <td>-2.113281</td>\n",
" <td>-3.712812</td>\n",
" <td>-0.213746</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> a</td>\n",
" <td> 42.000000</td>\n",
" <td>-0.028636</td>\n",
" <td>-1.440029</td>\n",
" <td>-2.881709</td>\n",
" <td>-1.211161</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> a</td>\n",
" <td> -0.391263</td>\n",
" <td> 0.969736</td>\n",
" <td> 1.595289</td>\n",
" <td> 2.894120</td>\n",
" <td>-0.213746</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> NaN</td>\n",
" <td> 34.934790</td>\n",
" <td>-0.281110</td>\n",
" <td> 0.064174</td>\n",
" <td> 0.410063</td>\n",
" <td> 0.334221</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>7 rows \u00d7 6 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 173,
"text": [
" random guess chance luck fate new\n",
"0 b 42.000000 -0.651568 -0.380875 -0.069093 -0.213746\n",
"1 b 42.000000 -0.250073 0.848358 1.767253 0.235703\n",
"2 b 42.000000 -0.224961 1.875583 4.462618 -0.213746\n",
"3 b 42.000000 -1.501159 -2.113281 -3.712812 -0.213746\n",
"4 a 42.000000 -0.028636 -1.440029 -2.881709 -1.211161\n",
"5 a -0.391263 0.969736 1.595289 2.894120 -0.213746\n",
"9 NaN 34.934790 -0.281110 0.064174 0.410063 0.334221\n",
"\n",
"[7 rows x 6 columns]"
]
}
],
"prompt_number": 173
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data summary\n",
"\n",
"* head, describe, hist, value_counts"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[\"random\"].value_counts()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 175,
"text": [
"b 4\n",
"a 2\n",
"dtype: int64"
]
}
],
"prompt_number": 175
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[\"guess\"].describe()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 176,
"text": [
"count 6.000000\n",
"mean 34.934790\n",
"std 17.306161\n",
"min -0.391263\n",
"25% 42.000000\n",
"50% 42.000000\n",
"75% 42.000000\n",
"max 42.000000\n",
"Name: guess, dtype: float64"
]
}
],
"prompt_number": 176
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[\"guess\"].hist()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 177,
"text": [
"<matplotlib.axes.AxesSubplot at 0x1095997d0>"
]
},
{
"metadata": {},
"output_type": "display_data",
"text": [
"<matplotlib.figure.Figure at 0x109599e50>"
]
}
],
"prompt_number": 177
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading and writing\n",
"\n",
"`csv`, `excel`, `hdf`, `sql`, `json`, `html`, `stata`, `clipboard`, `pickle`,\n",
"and experimental (as of Pandas 0.13.1): `msgpack`, `gbq`.\n",
"\n",
"And of course typical Python objects, like `list`, `dict` or `numpy.array`.\n",
"\n",
"http://pandas.pydata.org/pandas-docs/stable/io.html"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df = pd.read_csv(\"file.csv\",\n",
" encoding='utf8')"
],
"language": "python",
"metadata": {},
"outputs": []
},
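{
"cell_type": "markdown",
"metadata": {},
"source": [
"`file.csv` above is a placeholder. A self-contained round trip can be sketched with an in-memory buffer instead of a file.\n",
"\n",
"This assumes Python 3 (`io.StringIO` holds text); the frame `demo` is made up."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import io\n",
"import pandas as pd\n",
"\n",
"demo = pd.DataFrame({'id': [1, 2], 'name': ['x', 'y']})\n",
"\n",
"# write without the index so it does not come back as an extra column\n",
"buffer = io.StringIO()\n",
"demo.to_csv(buffer, index=False)\n",
"\n",
"# rewind and read the same text back\n",
"buffer.seek(0)\n",
"restored = pd.read_csv(buffer)\n",
"restored"
],
"language": "python",
"metadata": {},
"outputs": []
},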
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comments\n",
"\n",
"When importing data, be careful to distinguish `str`/`unicode` from `int`.\n",
"\n",
"E.g. the `JSON` format allows only strings as keys, so it is easy to introduce a nasty bug.\n",
"\n",
"Big data warning: if it fits your memory, you are better off doing it locally than doing"
]
},
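{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of the JSON key pitfall, using only the standard `json` module and a made-up dictionary `counts`:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import json\n",
"import pandas as pd\n",
"\n",
"counts = {1: 10, 2: 20}\n",
"\n",
"# JSON object keys are always strings, so the integer keys do not survive\n",
"restored = json.loads(json.dumps(counts))\n",
"print(list(restored.keys()))\n",
"\n",
"# convert the index labels back to integers before looking up by number\n",
"ser = pd.Series(restored)\n",
"ser.index = ser.index.astype(int)\n",
"print(ser.loc[1])"
],
"language": "python",
"metadata": {},
"outputs": []
},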
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## by Piotr Migda\u0142\n",
"\n",
"pmigdal@gmail.com, http://migdal.wikidot.com\n",
"\n",
"I freelance in data analysis and interactive visualization."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}