Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save stared/6dc90471f0154c697c8b to your computer and use it in GitHub Desktop.
Save stared/6dc90471f0154c697c8b to your computer and use it in GitHub Desktop.
Introduction to Pandas (from: The Barcelona Python Meetup Group: Python & Sciences, 24 April 2014)
Display the source blob
Display the rendered blob
Raw
{
"metadata": {
"name": "",
"signature": "sha256:59f5a4861925d4bd4b833a2681e4dbf2f891f12ff0631cea44a53f90e6b12999"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to Pandas \n",
"\n",
"* Piotr Migda\u0142, http://migdal.wikidot.com\n",
"* The Barcelona Python Meetup Group: [Python & Sciences](http://www.meetup.com/python-185/events/169870182/) (24 April 2014)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# [Pandas](http://pandas.pydata.org/)\n",
"\n",
"* 0.13.1 released (February 3, 2014)\n",
"\n",
" pandas is an open source, BSD-licensed library\n",
" providing high-performance, easy-to-use data structures\n",
" and data analysis tools for the Python programming language\n",
" \n",
"In practice, Pandas brings **R**-like data structures, great for working with **tabular data**. Tabular?\n",
"\n",
"* Everything that goes into **SQL**, **Excel** or **CSV**.\n",
"* Not all data, but data you typically work with.\n",
"\n",
"Both for data exploration and production.\n",
"\n",
"If you can do it with SQL or Excel, you can do it with Pandas!\n",
"\n",
"(Counterexamples?)\n",
"\n",
"So:\n",
"\n",
" $ pip install pandas"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Series\n",
"\n",
"* Creating\n",
"* Adding\n",
"* Concatenating\n",
"* Filtering\n",
"* Applying map"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser = pd.Series(['a','b','c', 'd', 'e'])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": [
"0 a\n",
"1 b\n",
"2 c\n",
"3 d\n",
"4 e\n",
"dtype: object"
]
}
],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser[1] = 12"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser[9] = 'qqq'"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"0 a\n",
"1 12\n",
"2 c\n",
"3 d\n",
"4 e\n",
"9 qqq\n",
"dtype: object"
]
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser['ten'] = 'abc'"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": [
"0 a\n",
"1 12\n",
"2 c\n",
"3 d\n",
"4 e\n",
"9 qqq\n",
"ten abc\n",
"dtype: object"
]
}
],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# ser.loc[k] accesses object with index k (can be integer, string or timestamp)\n",
"ser.loc['ten']"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 9,
"text": [
"'abc'"
]
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# ser.iloc[i] - accessed i-th object; i in range(0, len(ser))\n",
"ser.iloc[4]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
"'e'"
]
}
],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser != 'c'"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 11,
"text": [
"0 True\n",
"1 True\n",
"2 False\n",
"3 True\n",
"4 True\n",
"9 True\n",
"ten True\n",
"dtype: bool"
]
}
],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser[ser != 'c']"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"0 a\n",
"1 12\n",
"3 d\n",
"4 e\n",
"9 qqq\n",
"ten abc\n",
"dtype: object"
]
}
],
"prompt_number": 12
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser = ser.append(pd.Series([8,2,1],\n",
" index=[10,100,1000]))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 13
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
"0 a\n",
"1 12\n",
"2 c\n",
"3 d\n",
"4 e\n",
"9 qqq\n",
"ten abc\n",
"10 8\n",
"100 2\n",
"1000 1\n",
"dtype: object"
]
}
],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ser.describe()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 15,
"text": [
"count 10\n",
"unique 10\n",
"top a\n",
"freq 1\n",
"dtype: object"
]
}
],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data frame\n",
"\n",
"* Creating\n",
"* Relations to Series; columns and index\n",
"* columns, loc, iloc, ix\n",
"* Iterating\n",
"* Filtering\n",
"* Joins\n",
"* Groupby\n",
"* Apply, axes\n",
"* Renaming things\n",
"* Dealings with NaNs\n",
"* Data types: (numpy) float, int, obj"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df = pd.DataFrame(np.random.randn(6,4),\n",
" columns=[\"random\", \"guess\", \"chance\", \"luck\"])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 17
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>random</th>\n",
" <th>guess</th>\n",
" <th>chance</th>\n",
" <th>luck</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>-0.820053</td>\n",
" <td> 0.952948</td>\n",
" <td> 2.062030</td>\n",
" <td> 1.399049</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 0.784917</td>\n",
" <td> 1.015215</td>\n",
" <td>-0.285176</td>\n",
" <td>-0.511790</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 0.079101</td>\n",
" <td>-1.468544</td>\n",
" <td>-0.047072</td>\n",
" <td>-0.450447</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 1.032954</td>\n",
" <td> 0.461783</td>\n",
" <td> 1.509796</td>\n",
" <td>-0.323996</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>-0.590753</td>\n",
" <td>-0.287149</td>\n",
" <td> 0.596965</td>\n",
" <td> 1.252726</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> 0.884594</td>\n",
" <td>-0.125060</td>\n",
" <td>-1.458435</td>\n",
" <td> 1.159053</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6 rows \u00d7 4 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 18,
"text": [
" random guess chance luck\n",
"0 -0.820053 0.952948 2.062030 1.399049\n",
"1 0.784917 1.015215 -0.285176 -0.511790\n",
"2 0.079101 -1.468544 -0.047072 -0.450447\n",
"3 1.032954 0.461783 1.509796 -0.323996\n",
"4 -0.590753 -0.287149 0.596965 1.252726\n",
"5 0.884594 -0.125060 -1.458435 1.159053\n",
"\n",
"[6 rows x 4 columns]"
]
}
],
"prompt_number": 18
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[\"luck\"]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 19,
"text": [
"0 1.399049\n",
"1 -0.511790\n",
"2 -0.450447\n",
"3 -0.323996\n",
"4 1.252726\n",
"5 1.159053\n",
"Name: luck, dtype: float64"
]
}
],
"prompt_number": 19
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[\"luck\"] + 1"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 20,
"text": [
"0 2.399049\n",
"1 0.488210\n",
"2 0.549553\n",
"3 0.676004\n",
"4 2.252726\n",
"5 2.159053\n",
"Name: luck, dtype: float64"
]
}
],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[\"fate\"] = 2 * df[\"luck\"] - df[\"random\"]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 21
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>random</th>\n",
" <th>guess</th>\n",
" <th>chance</th>\n",
" <th>luck</th>\n",
" <th>fate</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>-0.820053</td>\n",
" <td> 0.952948</td>\n",
" <td> 2.062030</td>\n",
" <td> 1.399049</td>\n",
" <td> 3.618150</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 0.784917</td>\n",
" <td> 1.015215</td>\n",
" <td>-0.285176</td>\n",
" <td>-0.511790</td>\n",
" <td>-1.808498</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 0.079101</td>\n",
" <td>-1.468544</td>\n",
" <td>-0.047072</td>\n",
" <td>-0.450447</td>\n",
" <td>-0.979995</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 1.032954</td>\n",
" <td> 0.461783</td>\n",
" <td> 1.509796</td>\n",
" <td>-0.323996</td>\n",
" <td>-1.680946</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>-0.590753</td>\n",
" <td>-0.287149</td>\n",
" <td> 0.596965</td>\n",
" <td> 1.252726</td>\n",
" <td> 3.096204</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> 0.884594</td>\n",
" <td>-0.125060</td>\n",
" <td>-1.458435</td>\n",
" <td> 1.159053</td>\n",
" <td> 1.433513</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6 rows \u00d7 5 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 22,
"text": [
" random guess chance luck fate\n",
"0 -0.820053 0.952948 2.062030 1.399049 3.618150\n",
"1 0.784917 1.015215 -0.285176 -0.511790 -1.808498\n",
"2 0.079101 -1.468544 -0.047072 -0.450447 -0.979995\n",
"3 1.032954 0.461783 1.509796 -0.323996 -1.680946\n",
"4 -0.590753 -0.287149 0.596965 1.252726 3.096204\n",
"5 0.884594 -0.125060 -1.458435 1.159053 1.433513\n",
"\n",
"[6 rows x 5 columns]"
]
}
],
"prompt_number": 22
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[\"random\"] = df[\"random\"].apply(lambda x: 'a' if x > 0 else 'b')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 23
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>random</th>\n",
" <th>guess</th>\n",
" <th>chance</th>\n",
" <th>luck</th>\n",
" <th>fate</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> b</td>\n",
" <td> 0.952948</td>\n",
" <td> 2.062030</td>\n",
" <td> 1.399049</td>\n",
" <td> 3.618150</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> a</td>\n",
" <td> 1.015215</td>\n",
" <td>-0.285176</td>\n",
" <td>-0.511790</td>\n",
" <td>-1.808498</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> a</td>\n",
" <td>-1.468544</td>\n",
" <td>-0.047072</td>\n",
" <td>-0.450447</td>\n",
" <td>-0.979995</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> a</td>\n",
" <td> 0.461783</td>\n",
" <td> 1.509796</td>\n",
" <td>-0.323996</td>\n",
" <td>-1.680946</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> b</td>\n",
" <td>-0.287149</td>\n",
" <td> 0.596965</td>\n",
" <td> 1.252726</td>\n",
" <td> 3.096204</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> a</td>\n",
" <td>-0.125060</td>\n",
" <td>-1.458435</td>\n",
" <td> 1.159053</td>\n",
" <td> 1.433513</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6 rows \u00d7 5 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 24,
"text": [
" random guess chance luck fate\n",
"0 b 0.952948 2.062030 1.399049 3.618150\n",
"1 a 1.015215 -0.285176 -0.511790 -1.808498\n",
"2 a -1.468544 -0.047072 -0.450447 -0.979995\n",
"3 a 0.461783 1.509796 -0.323996 -1.680946\n",
"4 b -0.287149 0.596965 1.252726 3.096204\n",
"5 a -0.125060 -1.458435 1.159053 1.433513\n",
"\n",
"[6 rows x 5 columns]"
]
}
],
"prompt_number": 24
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df.ix[1:3, \"guess\":\"luck\"]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>guess</th>\n",
" <th>chance</th>\n",
" <th>luck</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 1.015215</td>\n",
" <td>-0.285176</td>\n",
" <td>-0.511790</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>-1.468544</td>\n",
" <td>-0.047072</td>\n",
" <td>-0.450447</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 0.461783</td>\n",
" <td> 1.509796</td>\n",
" <td>-0.323996</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>3 rows \u00d7 3 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 25,
"text": [
" guess chance luck\n",
"1 1.015215 -0.285176 -0.511790\n",
"2 -1.468544 -0.047072 -0.450447\n",
"3 0.461783 1.509796 -0.323996\n",
"\n",
"[3 rows x 3 columns]"
]
}
],
"prompt_number": 25
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a_column = pd.DataFrame(np.random.randn(3,1),\n",
" index=[1,4,9],\n",
" columns=[\"new\"])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 26
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a_column"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>new</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>-0.438710</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>-0.525838</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>-1.167854</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>3 rows \u00d7 1 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 27,
"text": [
" new\n",
"1 -0.438710\n",
"4 -0.525838\n",
"9 -1.167854\n",
"\n",
"[3 rows x 1 columns]"
]
}
],
"prompt_number": 27
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df = pd.concat([df, a_column], axis=1)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 28
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>random</th>\n",
" <th>guess</th>\n",
" <th>chance</th>\n",
" <th>luck</th>\n",
" <th>fate</th>\n",
" <th>new</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> b</td>\n",
" <td> 0.952948</td>\n",
" <td> 2.062030</td>\n",
" <td> 1.399049</td>\n",
" <td> 3.618150</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> a</td>\n",
" <td> 1.015215</td>\n",
" <td>-0.285176</td>\n",
" <td>-0.511790</td>\n",
" <td>-1.808498</td>\n",
" <td>-0.438710</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> a</td>\n",
" <td>-1.468544</td>\n",
" <td>-0.047072</td>\n",
" <td>-0.450447</td>\n",
" <td>-0.979995</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> a</td>\n",
" <td> 0.461783</td>\n",
" <td> 1.509796</td>\n",
" <td>-0.323996</td>\n",
" <td>-1.680946</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> b</td>\n",
" <td>-0.287149</td>\n",
" <td> 0.596965</td>\n",
" <td> 1.252726</td>\n",
" <td> 3.096204</td>\n",
" <td>-0.525838</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> a</td>\n",
" <td>-0.125060</td>\n",
" <td>-1.458435</td>\n",
" <td> 1.159053</td>\n",
" <td> 1.433513</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td>-1.167854</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>7 rows \u00d7 6 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 29,
"text": [
" random guess chance luck fate new\n",
"0 b 0.952948 2.062030 1.399049 3.618150 NaN\n",
"1 a 1.015215 -0.285176 -0.511790 -1.808498 -0.438710\n",
"2 a -1.468544 -0.047072 -0.450447 -0.979995 NaN\n",
"3 a 0.461783 1.509796 -0.323996 -1.680946 NaN\n",
"4 b -0.287149 0.596965 1.252726 3.096204 -0.525838\n",
"5 a -0.125060 -1.458435 1.159053 1.433513 NaN\n",
"9 NaN NaN NaN NaN NaN -1.167854\n",
"\n",
"[7 rows x 6 columns]"
]
}
],
"prompt_number": 29
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[df[\"chance\"] < 0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>random</th>\n",
" <th>guess</th>\n",
" <th>chance</th>\n",
" <th>luck</th>\n",
" <th>fate</th>\n",
" <th>new</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> a</td>\n",
" <td> 1.015215</td>\n",
" <td>-0.285176</td>\n",
" <td>-0.511790</td>\n",
" <td>-1.808498</td>\n",
" <td>-0.43871</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> a</td>\n",
" <td>-1.468544</td>\n",
" <td>-0.047072</td>\n",
" <td>-0.450447</td>\n",
" <td>-0.979995</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> a</td>\n",
" <td>-0.125060</td>\n",
" <td>-1.458435</td>\n",
" <td> 1.159053</td>\n",
" <td> 1.433513</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>3 rows \u00d7 6 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 30,
"text": [
" random guess chance luck fate new\n",
"1 a 1.015215 -0.285176 -0.511790 -1.808498 -0.43871\n",
"2 a -1.468544 -0.047072 -0.450447 -0.979995 NaN\n",
"5 a -0.125060 -1.458435 1.159053 1.433513 NaN\n",
"\n",
"[3 rows x 6 columns]"
]
}
],
"prompt_number": 30
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[\"guess\"][df[\"chance\"] < 0] = 42"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 31
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>random</th>\n",
" <th>guess</th>\n",
" <th>chance</th>\n",
" <th>luck</th>\n",
" <th>fate</th>\n",
" <th>new</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> b</td>\n",
" <td> 0.952948</td>\n",
" <td> 2.062030</td>\n",
" <td> 1.399049</td>\n",
" <td> 3.618150</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> a</td>\n",
" <td> 42.000000</td>\n",
" <td>-0.285176</td>\n",
" <td>-0.511790</td>\n",
" <td>-1.808498</td>\n",
" <td>-0.438710</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> a</td>\n",
" <td> 42.000000</td>\n",
" <td>-0.047072</td>\n",
" <td>-0.450447</td>\n",
" <td>-0.979995</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> a</td>\n",
" <td> 0.461783</td>\n",
" <td> 1.509796</td>\n",
" <td>-0.323996</td>\n",
" <td>-1.680946</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> b</td>\n",
" <td> -0.287149</td>\n",
" <td> 0.596965</td>\n",
" <td> 1.252726</td>\n",
" <td> 3.096204</td>\n",
" <td>-0.525838</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> a</td>\n",
" <td> 42.000000</td>\n",
" <td>-1.458435</td>\n",
" <td> 1.159053</td>\n",
" <td> 1.433513</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td>-1.167854</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>7 rows \u00d7 6 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 32,
"text": [
" random guess chance luck fate new\n",
"0 b 0.952948 2.062030 1.399049 3.618150 NaN\n",
"1 a 42.000000 -0.285176 -0.511790 -1.808498 -0.438710\n",
"2 a 42.000000 -0.047072 -0.450447 -0.979995 NaN\n",
"3 a 0.461783 1.509796 -0.323996 -1.680946 NaN\n",
"4 b -0.287149 0.596965 1.252726 3.096204 -0.525838\n",
"5 a 42.000000 -1.458435 1.159053 1.433513 NaN\n",
"9 NaN NaN NaN NaN NaN -1.167854\n",
"\n",
"[7 rows x 6 columns]"
]
}
],
"prompt_number": 32
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df.fillna(df.mean())"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>random</th>\n",
" <th>guess</th>\n",
" <th>chance</th>\n",
" <th>luck</th>\n",
" <th>fate</th>\n",
" <th>new</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> b</td>\n",
" <td> 0.952948</td>\n",
" <td> 2.062030</td>\n",
" <td> 1.399049</td>\n",
" <td> 3.618150</td>\n",
" <td>-0.710800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> a</td>\n",
" <td> 42.000000</td>\n",
" <td>-0.285176</td>\n",
" <td>-0.511790</td>\n",
" <td>-1.808498</td>\n",
" <td>-0.438710</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> a</td>\n",
" <td> 42.000000</td>\n",
" <td>-0.047072</td>\n",
" <td>-0.450447</td>\n",
" <td>-0.979995</td>\n",
" <td>-0.710800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> a</td>\n",
" <td> 0.461783</td>\n",
" <td> 1.509796</td>\n",
" <td>-0.323996</td>\n",
" <td>-1.680946</td>\n",
" <td>-0.710800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> b</td>\n",
" <td> -0.287149</td>\n",
" <td> 0.596965</td>\n",
" <td> 1.252726</td>\n",
" <td> 3.096204</td>\n",
" <td>-0.525838</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> a</td>\n",
" <td> 42.000000</td>\n",
" <td>-1.458435</td>\n",
" <td> 1.159053</td>\n",
" <td> 1.433513</td>\n",
" <td>-0.710800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> NaN</td>\n",
" <td> 21.187930</td>\n",
" <td> 0.396351</td>\n",
" <td> 0.420766</td>\n",
" <td> 0.613072</td>\n",
" <td>-1.167854</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>7 rows \u00d7 6 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 33,
"text": [
" random guess chance luck fate new\n",
"0 b 0.952948 2.062030 1.399049 3.618150 -0.710800\n",
"1 a 42.000000 -0.285176 -0.511790 -1.808498 -0.438710\n",
"2 a 42.000000 -0.047072 -0.450447 -0.979995 -0.710800\n",
"3 a 0.461783 1.509796 -0.323996 -1.680946 -0.710800\n",
"4 b -0.287149 0.596965 1.252726 3.096204 -0.525838\n",
"5 a 42.000000 -1.458435 1.159053 1.433513 -0.710800\n",
"9 NaN 21.187930 0.396351 0.420766 0.613072 -1.167854\n",
"\n",
"[7 rows x 6 columns]"
]
}
],
"prompt_number": 33
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for k, v in df.iterrows():\n",
" print v['chance']"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2.06203037589\n",
"-0.28517591262\n",
"-0.0470721934499\n",
"1.50979564075\n",
"0.596964926409\n",
"-1.45843483675\n",
"nan\n"
]
}
],
"prompt_number": 34
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df.values"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 35,
"text": [
"array([['b', 0.9529479586539865, 2.0620303758857146, 1.3990485405047015,\n",
" 3.6181499063932105, nan],\n",
" ['a', 42.0, -0.2851759126198742, -0.5117904093719614,\n",
" -1.8084975116604265, -0.43870955141495604],\n",
" ['a', 42.0, -0.04707219344987416, -0.4504467615033189,\n",
" -0.9799947119018371, nan],\n",
" ['a', 0.4617828830291011, 1.509795640747286, -0.3239958303468689,\n",
" -1.6809457423888072, nan],\n",
" ['b', -0.2871493218525701, 0.5969649264089326, 1.2527257785800212,\n",
" 3.096204399794183, -0.5258376995671956],\n",
" ['a', 42.0, -1.4584348367523565, 1.1590534187255992,\n",
" 1.4335128551444463, nan],\n",
" [nan, nan, nan, nan, nan, -1.1678536659869951]], dtype=object)"
]
}
],
"prompt_number": 35
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data summary\n",
"\n",
"* head, describe, hist, value_counts"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[\"random\"].value_counts()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 36,
"text": [
"a 4\n",
"b 2\n",
"dtype: int64"
]
}
],
"prompt_number": 36
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[\"guess\"].describe()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 37,
"text": [
"count 6.000000\n",
"mean 21.187930\n",
"std 22.801901\n",
"min -0.287149\n",
"25% 0.584574\n",
"50% 21.476474\n",
"75% 42.000000\n",
"max 42.000000\n",
"Name: guess, dtype: float64"
]
}
],
"prompt_number": 37
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[\"guess\"].hist()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 38,
"text": [
"<matplotlib.axes.AxesSubplot at 0x10c70ca10>"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAXUAAAEACAYAAABMEua6AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGqpJREFUeJzt3H9M1Pf9B/DnMcgMaKQm83T8KJtQBcW7s2Yki50f56gr\nnczF/mGTOVH/YNrOuD9N1jRLFO32h9O4NHZJta6J+p3ZUrIgqcl4d1OHZvZMu52ZUMQeKKTOsY62\nVse9v38UTin0jtP3u+8ffT4SQt/cx+P5svL0eHFnSEopQUREXsgzHYCIiNRhqRMReYSlTkTkEZY6\nEZFHWOpERB5hqRMReSRjqd+6dQt1dXWIRqOoqanBjh07Jr1u27ZtqKqqQiQSQTwe1xKUiIiyy890\n47Rp09DR0YHCwkL873//w7Jly3D69GksW7YsfU1bWxu6u7vR1dWFc+fOYcuWLejs7NQenIiIJsq6\nfiksLAQA3L59GyMjI5g1a9a421tbW7FhwwYAQF1dHYaGhjA4OKghKhERZZO11FOpFKLRKMLhMFas\nWIGamppxt/f396OsrCx9Li0tRV9fn/qkRESUVdZSz8vLw8WLF9HX14c///nPEEJMuObT/9JAKBRS\nFpCIiKYu4079XjNnzsSTTz6Jv/3tbwiCIP3xkpISJJPJ9Lmvrw8lJSUTfn1JSQmuXbv2YGmJiL5g\n5s2bh+7u7ilfn/GR+o0bNzA0NAQA+Oijj3Dq1CnEYrFx1zQ2NuLIkSMAgM7OThQXFyMcDk+4r2vX\nrkFK6dXbrl27kJe3A4AE8Pzoe3VvRUVluHr1qvE5n3/+eeMZON8XY75P5Pq1ksvXHozPmOvbO++8\nk6mmJ8j4SP369evYsGEDUqkUUqkU1q9fj5UrV+LgwYMAgObmZjQ0NKCtrQ2VlZUoKirCoUOHcgrg\nj17TAbTp7e01HUErzue6XtMBrJKx1Gtra/Hmm29O+Hhzc/O484EDB9SmIiKi+8JXlCrTZDqANk1N\nTaYjaMX5XNdkOoBVWOrKBKYDaHPvD8Z9xPlcF5gOYBWWujLCdABtJnsaq084n+uE6QBWYakTEXmE\npa5MYDqANr5/+875XBeYDmAVljoRkUdY6soI0wG08X0ny/lcJ0wHsApLnYjIIyx1ZQLTAbTxfSfL\n+VwXmA5gFZY6EZFHWOrKCNMBtPF9J8v5XCdMB7AKS52IyCMsdWUC0wG08X0ny/lcF5gOYBWWOhGR\nR1jqygjTAbTxfSfL+VwnTAewCkudiMgjLHVlAtMBtPF9J8v5XBeYDmAVljoRkUdY6soI0wG08X0n\ny/lcJ0wHsApLnYjIIyx1ZQLTAbTxfSfL+VwXmA5gFZY6EZFHWOrKCNMBtPF9J8v5XCdMB7AKS52I\nyCMsdWUC0wG08X0ny/lcF5gOYBWWOhGRR1jqygjTAbTxfSfL+VwnTAewCkudiMgjGUs9mUxixYoV\nWLhwIRYtWoT9+/dPuEYIgZkzZyIWiyEWi2Hnzp3awtotMB1AG993spzPdYHpAFbJz3RjQUEB9u7d\ni2g0iuHhYTz66KOor69HdXX1uOuWL1+O1tZWrUGJiCi7jI/U58yZg2g0CgCYPn06qqurce3atQnX\nSSn1pHOKMB1AG993spzPdcJ0AKtMeafe29uLeDyOurq6cR8PhUI4e/YsIpEIGhoakEgklIckIqKp\nybh+GTM8PIynnnoK+/btw/Tp08fdtmTJEiSTSRQWFuLkyZNYs2YNLl++rCWs3QLTAbTxfSfL+VwX\nmA5glaylfufOHaxduxY//OEPsWbNmgm3z5gxI/3fTzzxBLZu3YqbN29i1qxZE65tampCRUUFAKC4\nuBjRaDT9B27sW0SXzj09PQBmj04nRt8Hys4jI7cwxoZ5eeZZ9/musXOg+Ayt+VWchRA4fPgwAKT7\nMicyg1QqJdevXy+3b9/+mdcMDAzIVColpZTy3Llz8uGHH570uiyfykm7du2SeXk7JCAl0DH6Xt1b\nUVGZvHr1qukxZUdHh+kIWnE+ewC4j6+VXL723OuhXDNnfKR+5swZvPrqq1i8eDFisRgAoKWlBe++\n+y4AoLm5GSdOnMCLL76I/Px8FBYW4tixY7n/zUJEREqERv8m0P+JQiHvniXT0tKC554bRirVouX+\ni4rKkUicRnl5uZb7J7JNKBQCoLMn3OuhXLuTryglIvIIS10ZYTqANhN/iOUXzuc6YTqAVVjqREQe\nYakrE5gOoM3Y0658xflcF5gOYBWWOhGRR1jqygjTAbTxfSfL+VwnTAewCkudiMgjLHVlAtMBtPF9\nJ8v5XBeYDmAVljoRkUdY6soI0wG08X0ny/lcJ0wHsApLnYjIIyx1ZQLTAbTxfSfL+VwXmA5gFZY6\nEZFHWOrKCNMBtPF9J8v5XCdMB7AKS52IyCMsdWUC0wG08X0ny/lcF5gOYBWWOhGRR1jqygjTAbTx\nfSfL+VwnTAewCkudiMgjLHVlAtMBtPF9J8v5XBeYDmAVljoRkUdY6soI0wG08X0ny/lcJ0wHsApL\nnYjIIyx1ZQLTAbTxfSfL+VwXmA5gFZY6EZFHWOrKCNMBtPF9J8v5XCdMB7AKS52IyCMsdWUC0wG0\n8X0ny/lcF5gOYBWWOhGRRzKWejKZxIoVK7Bw4UIsWrQI+/fvn/S6bdu2oaqqCpFIBPF4XEtQ+wnT\nAbTxfSfL+VwnTAewSn6mGwsKCrB3715Eo1EMDw/j0UcfRX19Paqrq9PXtLW1obu7G11dXTh37hy2\nbNmCzs5O7cGJiGiijI/U58yZg2g0CgCYPn06qqurce3atXHXtLa2YsOGDQCAuro6DA0NYXBwUFNc\nmwWmA2jj+06W87kuMB3AKlPeqff29iIej6Ourm7cx/v7+1FWVpY+l5aWoq+vT11CIiKasozrlzHD\nw8N46qmnsG/fPkyfPn3C7VLKcedQKDTp/TQ1NaGiogIAUFxcjGg0mn4UMbb3c+nc09MDYPbodL8C\nEMXdRw1i9P39n0dGbmGMyXnv3cna9PvP+fyb766xczCF872/Ntv10Jpf1f+vw4cPA0C6L3Mis7h9\n+7Z8/PHH5d69eye9vbm5WR49ejR9nj9/vhwYGJhw3RQ+lXN27dol8/J2SEBKoGP0vbq3oqIyefXq\nVdNjyo6ODtMRtOJ89gBwH18ruXztuddDuWbOuH6RUmLz5s2oqanB9u3bJ72msbERR44cAQB0dnai\nuLgY4XA4979dnBeYDqDN2KMJX3E+1wWmA1gl4/rlzJkzePXVV7F48WLEYjEAQEtLC959910AQHNz\nMxoaGtDW1obKykoUFRXh0KFD+lMTEdGkMpb6smXLkEqlst7JgQMHlAVyl4CvjxiEEF4/2uN8rhPw\n9WvvfvAVpUREHmGpKxOYDqCN34/yOJ/7AtMBrMJSJyLyCEtdGWE6gDYTn0PsF87nOmE6gFVY6kRE\nHmGpKxOYDqCN7ztZzue6wHQAq7DUiYg8wlJXRpgOoI3vO1nO5zphOoBVWOpERB5hqSsTmA6gje87\nWc7nusB0AKuw1ImIPMJSV0aYDqCN7ztZzuc6YTqAVVjqREQeYakrE5gOoI3vO1nO57rAdACrsNSJ\niDzCUldGmA6gje87Wc7nOmE6gFVY6kREHmGpKxOYDqCN7ztZzue6wHQAq7DUiYg8wlJXRpgOoI3v\nO1nO5zphOoBVWOpERB5hqSsTmA6gje87Wc7nusB0AKuw1ImIPMJSV0aYDqCN7ztZzuc6YTqAVVjq\nREQeYakrE5gOoI3vO1nO57rAdACrsNSJiDzCUldGmA6gje87Wc7nOmE6gFWylvqmTZsQDodRW1s7\n6e1CCMycOROxWAyxWAw7d+5UHpKIiKYmP9sFGzduxE9+8hP86Ec/+sxrli9fjtbWVqXB3BOYDqCN\n7ztZzue6wHQAq2R9pP7YY4/hoYceyniNlFJZICIiun8PvFMPhUI4e/YsIpEIGhoakEgkVORykDAd\nQBvfd7Kcz3XCdACrZF2/ZLNkyRIkk0kUFhbi5MmTWLNmDS5fvjzptU1NTaioqAAAFBcXIxqNpr81\nHPuD59K5p6cHwOzR6S6Ovg9G34sHPo+M3MIYG+blmWfd57vGzoHiM7TmV3EWQuDw4cMAkO7LXITk\nFHYnvb29WL16Nd5+++2sd/i1r30NFy5cwKxZs8Z/olDIuzVNS0sLnntuGKlUi5b7LyoqRyJxGuXl\n5Vrun8g2oVAIgM6ecK+Hcu3OB16/DA4Opj/h+fPnIaWcUOhERPT5yFrqTz/9NL75zW/in//8J8rK\nyvDyyy/j4MGDOHjwIADgxIkTqK2tRTQaxfbt23Hs2DHtoe0kTAfQZuK3xn7hfK4TpgNYJetO/ejR\noxlvf+aZZ/DMM88oC0RERPePryhVJjAdQJuxH+b4ivO5LjAdwCosdSIij7DUlRGmA2jj+06W87lO\nmA5gFZY6EZFHWOrKBKYDaOP7TpbzuS4wHcAqLHUiIo+w1JURpgNo4/tOlvO5TpgOYBWWOhGRR1jq\nygSmA2jj+06W87kuMB3AKix1IiKPsNSVEaYDaOP7TpbzuU6YDmAVljoRkUdY6soEpgNo4/tOlvO5\nLjAdwCosdSIij7DUlRGmA2jj+06W87lOmA5gFZY6EZFHWOrKBKYDaOP7TpbzuS4wHcAqLHUiIo+w\n1JURpgNo4/tOlvO5TpgOYBWWOhGRR1jqygSmA2jj+06W87kuMB3AKix1IiKPsNSVEaYDaOP7Tpbz\nuU6YDmAVljoRkUdY6soEpgNo4/tOlvO5LjAdwCosdSIij7DUlRGmA2jj+06W87lOmA5gFZY6EZFH\nspb6pk2bEA6HUVtb+5nXbNu2DVVVVYhEIojH40oDuiMwHUAb33eynM91gekAVsla6hs3bkR7e/tn\n3t7W1obu7m50dXXhpZdewpYtW5QGJCKiqcta6o899hgeeuihz7y9tbUVGzZsAADU1dVhaGgIg4OD\n6hI6Q5gOoI3vO1nO5zphOoBVHnin3t/fj7KysvS5tLQUfX19D3q3RER0H/JV3ImUctw5FApNel1T\nUxMqKioAAMXFxYhGo+l939ijCZfOPT09AGbfM6HA3f2eGH1//+eRkVt379ngvEEQWPH7zfn8n++u\nsXMwhXOQw/XQml/FWQiBw4cPA0C6L3MRkp9u5En09vZi9erVePvttyfc9uMf/xhBEGDdunUAgAUL\nFuCNN95AOBwe/4lCoQnl77qWlhY899wwUqkWLfdfVFSOROI0ysvLtdw/kW0+eUCosyfc66Fcu/OB\n1y+NjY04cuQIAKCzsxPFxcUTCv2LQZgOoM3ER1F+4XyuE6YDWCXr+uXpp5/GG2+8gRs3bqCsrAw/\n//nPcefOHQBAc3MzGhoa0NbWhsrKShQVFeHQoUPaQxMR0eSmtH5R8om4fskZ1y/0RcP1y0Sf+/qF\niIjswVJXRpgOoI3vO1nO5zphOoBVWOpERB5hqSsTmA6gzdhzaX3F+VwXmA5gFZY6EZFHWOrKCNMB\ntPF9J8v5XCdMB7AKS52IyCMsdWUC0wG08X0ny/lcF5gOYBWWOhGRR1jqygjTAbTxfSfL+VwnTAew\nCkudiMgjLHVlAtMBtPF9J8v5XBeYDmAVljoRkUdY6soI0wG08X0ny/lcJ0wHsApLnYjIIyx1ZQLT\nAbTxfSfL+VwXmA5gFZY6EZFHWOrKCNMBtPF9J8v5XCdMB7AKS52IyCMsdWUC0wG08X0ny/lcF5gO\nYBWWOhGRR1jqygjTAbTxfSfL+VwnTAewCkudiMgjLHVlAtMBtPF9J8v5XBeYDmAVljoRkUdY6soI\n0wG08X0ny/lcJ0wHsApLnYjIIyx1ZQLTAbTxfSfL+VwXmA5glayl3t7ejgULFqCqqgovvPDChNuF\nEJg5cyZisRhisRh27typJSgREWWXsdRHRkbw7LPPor29HYlEAkePHsWlS5cmXLd8+XLE43HE43H8\n7Gc/0xbWbsJ0AG1838lyPtcJ0wGskrHUz58/j8rKSlRUVKCgoADr1q3Da6+9NuE6KaW2gERENHUZ\nS72/vx9lZWXpc2lpKfr7+8ddEwqFcPbsWUQiETQ0NCCRSOhJar3AdABtfN/Jcj7XBaYDWCU/042h\nUCjrHSxZsgTJZBKFhYU4efIk1qxZg8uXLysLSEREU5ex1EtKSpBMJtPnZDKJ0tLScdfMmDEj/d9P\nPPEEtm7dips3b2LWrFkT7q+pqQkVFRUAgOLiYkSj0fSjiLG9n0vnnp4eALNHp/sVgCjuPmoQo+/v\n/zwycgtjTM57707Wpt9/zufffHeNnYMpnO/9tdmuh9b8qv5/HT58GADSfZkTmcGdO3fk17/+dXnl\nyhX58ccfy0gkIhOJxLhrBgYGZCqVklJKee7cOfnwww9Pel9ZPpWTdu3aJfPydkhASqBj9L26t6Ki\nMnn16lXTY8qOjg7TEbTifPYAcB9fK7l87bnXQ7lmzvhIPT8/HwcOHMCqVaswMjKCzZs3o7q6GgcP\nHgQANDc348SJE3jxxReRn5+PwsJCHDt2LPe/WbwQmA6gzdijCV9xPtcFpgNYJTT6N4H+TxQKefcs\nmZaWFjz33DBSqRYt919UVI5E4jTKy8u13D+RbT75OZ7OnnCvh3LtTr6iVBlhOoA2E/edfuF8rhOm\nA1iFpU5E5BGWujKB6QDa+L6T5XyuC0wHsApLnYjIIyx1ZYTpANr4vpPlfK4TpgNYhaVOROQRlroy\ngekA2vi+k+V8rgtMB7AKS52IyCMsdWWE6QDa+L6T5XyuE6YDWIWlTkTkEZa6MoHpANr4vpPlfK4L\nTAewCkudiMgjLHVlhOkA2vi+k+V8rhOmA1iFpU5E5BGWujKB6QDa+L6T5XyuC0wHsApLnYjIIyx1\nZYTpANr4vpPlfK4TpgNYhaVOROQRlroygekA2vi+k+V8rgtMB7AKS52IyCMsdWWE6QDa+L6T5Xyu\nE6YDWIWlTkTkEZa6MoHpANr4vpPlfK4LTAewCkudiMgjLHVlhOkA2vi+k+V8rhOmA1iFpU5E5BGW\nujKB6QDa+L6T5XyuC0wHsApLnYjII1lLvb29HQsWLEBVVRVeeOGFSa/Ztm0bqqqqEIlEEI/HlYd0\ngzAdQBvfd7Kcz3XCdACrZCz1kZERPPvss2hvb0cikcDRo0dx6dKlcde0tbWhu7sbXV1deOmll7Bl\nyxatge110XQAbS5e9Hc2gPO5z/f5cpOx1M+fP4/KykpUVFSgoKAA69atw2uvvTbumtbWVmzYsAEA\nUFdXh6GhIQwODupLbK0h0wG0GRrydzaA87nP9/lyk7HU+/v7UVZWlj6Xlpaiv78/6zV9fX2KYxIR\n0VTkZ7oxFApN6U6klPf161yXl5eHgoL/w5e//DY+/DCOwsILSu//ww/fQ16e+Z9l9/b2mo6gFedz\nXa/pAFbJWOolJSVIJpPpczKZRGlpacZr+vr6UFJSMuG+5s2b523Zf/zxOwCA99/vz3Jl7u79Lsik\nV155xXQErTifTe6nJ6Y+n2s9NG/evJyuz1jqS5cuRVdXF3p7e/HVr34Vx48fx9GjR8dd09jYiAMH\nDmDdunXo7OxEcXExwuHwhPvq7u7OKRgREeUuY6nn5+fjwIEDWLVqFUZGRrB582ZUV1fj4MGDAIDm\n5mY0NDSgra0NlZWVKCoqwqFDhz6X4ERENFFIfnohTkREztL6U7jf/e53WLhwIb70pS/hzTffHHfb\n7t27UVVVhQULFuD111/XGUOrqbw4yyWbNm1COBxGbW1t+mM3b95EfX09HnnkETz++ONOP0UumUxi\nxYoVWLhwIRYtWoT9+/cD8GPGW7duoa6uDtFoFDU1NdixYwcAP2a718jICGKxGFavXg3Ar/kqKiqw\nePFixGIxfOMb3wCQ+3xaS722thZ/+MMf8K1vfWvcxxOJBI4fP45EIoH29nZs3boVqVRKZxQtpvLi\nLNds3LgR7e3t4z62Z88e1NfX4/Lly1i5ciX27NljKN2DKygowN69e/GPf/wDnZ2d+PWvf41Lly55\nMeO0adPQ0dGBixcv4q233kJHRwdOnz7txWz32rdvH2pqatI/8PRpvlAoBCEE4vE4zp8/D+A+5pOf\ngyAI5IULF9LnlpYWuWfPnvR51apV8q9//evnEUWps2fPylWrVqXPu3fvlrt37zaYSI0rV67IRYsW\npc/z58+XAwMDUkopr1+/LufPn28qmnLf//735alTp7yb8YMPPpBLly6Vf//7372aLZlMypUrV8o/\n/elP8nvf+56U0q8/nxUVFfLGjRvjPpbrfEaeBH3t2rVxT42c7EVNLpjKi7N8MDg4mH5GUzgc9uYV\nw729vYjH46irq/NmxlQqhWg0inA4nF4z+TIbAPz0pz/FL3/5y3Gv3/BpvlAohO985ztYunQpfvOb\n3wDIfb6Mz36Zivr6egwMDEz4eEtLS3rnNRWuPXcUcDPzgwqFQl7MPTw8jLVr12Lfvn2YMWPGuNtc\nnjEvLw8XL17Ef/7zH6xatQodHR3jbnd5tj/+8Y+YPXs2YrHYZ/4jZS7PBwBnzpzB3Llz8d5776G+\nvh4LFiwYd/tU5nvgUj916lTOv2aqL1iy3VRenOWDcDiMgYEBzJkzB9evX8fs2bNNR3ogd+7cwdq1\na7F+/XqsWbMGgH8zzpw5E08++SQuXLjgzWxnz55Fa2sr2tracOvWLbz//vtYv369N/MBwNy5cwEA\nX/nKV/CDH/wA58+fz3m+z239Iu955mRjYyOOHTuG27dv48qVK+jq6kr/pNcl97446/bt2zh+/Dga\nGxtNx1KusbEx/YrEV155JV2ELpJSYvPmzaipqcH27dvTH/dhxhs3bqSfGfHRRx/h1KlTiMViXswG\nfPLdfzKZxJUrV3Ds2DF8+9vfxm9/+1tv5vvwww/x3//+FwDwwQcf4PXXX0dtbW3u82na90sppfz9\n738vS0tL5bRp02Q4HJbf/e5307ft2rVLzps3T86fP1+2t7frjKFVW1ubfOSRR+S8efNkS0uL6TgP\nbN26dXLu3LmyoKBAlpaWypdffln+61//kitXrpRVVVWyvr5e/vvf/zYd87795S9/kaFQSEYiERmN\nRmU0GpUnT570Ysa33npLxmIxGYlEZG1trfzFL34hpZRezPZpQgi5evVqKaU/8/X09MhIJCIjkYhc\nuHBhuk9ynY8vPiIi8oj5fwKQiIiUYakTEXmEpU5E5BGWOhGRR1jqREQeYakTEXmEpU5E5BGWOhGR\nR/4ftWgwQ400YEMAAAAASUVORK5CYII=\n",
"text": [
"<matplotlib.figure.Figure at 0x10c6ff110>"
]
}
],
"prompt_number": 38
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading and writing\n",
"\n",
"`csv`, `excel`, `hdf`, `sql`, `json`, `html`, `stata`, `clipboard`, `pickle`,\n",
"and experimental (as of Pandas 0.13.1): `msgpack`, `gbq`.\n",
"\n",
"And of course typical Python objects, like `list`, `dict` or `numpy.array`.\n",
"\n",
"http://pandas.pydata.org/pandas-docs/stable/io.html"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df = pd.read_csv(\"gimn_egz_2002do2013.csv\",\n",
" encoding='utf8')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 39
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you need it, download from:\n",
"https://github.com/stared/szkolomat_dane/ (direct link is [here](https://raw.githubusercontent.com/stared/szkolomat_dane/master/sp_spr_2002do2013.csv))."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"len(df)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 40,
"text": [
"181463"
]
}
],
"prompt_number": 40
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df.head()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id_szkoly</th>\n",
" <th>egz_norm_sr</th>\n",
" <th>egz_norm_std</th>\n",
" <th>probka</th>\n",
" <th>rok</th>\n",
" <th>czesc</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 13997</td>\n",
" <td> 87.235615</td>\n",
" <td> 10.182524</td>\n",
" <td> 13</td>\n",
" <td> 2002</td>\n",
" <td> gh</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 13999</td>\n",
" <td> 91.391710</td>\n",
" <td> NaN</td>\n",
" <td> 1</td>\n",
" <td> 2002</td>\n",
" <td> gh</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 14002</td>\n",
" <td> 101.964605</td>\n",
" <td> 13.317307</td>\n",
" <td> 189</td>\n",
" <td> 2002</td>\n",
" <td> gh</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 14004</td>\n",
" <td> 100.250970</td>\n",
" <td> 14.395403</td>\n",
" <td> 70</td>\n",
" <td> 2002</td>\n",
" <td> gh</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 14005</td>\n",
" <td> 95.465265</td>\n",
" <td> 13.085892</td>\n",
" <td> 62</td>\n",
" <td> 2002</td>\n",
" <td> gh</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 6 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 41,
"text": [
" id_szkoly egz_norm_sr egz_norm_std probka rok czesc\n",
"0 13997 87.235615 10.182524 13 2002 gh\n",
"1 13999 91.391710 NaN 1 2002 gh\n",
"2 14002 101.964605 13.317307 189 2002 gh\n",
"3 14004 100.250970 14.395403 70 2002 gh\n",
"4 14005 95.465265 13.085892 62 2002 gh\n",
"\n",
"[5 rows x 6 columns]"
]
}
],
"prompt_number": 41
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df = df.rename(columns={'id_szkoly': 'school_id',\n",
" 'egz_norm_sr': 'mean_score',\n",
" 'egz_norm_std': 'std_dev',\n",
" 'probka': 'no_of_students',\n",
" 'rok': 'year',\n",
" 'czesc': 'exam_part'})"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 42
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df.head()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>school_id</th>\n",
" <th>mean_score</th>\n",
" <th>std_dev</th>\n",
" <th>no_of_students</th>\n",
" <th>year</th>\n",
" <th>exam_part</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 13997</td>\n",
" <td> 87.235615</td>\n",
" <td> 10.182524</td>\n",
" <td> 13</td>\n",
" <td> 2002</td>\n",
" <td> gh</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 13999</td>\n",
" <td> 91.391710</td>\n",
" <td> NaN</td>\n",
" <td> 1</td>\n",
" <td> 2002</td>\n",
" <td> gh</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 14002</td>\n",
" <td> 101.964605</td>\n",
" <td> 13.317307</td>\n",
" <td> 189</td>\n",
" <td> 2002</td>\n",
" <td> gh</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 14004</td>\n",
" <td> 100.250970</td>\n",
" <td> 14.395403</td>\n",
" <td> 70</td>\n",
" <td> 2002</td>\n",
" <td> gh</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 14005</td>\n",
" <td> 95.465265</td>\n",
" <td> 13.085892</td>\n",
" <td> 62</td>\n",
" <td> 2002</td>\n",
" <td> gh</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 6 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 43,
"text": [
" school_id mean_score std_dev no_of_students year exam_part\n",
"0 13997 87.235615 10.182524 13 2002 gh\n",
"1 13999 91.391710 NaN 1 2002 gh\n",
"2 14002 101.964605 13.317307 189 2002 gh\n",
"3 14004 100.250970 14.395403 70 2002 gh\n",
"4 14005 95.465265 13.085892 62 2002 gh\n",
"\n",
"[5 rows x 6 columns]"
]
}
],
"prompt_number": 43
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df['year'].value_counts()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 44,
"text": [
"2013 27198\n",
"2012 27127\n",
"2011 13460\n",
"2010 13311\n",
"2009 13206\n",
"2008 13110\n",
"2007 13008\n",
"2006 12875\n",
"2005 12629\n",
"2004 12455\n",
"2003 11720\n",
"2002 11364\n",
"dtype: int64"
]
}
],
"prompt_number": 44
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df['exam_part'].value_counts()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 45,
"text": [
"gh 63600\n",
"gm 63538\n",
"gh_h 13582\n",
"gh_p 13582\n",
"gm_m 13581\n",
"gm_p 13580\n",
"dtype: int64"
]
}
],
"prompt_number": 45
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[df['year'] == 2013].describe()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>school_id</th>\n",
" <th>mean_score</th>\n",
" <th>std_dev</th>\n",
" <th>no_of_students</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td> 27198.000000</td>\n",
" <td> 27198.000000</td>\n",
" <td> 27125.000000</td>\n",
" <td> 27198.000000</td>\n",
" <td> 27198</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td> 18876.419590</td>\n",
" <td> 99.090552</td>\n",
" <td> 13.070827</td>\n",
" <td> 55.841385</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td> 7614.400545</td>\n",
" <td> 7.748602</td>\n",
" <td> 2.278386</td>\n",
" <td> 42.845946</td>\n",
" <td> 0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td> 14034.000000</td>\n",
" <td> 44.116249</td>\n",
" <td> 0.000000</td>\n",
" <td> 1.000000</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td> 15985.250000</td>\n",
" <td> 95.542765</td>\n",
" <td> 11.788384</td>\n",
" <td> 24.000000</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td> 17903.000000</td>\n",
" <td> 99.100521</td>\n",
" <td> 13.214976</td>\n",
" <td> 44.000000</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td> 19822.750000</td>\n",
" <td> 102.816036</td>\n",
" <td> 14.487819</td>\n",
" <td> 74.000000</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td> 76022.000000</td>\n",
" <td> 136.636502</td>\n",
" <td> 34.353405</td>\n",
" <td> 324.000000</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>8 rows \u00d7 5 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 46,
"text": [
" school_id mean_score std_dev no_of_students year\n",
"count 27198.000000 27198.000000 27125.000000 27198.000000 27198\n",
"mean 18876.419590 99.090552 13.070827 55.841385 2013\n",
"std 7614.400545 7.748602 2.278386 42.845946 0\n",
"min 14034.000000 44.116249 0.000000 1.000000 2013\n",
"25% 15985.250000 95.542765 11.788384 24.000000 2013\n",
"50% 17903.000000 99.100521 13.214976 44.000000 2013\n",
"75% 19822.750000 102.816036 14.487819 74.000000 2013\n",
"max 76022.000000 136.636502 34.353405 324.000000 2013\n",
"\n",
"[8 rows x 5 columns]"
]
}
],
"prompt_number": 46
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[df['year'] == 2013].mean_score.apply(type).value_counts()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 47,
"text": [
"<type 'numpy.float64'> 27198\n",
"dtype: int64"
]
}
],
"prompt_number": 47
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"best_of_2013 = df[(df['year'] == 2013) & (df['exam_part'] == 'gm_p')]\n",
"best_of_2013 = best_of_2013[best_of_2013['mean_score'] > 120.]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 48
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"best_of_2013.head().values"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 49,
"text": [
"array([[14323, 120.39388635816549, 9.856241445653913, 59, 2013, u'gm_p'],\n",
" [14459, 120.13272510022158, 12.604354608903089, 34, 2013, u'gm_p'],\n",
" [14479, 125.75772057006449, 9.302966477302826, 96, 2013, u'gm_p'],\n",
" [14489, 120.9617441796834, 12.391730327041405, 7, 2013, u'gm_p'],\n",
" [14490, 120.40927031104006, 10.972039172309728, 37, 2013, u'gm_p']], dtype=object)"
]
}
],
"prompt_number": 49
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"best_of_2013.to_csv(\"best_of_2013.csv\")"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 50
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df.groupby('school_id').mean()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>mean_score</th>\n",
" <th>std_dev</th>\n",
" <th>no_of_students</th>\n",
" <th>year</th>\n",
" </tr>\n",
" <tr>\n",
" <th>school_id</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>13997</th>\n",
" <td> 87.168595</td>\n",
" <td> 9.749717</td>\n",
" <td> 16.000000</td>\n",
" <td> 2005.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13999</th>\n",
" <td> 92.727838</td>\n",
" <td> NaN</td>\n",
" <td> 1.000000</td>\n",
" <td> 2002.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14000</th>\n",
" <td> 75.605080</td>\n",
" <td> 13.670722</td>\n",
" <td> 8.333333</td>\n",
" <td> 2009.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14001</th>\n",
" <td> 89.750454</td>\n",
" <td> 6.597126</td>\n",
" <td> 1.500000</td>\n",
" <td> 2009.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14002</th>\n",
" <td> 99.193081</td>\n",
" <td> 14.471929</td>\n",
" <td> 142.250000</td>\n",
" <td> 2004.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14003</th>\n",
" <td> 99.526737</td>\n",
" <td> 13.239124</td>\n",
" <td> 85.000000</td>\n",
" <td> 2005.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14004</th>\n",
" <td> 98.468027</td>\n",
" <td> 12.725488</td>\n",
" <td> 55.500000</td>\n",
" <td> 2004.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14005</th>\n",
" <td> 94.621077</td>\n",
" <td> 13.925231</td>\n",
" <td> 58.000000</td>\n",
" <td> 2003.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14006</th>\n",
" <td> 93.633615</td>\n",
" <td> 11.154024</td>\n",
" <td> 63.500000</td>\n",
" <td> 2003.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14007</th>\n",
" <td> 97.667551</td>\n",
" <td> 13.371636</td>\n",
" <td> 39.833333</td>\n",
" <td> 2004.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14008</th>\n",
" <td> 94.748298</td>\n",
" <td> 14.850266</td>\n",
" <td> 62.000000</td>\n",
" <td> 2005.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14009</th>\n",
" <td> 97.335682</td>\n",
" <td> 10.911686</td>\n",
" <td> 10.000000</td>\n",
" <td> 2004.333333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14010</th>\n",
" <td> 95.231192</td>\n",
" <td> 8.960401</td>\n",
" <td> 10.000000</td>\n",
" <td> 2004.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14011</th>\n",
" <td> 100.356619</td>\n",
" <td> 19.207342</td>\n",
" <td> 17.000000</td>\n",
" <td> 2005.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14012</th>\n",
" <td> 80.561827</td>\n",
" <td> 10.335914</td>\n",
" <td> 18.000000</td>\n",
" <td> 2007.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14013</th>\n",
" <td> 98.086374</td>\n",
" <td> 12.546801</td>\n",
" <td> 38.357143</td>\n",
" <td> 2005.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14014</th>\n",
" <td> 97.783984</td>\n",
" <td> 15.183853</td>\n",
" <td> 48.500000</td>\n",
" <td> 2003.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14015</th>\n",
" <td> 97.300123</td>\n",
" <td> 14.128982</td>\n",
" <td> 116.642857</td>\n",
" <td> 2005.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14016</th>\n",
" <td> 94.530170</td>\n",
" <td> 14.071069</td>\n",
" <td> 118.400000</td>\n",
" <td> 2004.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14017</th>\n",
" <td> 91.811632</td>\n",
" <td> 12.742623</td>\n",
" <td> 76.666667</td>\n",
" <td> 2004.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14018</th>\n",
" <td> 101.850389</td>\n",
" <td> 16.012857</td>\n",
" <td> 40.250000</td>\n",
" <td> 2003.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14019</th>\n",
" <td> 95.985254</td>\n",
" <td> 12.897286</td>\n",
" <td> 34.571429</td>\n",
" <td> 2005.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14020</th>\n",
" <td> 73.859160</td>\n",
" <td> 9.628826</td>\n",
" <td> 31.333333</td>\n",
" <td> 2005.333333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14021</th>\n",
" <td> 110.169410</td>\n",
" <td> 11.434753</td>\n",
" <td> 76.000000</td>\n",
" <td> 2006.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14022</th>\n",
" <td> 92.357969</td>\n",
" <td> 13.179575</td>\n",
" <td> 102.416667</td>\n",
" <td> 2004.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14024</th>\n",
" <td> 97.928798</td>\n",
" <td> 11.978615</td>\n",
" <td> 52.800000</td>\n",
" <td> 2004.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14025</th>\n",
" <td> 107.622472</td>\n",
" <td> 12.393003</td>\n",
" <td> 63.400000</td>\n",
" <td> 2004.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14026</th>\n",
" <td> 108.361917</td>\n",
" <td> 9.744963</td>\n",
" <td> 23.000000</td>\n",
" <td> 2005.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14027</th>\n",
" <td> 105.111340</td>\n",
" <td> 13.344660</td>\n",
" <td> 79.642857</td>\n",
" <td> 2005.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14028</th>\n",
" <td> 101.428645</td>\n",
" <td> 13.594276</td>\n",
" <td> 81.000000</td>\n",
" <td> 2005.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14029</th>\n",
" <td> 107.236395</td>\n",
" <td> 13.434021</td>\n",
" <td> 107.357143</td>\n",
" <td> 2005.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14030</th>\n",
" <td> 82.207932</td>\n",
" <td> 12.059516</td>\n",
" <td> 5.500000</td>\n",
" <td> 2004.833333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14031</th>\n",
" <td> 80.166864</td>\n",
" <td> 10.897871</td>\n",
" <td> 20.000000</td>\n",
" <td> 2005.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14032</th>\n",
" <td> 97.507137</td>\n",
" <td> 14.603907</td>\n",
" <td> 34.500000</td>\n",
" <td> 2003.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14033</th>\n",
" <td> 97.216445</td>\n",
" <td> 13.810932</td>\n",
" <td> 45.888889</td>\n",
" <td> 2006.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14034</th>\n",
" <td> 77.809420</td>\n",
" <td> 8.784643</td>\n",
" <td> 9.576923</td>\n",
" <td> 2008.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14035</th>\n",
" <td> 74.037372</td>\n",
" <td> 8.453088</td>\n",
" <td> 21.000000</td>\n",
" <td> 2008.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14036</th>\n",
" <td> 79.874904</td>\n",
" <td> 9.662012</td>\n",
" <td> 8.416667</td>\n",
" <td> 2007.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14037</th>\n",
" <td> 112.491791</td>\n",
" <td> 5.928938</td>\n",
" <td> 8.000000</td>\n",
" <td> 2010.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14038</th>\n",
" <td> 98.139344</td>\n",
" <td> 12.469259</td>\n",
" <td> 16.000000</td>\n",
" <td> 2009.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14039</th>\n",
" <td> 103.330678</td>\n",
" <td> 14.218774</td>\n",
" <td> 55.444444</td>\n",
" <td> 2010.555556</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14040</th>\n",
" <td> 79.051284</td>\n",
" <td> 11.240789</td>\n",
" <td> 19.500000</td>\n",
" <td> 2012.200000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14041</th>\n",
" <td> 108.315297</td>\n",
" <td> 11.297068</td>\n",
" <td> 17.555556</td>\n",
" <td> 2010.555556</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14042</th>\n",
" <td> 100.826484</td>\n",
" <td> 14.875846</td>\n",
" <td> 239.071429</td>\n",
" <td> 2008.214286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14043</th>\n",
" <td> 98.743504</td>\n",
" <td> 14.350353</td>\n",
" <td> 202.571429</td>\n",
" <td> 2008.214286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14044</th>\n",
" <td> 96.488546</td>\n",
" <td> 13.571148</td>\n",
" <td> 125.214286</td>\n",
" <td> 2008.214286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14045</th>\n",
" <td> 115.766330</td>\n",
" <td> 10.899575</td>\n",
" <td> 21.107143</td>\n",
" <td> 2008.214286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14046</th>\n",
" <td> 92.666831</td>\n",
" <td> 4.827525</td>\n",
" <td> 2.000000</td>\n",
" <td> 2013.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14047</th>\n",
" <td> 96.600733</td>\n",
" <td> 14.101432</td>\n",
" <td> 74.714286</td>\n",
" <td> 2008.214286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14048</th>\n",
" <td> 98.506472</td>\n",
" <td> 13.873354</td>\n",
" <td> 137.142857</td>\n",
" <td> 2008.214286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14049</th>\n",
" <td> 93.349107</td>\n",
" <td> 13.056765</td>\n",
" <td> 43.714286</td>\n",
" <td> 2008.214286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14050</th>\n",
" <td> 96.070019</td>\n",
" <td> 13.335813</td>\n",
" <td> 35.071429</td>\n",
" <td> 2008.214286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14051</th>\n",
" <td> 98.398353</td>\n",
" <td> 13.541373</td>\n",
" <td> 68.642857</td>\n",
" <td> 2008.214286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14052</th>\n",
" <td> 100.590638</td>\n",
" <td> 12.995011</td>\n",
" <td> 38.785714</td>\n",
" <td> 2008.214286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14053</th>\n",
" <td> 93.900236</td>\n",
" <td> 14.552012</td>\n",
" <td> 101.357143</td>\n",
" <td> 2008.214286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14054</th>\n",
" <td> 103.803544</td>\n",
" <td> 13.133332</td>\n",
" <td> 146.285714</td>\n",
" <td> 2008.214286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14055</th>\n",
" <td> 94.186538</td>\n",
" <td> 13.612800</td>\n",
" <td> 85.214286</td>\n",
" <td> 2008.214286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14057</th>\n",
" <td> 98.472108</td>\n",
" <td> 13.201418</td>\n",
" <td> 77.821429</td>\n",
" <td> 2008.214286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14058</th>\n",
" <td> 105.893750</td>\n",
" <td> 12.430496</td>\n",
" <td> 10.392857</td>\n",
" <td> 2008.214286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14059</th>\n",
" <td> 96.923023</td>\n",
" <td> 12.644721</td>\n",
" <td> 36.000000</td>\n",
" <td> 2011.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th></th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>7853 rows \u00d7 4 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 51,
"text": [
" mean_score std_dev no_of_students year\n",
"school_id \n",
"13997 87.168595 9.749717 16.000000 2005.000000\n",
"13999 92.727838 NaN 1.000000 2002.000000\n",
"14000 75.605080 13.670722 8.333333 2009.000000\n",
"14001 89.750454 6.597126 1.500000 2009.000000\n",
"14002 99.193081 14.471929 142.250000 2004.500000\n",
"14003 99.526737 13.239124 85.000000 2005.000000\n",
"14004 98.468027 12.725488 55.500000 2004.500000\n",
"14005 94.621077 13.925231 58.000000 2003.500000\n",
"14006 93.633615 11.154024 63.500000 2003.500000\n",
"14007 97.667551 13.371636 39.833333 2004.500000\n",
"14008 94.748298 14.850266 62.000000 2005.000000\n",
"14009 97.335682 10.911686 10.000000 2004.333333\n",
"14010 95.231192 8.960401 10.000000 2004.500000\n",
"14011 100.356619 19.207342 17.000000 2005.000000\n",
"14012 80.561827 10.335914 18.000000 2007.000000\n",
"14013 98.086374 12.546801 38.357143 2005.000000\n",
"14014 97.783984 15.183853 48.500000 2003.500000\n",
"14015 97.300123 14.128982 116.642857 2005.000000\n",
"14016 94.530170 14.071069 118.400000 2004.000000\n",
"14017 91.811632 12.742623 76.666667 2004.500000\n",
"14018 101.850389 16.012857 40.250000 2003.500000\n",
"14019 95.985254 12.897286 34.571429 2005.000000\n",
"14020 73.859160 9.628826 31.333333 2005.333333\n",
"14021 110.169410 11.434753 76.000000 2006.500000\n",
"14022 92.357969 13.179575 102.416667 2004.500000\n",
"14024 97.928798 11.978615 52.800000 2004.000000\n",
"14025 107.622472 12.393003 63.400000 2004.000000\n",
"14026 108.361917 9.744963 23.000000 2005.000000\n",
"14027 105.111340 13.344660 79.642857 2005.000000\n",
"14028 101.428645 13.594276 81.000000 2005.000000\n",
"14029 107.236395 13.434021 107.357143 2005.000000\n",
"14030 82.207932 12.059516 5.500000 2004.833333\n",
"14031 80.166864 10.897871 20.000000 2005.000000\n",
"14032 97.507137 14.603907 34.500000 2003.500000\n",
"14033 97.216445 13.810932 45.888889 2006.000000\n",
"14034 77.809420 8.784643 9.576923 2008.000000\n",
"14035 74.037372 8.453088 21.000000 2008.000000\n",
"14036 79.874904 9.662012 8.416667 2007.500000\n",
"14037 112.491791 5.928938 8.000000 2010.000000\n",
"14038 98.139344 12.469259 16.000000 2009.500000\n",
"14039 103.330678 14.218774 55.444444 2010.555556\n",
"14040 79.051284 11.240789 19.500000 2012.200000\n",
"14041 108.315297 11.297068 17.555556 2010.555556\n",
"14042 100.826484 14.875846 239.071429 2008.214286\n",
"14043 98.743504 14.350353 202.571429 2008.214286\n",
"14044 96.488546 13.571148 125.214286 2008.214286\n",
"14045 115.766330 10.899575 21.107143 2008.214286\n",
"14046 92.666831 4.827525 2.000000 2013.000000\n",
"14047 96.600733 14.101432 74.714286 2008.214286\n",
"14048 98.506472 13.873354 137.142857 2008.214286\n",
"14049 93.349107 13.056765 43.714286 2008.214286\n",
"14050 96.070019 13.335813 35.071429 2008.214286\n",
"14051 98.398353 13.541373 68.642857 2008.214286\n",
"14052 100.590638 12.995011 38.785714 2008.214286\n",
"14053 93.900236 14.552012 101.357143 2008.214286\n",
"14054 103.803544 13.133332 146.285714 2008.214286\n",
"14055 94.186538 13.612800 85.214286 2008.214286\n",
"14057 98.472108 13.201418 77.821429 2008.214286\n",
"14058 105.893750 12.430496 10.392857 2008.214286\n",
"14059 96.923023 12.644721 36.000000 2011.000000\n",
" ... ... ... ...\n",
"\n",
"[7853 rows x 4 columns]"
]
}
],
"prompt_number": 51
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comments\n",
"\n",
"When importing data, be careful to remember to distinguish `str`/`uniode` from `int`.\n",
"\n",
"E.g. `JSON` format can have only strings as keys, and it is easy to get nasty bug.\n",
"\n",
"Big data warning: if it fits your memory, you are better off doing it locally that doing.\n",
"\n",
"I also recommend [this tutorial on Pandas](http://manishamde.github.io/blog/2013/03/07/pandas-and-python-top-10/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## by Piotr Migda\u0142\n",
"\n",
"finishing PhD in quantum optics at ICFO - The Institute of Photonic Sciences, Castelldefels (Barcelona) \n",
"\n",
"pmigdal@gmail.com, http://migdal.wikidot.com\n",
"\n",
"I freelance in **data analysis** (Python!) and **interactive visualizations** (D3.js). See examples:\n",
"\n",
"* [Maps of concepts: programming and not only!](https://github.com/stared/tag-graph-map-of-stackexchange/wiki)\n",
"* [Book themes visualized](http://stared.github.io/wizualizacja-wolnych-lektur/polish_books_themes.html)\n",
"* [TagOverflow](http://stared.github.io/tagoverflow/)\n",
"* [Szko\u0142omat - schools in Poland](http://szkolomat.pl/) (in Polish; data analysis mostly in Pandas)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"BTW: If you do quantum stuff, check [QuTiP: Quantum Toolbox in Python](http://qutip.org/). "
]
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment