Created
May 11, 2014 19:06
-
-
Save stared/6dc90471f0154c697c8b to your computer and use it in GitHub Desktop.
Introduction to Pandas (from: The Barcelona Python Meetup Group: Python & Sciences, 24 April 2014)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"metadata": { | |
"name": "", | |
"signature": "sha256:59f5a4861925d4bd4b833a2681e4dbf2f891f12ff0631cea44a53f90e6b12999" | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Introduction to Pandas \n", | |
"\n", | |
"* Piotr Migda\u0142, http://migdal.wikidot.com\n", | |
"* The Barcelona Python Meetup Group: [Python & Sciences](http://www.meetup.com/python-185/events/169870182/) (24 April 2014)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# [Pandas](http://pandas.pydata.org/)\n", | |
"\n", | |
"* 0.13.1 released (February 3, 2014)\n", | |
"\n", | |
" pandas is an open source, BSD-licensed library\n", | |
" providing high-performance, easy-to-use data structures\n", | |
" and data analysis tools for the Python programming language\n", | |
" \n", | |
"In practice, Pandas brings **R**-like data structures, great for working with **tabular data**. Tabular?\n", | |
"\n", | |
"* Everything that goes into **SQL**, **Excel** or **CSV**.\n", | |
"* Not all data, but data you typically work with.\n", | |
"\n", | |
"Both for data exploration and production.\n", | |
"\n", | |
"If you can do it with SQL or Excel, you can do it with Pandas!\n", | |
"\n", | |
"(Counterexamples?)\n", | |
"\n", | |
"So:\n", | |
"\n", | |
" $ pip install pandas" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"import pandas as pd" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 1 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Series\n", | |
"\n", | |
"* Creating\n", | |
"* Adding\n", | |
"* Concatenating\n", | |
"* Filtering\n", | |
"* Applying map" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser = pd.Series(['a','b','c', 'd', 'e'])" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 2 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 3, | |
"text": [ | |
"0 a\n", | |
"1 b\n", | |
"2 c\n", | |
"3 d\n", | |
"4 e\n", | |
"dtype: object" | |
] | |
} | |
], | |
"prompt_number": 3 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser[1] = 12" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 4 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser[9] = 'qqq'" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 5 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 6, | |
"text": [ | |
"0 a\n", | |
"1 12\n", | |
"2 c\n", | |
"3 d\n", | |
"4 e\n", | |
"9 qqq\n", | |
"dtype: object" | |
] | |
} | |
], | |
"prompt_number": 6 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser['ten'] = 'abc'" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 7 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 8, | |
"text": [ | |
"0 a\n", | |
"1 12\n", | |
"2 c\n", | |
"3 d\n", | |
"4 e\n", | |
"9 qqq\n", | |
"ten abc\n", | |
"dtype: object" | |
] | |
} | |
], | |
"prompt_number": 8 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# ser.loc[k] accesses object with index k (can be integer, string or timestamp)\n", | |
"ser.loc['ten']" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 9, | |
"text": [ | |
"'abc'" | |
] | |
} | |
], | |
"prompt_number": 9 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# ser.iloc[i] - accessed i-th object; i in range(0, len(ser))\n", | |
"ser.iloc[4]" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 10, | |
"text": [ | |
"'e'" | |
] | |
} | |
], | |
"prompt_number": 10 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser != 'c'" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 11, | |
"text": [ | |
"0 True\n", | |
"1 True\n", | |
"2 False\n", | |
"3 True\n", | |
"4 True\n", | |
"9 True\n", | |
"ten True\n", | |
"dtype: bool" | |
] | |
} | |
], | |
"prompt_number": 11 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser[ser != 'c']" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 12, | |
"text": [ | |
"0 a\n", | |
"1 12\n", | |
"3 d\n", | |
"4 e\n", | |
"9 qqq\n", | |
"ten abc\n", | |
"dtype: object" | |
] | |
} | |
], | |
"prompt_number": 12 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser = ser.append(pd.Series([8,2,1],\n", | |
" index=[10,100,1000]))" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 13 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 14, | |
"text": [ | |
"0 a\n", | |
"1 12\n", | |
"2 c\n", | |
"3 d\n", | |
"4 e\n", | |
"9 qqq\n", | |
"ten abc\n", | |
"10 8\n", | |
"100 2\n", | |
"1000 1\n", | |
"dtype: object" | |
] | |
} | |
], | |
"prompt_number": 14 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"ser.describe()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 15, | |
"text": [ | |
"count 10\n", | |
"unique 10\n", | |
"top a\n", | |
"freq 1\n", | |
"dtype: object" | |
] | |
} | |
], | |
"prompt_number": 15 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Data frame\n", | |
"\n", | |
"* Creating\n", | |
"* Relations to Series; columns and index\n", | |
"* columns, loc, iloc, ix\n", | |
"* Iterating\n", | |
"* Filtering\n", | |
"* Joins\n", | |
"* Groupby\n", | |
"* Apply, axes\n", | |
"* Renaming things\n", | |
"* Dealings with NaNs\n", | |
"* Data types: (numpy) float, int, obj" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"import numpy as np" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 16 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df = pd.DataFrame(np.random.randn(6,4),\n", | |
" columns=[\"random\", \"guess\", \"chance\", \"luck\"])" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 17 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>random</th>\n", | |
" <th>guess</th>\n", | |
" <th>chance</th>\n", | |
" <th>luck</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>-0.820053</td>\n", | |
" <td> 0.952948</td>\n", | |
" <td> 2.062030</td>\n", | |
" <td> 1.399049</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> 0.784917</td>\n", | |
" <td> 1.015215</td>\n", | |
" <td>-0.285176</td>\n", | |
" <td>-0.511790</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> 0.079101</td>\n", | |
" <td>-1.468544</td>\n", | |
" <td>-0.047072</td>\n", | |
" <td>-0.450447</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td> 1.032954</td>\n", | |
" <td> 0.461783</td>\n", | |
" <td> 1.509796</td>\n", | |
" <td>-0.323996</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>-0.590753</td>\n", | |
" <td>-0.287149</td>\n", | |
" <td> 0.596965</td>\n", | |
" <td> 1.252726</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td> 0.884594</td>\n", | |
" <td>-0.125060</td>\n", | |
" <td>-1.458435</td>\n", | |
" <td> 1.159053</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>6 rows \u00d7 4 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 18, | |
"text": [ | |
" random guess chance luck\n", | |
"0 -0.820053 0.952948 2.062030 1.399049\n", | |
"1 0.784917 1.015215 -0.285176 -0.511790\n", | |
"2 0.079101 -1.468544 -0.047072 -0.450447\n", | |
"3 1.032954 0.461783 1.509796 -0.323996\n", | |
"4 -0.590753 -0.287149 0.596965 1.252726\n", | |
"5 0.884594 -0.125060 -1.458435 1.159053\n", | |
"\n", | |
"[6 rows x 4 columns]" | |
] | |
} | |
], | |
"prompt_number": 18 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[\"luck\"]" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 19, | |
"text": [ | |
"0 1.399049\n", | |
"1 -0.511790\n", | |
"2 -0.450447\n", | |
"3 -0.323996\n", | |
"4 1.252726\n", | |
"5 1.159053\n", | |
"Name: luck, dtype: float64" | |
] | |
} | |
], | |
"prompt_number": 19 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[\"luck\"] + 1" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 20, | |
"text": [ | |
"0 2.399049\n", | |
"1 0.488210\n", | |
"2 0.549553\n", | |
"3 0.676004\n", | |
"4 2.252726\n", | |
"5 2.159053\n", | |
"Name: luck, dtype: float64" | |
] | |
} | |
], | |
"prompt_number": 20 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[\"fate\"] = 2 * df[\"luck\"] - df[\"random\"]" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 21 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>random</th>\n", | |
" <th>guess</th>\n", | |
" <th>chance</th>\n", | |
" <th>luck</th>\n", | |
" <th>fate</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>-0.820053</td>\n", | |
" <td> 0.952948</td>\n", | |
" <td> 2.062030</td>\n", | |
" <td> 1.399049</td>\n", | |
" <td> 3.618150</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> 0.784917</td>\n", | |
" <td> 1.015215</td>\n", | |
" <td>-0.285176</td>\n", | |
" <td>-0.511790</td>\n", | |
" <td>-1.808498</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> 0.079101</td>\n", | |
" <td>-1.468544</td>\n", | |
" <td>-0.047072</td>\n", | |
" <td>-0.450447</td>\n", | |
" <td>-0.979995</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td> 1.032954</td>\n", | |
" <td> 0.461783</td>\n", | |
" <td> 1.509796</td>\n", | |
" <td>-0.323996</td>\n", | |
" <td>-1.680946</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>-0.590753</td>\n", | |
" <td>-0.287149</td>\n", | |
" <td> 0.596965</td>\n", | |
" <td> 1.252726</td>\n", | |
" <td> 3.096204</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td> 0.884594</td>\n", | |
" <td>-0.125060</td>\n", | |
" <td>-1.458435</td>\n", | |
" <td> 1.159053</td>\n", | |
" <td> 1.433513</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>6 rows \u00d7 5 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 22, | |
"text": [ | |
" random guess chance luck fate\n", | |
"0 -0.820053 0.952948 2.062030 1.399049 3.618150\n", | |
"1 0.784917 1.015215 -0.285176 -0.511790 -1.808498\n", | |
"2 0.079101 -1.468544 -0.047072 -0.450447 -0.979995\n", | |
"3 1.032954 0.461783 1.509796 -0.323996 -1.680946\n", | |
"4 -0.590753 -0.287149 0.596965 1.252726 3.096204\n", | |
"5 0.884594 -0.125060 -1.458435 1.159053 1.433513\n", | |
"\n", | |
"[6 rows x 5 columns]" | |
] | |
} | |
], | |
"prompt_number": 22 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[\"random\"] = df[\"random\"].apply(lambda x: 'a' if x > 0 else 'b')" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 23 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>random</th>\n", | |
" <th>guess</th>\n", | |
" <th>chance</th>\n", | |
" <th>luck</th>\n", | |
" <th>fate</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td> b</td>\n", | |
" <td> 0.952948</td>\n", | |
" <td> 2.062030</td>\n", | |
" <td> 1.399049</td>\n", | |
" <td> 3.618150</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> a</td>\n", | |
" <td> 1.015215</td>\n", | |
" <td>-0.285176</td>\n", | |
" <td>-0.511790</td>\n", | |
" <td>-1.808498</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> a</td>\n", | |
" <td>-1.468544</td>\n", | |
" <td>-0.047072</td>\n", | |
" <td>-0.450447</td>\n", | |
" <td>-0.979995</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td> a</td>\n", | |
" <td> 0.461783</td>\n", | |
" <td> 1.509796</td>\n", | |
" <td>-0.323996</td>\n", | |
" <td>-1.680946</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td> b</td>\n", | |
" <td>-0.287149</td>\n", | |
" <td> 0.596965</td>\n", | |
" <td> 1.252726</td>\n", | |
" <td> 3.096204</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td> a</td>\n", | |
" <td>-0.125060</td>\n", | |
" <td>-1.458435</td>\n", | |
" <td> 1.159053</td>\n", | |
" <td> 1.433513</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>6 rows \u00d7 5 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 24, | |
"text": [ | |
" random guess chance luck fate\n", | |
"0 b 0.952948 2.062030 1.399049 3.618150\n", | |
"1 a 1.015215 -0.285176 -0.511790 -1.808498\n", | |
"2 a -1.468544 -0.047072 -0.450447 -0.979995\n", | |
"3 a 0.461783 1.509796 -0.323996 -1.680946\n", | |
"4 b -0.287149 0.596965 1.252726 3.096204\n", | |
"5 a -0.125060 -1.458435 1.159053 1.433513\n", | |
"\n", | |
"[6 rows x 5 columns]" | |
] | |
} | |
], | |
"prompt_number": 24 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df.ix[1:3, \"guess\":\"luck\"]" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>guess</th>\n", | |
" <th>chance</th>\n", | |
" <th>luck</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> 1.015215</td>\n", | |
" <td>-0.285176</td>\n", | |
" <td>-0.511790</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>-1.468544</td>\n", | |
" <td>-0.047072</td>\n", | |
" <td>-0.450447</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td> 0.461783</td>\n", | |
" <td> 1.509796</td>\n", | |
" <td>-0.323996</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>3 rows \u00d7 3 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 25, | |
"text": [ | |
" guess chance luck\n", | |
"1 1.015215 -0.285176 -0.511790\n", | |
"2 -1.468544 -0.047072 -0.450447\n", | |
"3 0.461783 1.509796 -0.323996\n", | |
"\n", | |
"[3 rows x 3 columns]" | |
] | |
} | |
], | |
"prompt_number": 25 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"a_column = pd.DataFrame(np.random.randn(3,1),\n", | |
" index=[1,4,9],\n", | |
" columns=[\"new\"])" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 26 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"a_column" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>new</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>-0.438710</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>-0.525838</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>-1.167854</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>3 rows \u00d7 1 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 27, | |
"text": [ | |
" new\n", | |
"1 -0.438710\n", | |
"4 -0.525838\n", | |
"9 -1.167854\n", | |
"\n", | |
"[3 rows x 1 columns]" | |
] | |
} | |
], | |
"prompt_number": 27 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df = pd.concat([df, a_column], axis=1)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 28 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>random</th>\n", | |
" <th>guess</th>\n", | |
" <th>chance</th>\n", | |
" <th>luck</th>\n", | |
" <th>fate</th>\n", | |
" <th>new</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td> b</td>\n", | |
" <td> 0.952948</td>\n", | |
" <td> 2.062030</td>\n", | |
" <td> 1.399049</td>\n", | |
" <td> 3.618150</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> a</td>\n", | |
" <td> 1.015215</td>\n", | |
" <td>-0.285176</td>\n", | |
" <td>-0.511790</td>\n", | |
" <td>-1.808498</td>\n", | |
" <td>-0.438710</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> a</td>\n", | |
" <td>-1.468544</td>\n", | |
" <td>-0.047072</td>\n", | |
" <td>-0.450447</td>\n", | |
" <td>-0.979995</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td> a</td>\n", | |
" <td> 0.461783</td>\n", | |
" <td> 1.509796</td>\n", | |
" <td>-0.323996</td>\n", | |
" <td>-1.680946</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td> b</td>\n", | |
" <td>-0.287149</td>\n", | |
" <td> 0.596965</td>\n", | |
" <td> 1.252726</td>\n", | |
" <td> 3.096204</td>\n", | |
" <td>-0.525838</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td> a</td>\n", | |
" <td>-0.125060</td>\n", | |
" <td>-1.458435</td>\n", | |
" <td> 1.159053</td>\n", | |
" <td> 1.433513</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td> NaN</td>\n", | |
" <td> NaN</td>\n", | |
" <td> NaN</td>\n", | |
" <td> NaN</td>\n", | |
" <td> NaN</td>\n", | |
" <td>-1.167854</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>7 rows \u00d7 6 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 29, | |
"text": [ | |
" random guess chance luck fate new\n", | |
"0 b 0.952948 2.062030 1.399049 3.618150 NaN\n", | |
"1 a 1.015215 -0.285176 -0.511790 -1.808498 -0.438710\n", | |
"2 a -1.468544 -0.047072 -0.450447 -0.979995 NaN\n", | |
"3 a 0.461783 1.509796 -0.323996 -1.680946 NaN\n", | |
"4 b -0.287149 0.596965 1.252726 3.096204 -0.525838\n", | |
"5 a -0.125060 -1.458435 1.159053 1.433513 NaN\n", | |
"9 NaN NaN NaN NaN NaN -1.167854\n", | |
"\n", | |
"[7 rows x 6 columns]" | |
] | |
} | |
], | |
"prompt_number": 29 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[df[\"chance\"] < 0]" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>random</th>\n", | |
" <th>guess</th>\n", | |
" <th>chance</th>\n", | |
" <th>luck</th>\n", | |
" <th>fate</th>\n", | |
" <th>new</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> a</td>\n", | |
" <td> 1.015215</td>\n", | |
" <td>-0.285176</td>\n", | |
" <td>-0.511790</td>\n", | |
" <td>-1.808498</td>\n", | |
" <td>-0.43871</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> a</td>\n", | |
" <td>-1.468544</td>\n", | |
" <td>-0.047072</td>\n", | |
" <td>-0.450447</td>\n", | |
" <td>-0.979995</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td> a</td>\n", | |
" <td>-0.125060</td>\n", | |
" <td>-1.458435</td>\n", | |
" <td> 1.159053</td>\n", | |
" <td> 1.433513</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>3 rows \u00d7 6 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 30, | |
"text": [ | |
" random guess chance luck fate new\n", | |
"1 a 1.015215 -0.285176 -0.511790 -1.808498 -0.43871\n", | |
"2 a -1.468544 -0.047072 -0.450447 -0.979995 NaN\n", | |
"5 a -0.125060 -1.458435 1.159053 1.433513 NaN\n", | |
"\n", | |
"[3 rows x 6 columns]" | |
] | |
} | |
], | |
"prompt_number": 30 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[\"guess\"][df[\"chance\"] < 0] = 42" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 31 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>random</th>\n", | |
" <th>guess</th>\n", | |
" <th>chance</th>\n", | |
" <th>luck</th>\n", | |
" <th>fate</th>\n", | |
" <th>new</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td> b</td>\n", | |
" <td> 0.952948</td>\n", | |
" <td> 2.062030</td>\n", | |
" <td> 1.399049</td>\n", | |
" <td> 3.618150</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> a</td>\n", | |
" <td> 42.000000</td>\n", | |
" <td>-0.285176</td>\n", | |
" <td>-0.511790</td>\n", | |
" <td>-1.808498</td>\n", | |
" <td>-0.438710</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> a</td>\n", | |
" <td> 42.000000</td>\n", | |
" <td>-0.047072</td>\n", | |
" <td>-0.450447</td>\n", | |
" <td>-0.979995</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td> a</td>\n", | |
" <td> 0.461783</td>\n", | |
" <td> 1.509796</td>\n", | |
" <td>-0.323996</td>\n", | |
" <td>-1.680946</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td> b</td>\n", | |
" <td> -0.287149</td>\n", | |
" <td> 0.596965</td>\n", | |
" <td> 1.252726</td>\n", | |
" <td> 3.096204</td>\n", | |
" <td>-0.525838</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td> a</td>\n", | |
" <td> 42.000000</td>\n", | |
" <td>-1.458435</td>\n", | |
" <td> 1.159053</td>\n", | |
" <td> 1.433513</td>\n", | |
" <td> NaN</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td> NaN</td>\n", | |
" <td> NaN</td>\n", | |
" <td> NaN</td>\n", | |
" <td> NaN</td>\n", | |
" <td> NaN</td>\n", | |
" <td>-1.167854</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>7 rows \u00d7 6 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 32, | |
"text": [ | |
" random guess chance luck fate new\n", | |
"0 b 0.952948 2.062030 1.399049 3.618150 NaN\n", | |
"1 a 42.000000 -0.285176 -0.511790 -1.808498 -0.438710\n", | |
"2 a 42.000000 -0.047072 -0.450447 -0.979995 NaN\n", | |
"3 a 0.461783 1.509796 -0.323996 -1.680946 NaN\n", | |
"4 b -0.287149 0.596965 1.252726 3.096204 -0.525838\n", | |
"5 a 42.000000 -1.458435 1.159053 1.433513 NaN\n", | |
"9 NaN NaN NaN NaN NaN -1.167854\n", | |
"\n", | |
"[7 rows x 6 columns]" | |
] | |
} | |
], | |
"prompt_number": 32 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df.fillna(df.mean())" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>random</th>\n", | |
" <th>guess</th>\n", | |
" <th>chance</th>\n", | |
" <th>luck</th>\n", | |
" <th>fate</th>\n", | |
" <th>new</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td> b</td>\n", | |
" <td> 0.952948</td>\n", | |
" <td> 2.062030</td>\n", | |
" <td> 1.399049</td>\n", | |
" <td> 3.618150</td>\n", | |
" <td>-0.710800</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> a</td>\n", | |
" <td> 42.000000</td>\n", | |
" <td>-0.285176</td>\n", | |
" <td>-0.511790</td>\n", | |
" <td>-1.808498</td>\n", | |
" <td>-0.438710</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> a</td>\n", | |
" <td> 42.000000</td>\n", | |
" <td>-0.047072</td>\n", | |
" <td>-0.450447</td>\n", | |
" <td>-0.979995</td>\n", | |
" <td>-0.710800</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td> a</td>\n", | |
" <td> 0.461783</td>\n", | |
" <td> 1.509796</td>\n", | |
" <td>-0.323996</td>\n", | |
" <td>-1.680946</td>\n", | |
" <td>-0.710800</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td> b</td>\n", | |
" <td> -0.287149</td>\n", | |
" <td> 0.596965</td>\n", | |
" <td> 1.252726</td>\n", | |
" <td> 3.096204</td>\n", | |
" <td>-0.525838</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td> a</td>\n", | |
" <td> 42.000000</td>\n", | |
" <td>-1.458435</td>\n", | |
" <td> 1.159053</td>\n", | |
" <td> 1.433513</td>\n", | |
" <td>-0.710800</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td> NaN</td>\n", | |
" <td> 21.187930</td>\n", | |
" <td> 0.396351</td>\n", | |
" <td> 0.420766</td>\n", | |
" <td> 0.613072</td>\n", | |
" <td>-1.167854</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>7 rows \u00d7 6 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 33, | |
"text": [ | |
" random guess chance luck fate new\n", | |
"0 b 0.952948 2.062030 1.399049 3.618150 -0.710800\n", | |
"1 a 42.000000 -0.285176 -0.511790 -1.808498 -0.438710\n", | |
"2 a 42.000000 -0.047072 -0.450447 -0.979995 -0.710800\n", | |
"3 a 0.461783 1.509796 -0.323996 -1.680946 -0.710800\n", | |
"4 b -0.287149 0.596965 1.252726 3.096204 -0.525838\n", | |
"5 a 42.000000 -1.458435 1.159053 1.433513 -0.710800\n", | |
"9 NaN 21.187930 0.396351 0.420766 0.613072 -1.167854\n", | |
"\n", | |
"[7 rows x 6 columns]" | |
] | |
} | |
], | |
"prompt_number": 33 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"for k, v in df.iterrows():\n", | |
" print v['chance']" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"2.06203037589\n", | |
"-0.28517591262\n", | |
"-0.0470721934499\n", | |
"1.50979564075\n", | |
"0.596964926409\n", | |
"-1.45843483675\n", | |
"nan\n" | |
] | |
} | |
], | |
"prompt_number": 34 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df.values" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 35, | |
"text": [ | |
"array([['b', 0.9529479586539865, 2.0620303758857146, 1.3990485405047015,\n", | |
" 3.6181499063932105, nan],\n", | |
" ['a', 42.0, -0.2851759126198742, -0.5117904093719614,\n", | |
" -1.8084975116604265, -0.43870955141495604],\n", | |
" ['a', 42.0, -0.04707219344987416, -0.4504467615033189,\n", | |
" -0.9799947119018371, nan],\n", | |
" ['a', 0.4617828830291011, 1.509795640747286, -0.3239958303468689,\n", | |
" -1.6809457423888072, nan],\n", | |
" ['b', -0.2871493218525701, 0.5969649264089326, 1.2527257785800212,\n", | |
" 3.096204399794183, -0.5258376995671956],\n", | |
" ['a', 42.0, -1.4584348367523565, 1.1590534187255992,\n", | |
" 1.4335128551444463, nan],\n", | |
" [nan, nan, nan, nan, nan, -1.1678536659869951]], dtype=object)" | |
] | |
} | |
], | |
"prompt_number": 35 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Data summary\n", | |
"\n", | |
"* head, describe, hist, value_counts" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[\"random\"].value_counts()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 36, | |
"text": [ | |
"a 4\n", | |
"b 2\n", | |
"dtype: int64" | |
] | |
} | |
], | |
"prompt_number": 36 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[\"guess\"].describe()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 37, | |
"text": [ | |
"count 6.000000\n", | |
"mean 21.187930\n", | |
"std 22.801901\n", | |
"min -0.287149\n", | |
"25% 0.584574\n", | |
"50% 21.476474\n", | |
"75% 42.000000\n", | |
"max 42.000000\n", | |
"Name: guess, dtype: float64" | |
] | |
} | |
], | |
"prompt_number": 37 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[\"guess\"].hist()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 38, | |
"text": [ | |
"<matplotlib.axes.AxesSubplot at 0x10c70ca10>" | |
] | |
}, | |
{ | |
"metadata": {}, | |
"output_type": "display_data", | |
"png": "iVBORw0KGgoAAAANSUhEUgAAAXUAAAEACAYAAABMEua6AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGqpJREFUeJzt3H9M1Pf9B/DnMcgMaKQm83T8KJtQBcW7s2Yki50f56gr\nnczF/mGTOVH/YNrOuD9N1jRLFO32h9O4NHZJta6J+p3ZUrIgqcl4d1OHZvZMu52ZUMQeKKTOsY62\nVse9v38UTin0jtP3u+8ffT4SQt/cx+P5svL0eHFnSEopQUREXsgzHYCIiNRhqRMReYSlTkTkEZY6\nEZFHWOpERB5hqRMReSRjqd+6dQt1dXWIRqOoqanBjh07Jr1u27ZtqKqqQiQSQTwe1xKUiIiyy890\n47Rp09DR0YHCwkL873//w7Jly3D69GksW7YsfU1bWxu6u7vR1dWFc+fOYcuWLejs7NQenIiIJsq6\nfiksLAQA3L59GyMjI5g1a9a421tbW7FhwwYAQF1dHYaGhjA4OKghKhERZZO11FOpFKLRKMLhMFas\nWIGamppxt/f396OsrCx9Li0tRV9fn/qkRESUVdZSz8vLw8WLF9HX14c///nPEEJMuObT/9JAKBRS\nFpCIiKYu4079XjNnzsSTTz6Jv/3tbwiCIP3xkpISJJPJ9Lmvrw8lJSUTfn1JSQmuXbv2YGmJiL5g\n5s2bh+7u7ilfn/GR+o0bNzA0NAQA+Oijj3Dq1CnEYrFx1zQ2NuLIkSMAgM7OThQXFyMcDk+4r2vX\nrkFK6dXbrl27kJe3A4AE8Pzoe3VvRUVluHr1qvE5n3/+eeMZON8XY75P5Pq1ksvXHozPmOvbO++8\nk6mmJ8j4SP369evYsGEDUqkUUqkU1q9fj5UrV+LgwYMAgObmZjQ0NKCtrQ2VlZUoKirCoUOHcgrg\nj17TAbTp7e01HUErzue6XtMBrJKx1Gtra/Hmm29O+Hhzc/O484EDB9SmIiKi+8JXlCrTZDqANk1N\nTaYjaMX5XNdkOoBVWOrKBKYDaHPvD8Z9xPlcF5gOYBWWujLCdABtJnsaq084n+uE6QBWYakTEXmE\npa5MYDqANr5/+875XBeYDmAVljoRkUdY6soI0wG08X0ny/lcJ0wHsApLnYjIIyx1ZQLTAbTxfSfL\n+VwXmA5gFZY6EZFHWOrKCNMBtPF9J8v5XCdMB7AKS52IyCMsdWUC0wG08X0ny/lcF5gOYBWWOhGR\nR1jqygjTAbTxfSfL+VwnTAewCkudiMgjLHVlAtMBtPF9J8v5XBeYDmAVljoRkUdY6soI0wG08X0n\ny/lcJ0wHsApLnYjIIyx1ZQLTAbTxfSfL+VwXmA5gFZY6EZFHWOrKCNMBtPF9J8v5XCdMB7AKS52I\nyCMsdWUC0wG08X0ny/lcF5gOYBWWOhGRR1jqygjTAbTxfSfL+VwnTAewCkudiMgjGUs9mUxixYoV\nWLhwIRYtWoT9+/dPuEYIgZkzZyIWiyEWi2Hnzp3awtotMB1AG993spzPdYHpAFbJz3RjQUEB9u7d\ni2g0iuHhYTz66KOor69HdXX1uOuWL1+O1tZWrUGJiCi7jI/U58yZg2g0CgCYPn06qqurce3atQnX\nSSn1pHOKMB1AG993spzPdcJ0AKtMeafe29uLeDyOurq6cR8PhUI4e/YsIpEIGhoakEgklIckIqKp\nybh+GTM8PIynnnoK+/btw/Tp08fdtmTJEiSTSRQWFuLkyZNYs2YNLl++rCWs3QLTAbTxfSfL+VwX\nmA5glaylfufOHaxduxY//OEPsWbNmgm3z5gxI/3fTzzxBLZu3YqbN29i1qxZE65tampCRUUFAKC4\nuBjRaDT9B27sW0SXzj09PQBmj04nRt8Hys4jI7cwxoZ5eeZZ9/musXOg+Ayt+VWchRA4fPgwAKT7\nMicyg1QqJdevXy+3b9/+mdcMDAzIVColpZTy3Llz8uGHH570uiyfykm7du2SeXk7JCAl0DH6Xt1b\nUVGZvHr1qukxZUdHh+kIWnE+ewC4j6+VXL723OuhXDNnfKR+5swZvPrqq1i8eDFisRgAoKWlBe++\n+y4AoLm5GSdOnMCLL76I/Px8FBYW4tixY7n/zUJEREqERv8m0P+JQiHvniXT0tKC554bRirVouX+\ni4rKkUicRnl5uZb7J7JNKBQCoLMn3OuhXLuTryglIvIIS10ZYTqANhN/iOUXzuc6YTqAVVjqREQe\nYakrE5gOoM3Y0658xflcF5gOYBWWOhGRR1jqygjTAbTxfSfL+VwnTAewCkudiMgjLHVlAtMBtPF9\nJ8v5XBeYDmAVljoRkUdY6soI0wG08X0ny/lcJ0wHsApLnYjIIyx1ZQLTAbTxfSfL+VwXmA5gFZY6\nEZFHWOrKCNMBtPF9J8v5XCdMB7AKS52IyCMsdWUC0wG08X0ny/lcF5gOYBWWOhGRR1jqygjTAbTx\nfSfL+VwnTAewCkudiMgjLHVlAtMBtPF9J8v5XBeYDmAVljoRkUdY6soI0wG08X0ny/lcJ0wHsApL\nnYjIIyx1ZQLTAbTxfSfL+VwXmA5gFZY6EZFHWOrKCNMBtPF9J8v5XCdMB7AKS52IyCMsdWUC0wG0\n8X0ny/lcF5gOYBWWOhGRRzKWejKZxIoVK7Bw4UIsWrQI+/fvn/S6bdu2oaqqCpFIBPF4XEtQ+wnT\nAbTxfSfL+VwnTAewSn6mGwsKCrB3715Eo1EMDw/j0UcfRX19Paqrq9PXtLW1obu7G11dXTh37hy2\nbNmCzs5O7cGJiGiijI/U58yZg2g0CgCYPn06qqurce3atXHXtLa2YsOGDQCAuro6DA0NYXBwUFNc\nmwWmA2jj+06W87kuMB3AKlPeqff29iIej6Ourm7cx/v7+1FWVpY+l5aWoq+vT11CIiKasozrlzHD\nw8N46qmnsG/fPkyfPn3C7VLKcedQKDTp/TQ1NaGiogIAUFxcjGg0mn4UMbb3c+nc09MDYPbodL8C\nEMXdRw1i9P39n0dGbmGMyXnv3cna9PvP+fyb766xczCF872/Ntv10Jpf1f+vw4cPA0C6L3Mis7h9\n+7Z8/PHH5d69eye9vbm5WR49ejR9nj9/vhwYGJhw3RQ+lXN27dol8/J2SEBKoGP0vbq3oqIyefXq\nVdNjyo6ODtMRtOJ89gBwH18ruXztuddDuWbOuH6RUmLz5s2oqanB9u3bJ72msbERR44cAQB0dnai\nuLgY4XA4979dnBeYDqDN2KMJX3E+1wWmA1gl4/rlzJkzePXVV7F48WLEYjEAQEtLC959910AQHNz\nMxoaGtDW1obKykoUFRXh0KFD+lMTEdGkMpb6smXLkEqlst7JgQMHlAVyl4CvjxiEEF4/2uN8rhPw\n9WvvfvAVpUREHmGpKxOYDqCN34/yOJ/7AtMBrMJSJyLyCEtdGWE6gDYTn0PsF87nOmE6gFVY6kRE\nHmGpKxOYDqCN7ztZzue6wHQAq7DUiYg8wlJXRpgOoI3vO1nO5zphOoBVWOpERB5hqSsTmA6gje87\nWc7nusB0AKuw1ImIPMJSV0aYDqCN7ztZzuc6YTqAVVjqREQeYakrE5gOoI3vO1nO57rAdACrsNSJ\niDzCUldGmA6gje87Wc7nOmE6gFVY6kREHmGpKxOYDqCN7ztZzue6wHQAq7DUiYg8wlJXRpgOoI3v\nO1nO5zphOoBVWOpERB5hqSsTmA6gje87Wc7nusB0AKuw1ImIPMJSV0aYDqCN7ztZzuc6YTqAVVjq\nREQeYakrE5gOoI3vO1nO57rAdACrsNSJiDzCUldGmA6gje87Wc7nOmE6gFWylvqmTZsQDodRW1s7\n6e1CCMycOROxWAyxWAw7d+5UHpKIiKYmP9sFGzduxE9+8hP86Ec/+sxrli9fjtbWVqXB3BOYDqCN\n7ztZzue6wHQAq2R9pP7YY4/hoYceyniNlFJZICIiun8PvFMPhUI4e/YsIpEIGhoakEgkVORykDAd\nQBvfd7Kcz3XCdACrZF2/ZLNkyRIkk0kUFhbi5MmTWLNmDS5fvjzptU1NTaioqAAAFBcXIxqNpr81\nHPuD59K5p6cHwOzR6S6Ovg9G34sHPo+M3MIYG+blmWfd57vGzoHiM7TmV3EWQuDw4cMAkO7LXITk\nFHYnvb29WL16Nd5+++2sd/i1r30NFy5cwKxZs8Z/olDIuzVNS0sLnntuGKlUi5b7LyoqRyJxGuXl\n5Vrun8g2oVAIgM6ecK+Hcu3OB16/DA4Opj/h+fPnIaWcUOhERPT5yFrqTz/9NL75zW/in//8J8rK\nyvDyyy/j4MGDOHjwIADgxIkTqK2tRTQaxfbt23Hs2DHtoe0kTAfQZuK3xn7hfK4TpgNYJetO/ejR\noxlvf+aZZ/DMM88oC0RERPePryhVJjAdQJuxH+b4ivO5LjAdwCosdSIij7DUlRGmA2jj+06W87lO\nmA5gFZY6EZFHWOrKBKYDaOP7TpbzuS4wHcAqLHUiIo+w1JURpgNo4/tOlvO5TpgOYBWWOhGRR1jq\nygSmA2jj+06W87kuMB3AKix1IiKPsNSVEaYDaOP7TpbzuU6YDmAVljoRkUdY6soEpgNo4/tOlvO5\nLjAdwCosdSIij7DUlRGmA2jj+06W87lOmA5gFZY6EZFHWOrKBKYDaOP7TpbzuS4wHcAqLHUiIo+w\n1JURpgNo4/tOlvO5TpgOYBWWOhGRR1jqygSmA2jj+06W87kuMB3AKix1IiKPsNSVEaYDaOP7Tpbz\nuU6YDmAVljoRkUdY6soEpgNo4/tOlvO5LjAdwCosdSIij7DUlRGmA2jj+06W87lOmA5gFZY6EZFH\nspb6pk2bEA6HUVtb+5nXbNu2DVVVVYhEIojH40oDuiMwHUAb33eynM91gekAVsla6hs3bkR7e/tn\n3t7W1obu7m50dXXhpZdewpYtW5QGJCKiqcta6o899hgeeuihz7y9tbUVGzZsAADU1dVhaGgIg4OD\n6hI6Q5gOoI3vO1nO5zphOoBVHnin3t/fj7KysvS5tLQUfX19D3q3RER0H/JV3ImUctw5FApNel1T\nUxMqKioAAMXFxYhGo+l939ijCZfOPT09AGbfM6HA3f2eGH1//+eRkVt379ngvEEQWPH7zfn8n++u\nsXMwhXOQw/XQml/FWQiBw4cPA0C6L3MRkp9u5En09vZi9erVePvttyfc9uMf/xhBEGDdunUAgAUL\nFuCNN95AOBwe/4lCoQnl77qWlhY899wwUqkWLfdfVFSOROI0ysvLtdw/kW0+eUCosyfc66Fcu/OB\n1y+NjY04cuQIAKCzsxPFxcUTCv2LQZgOoM3ER1F+4XyuE6YDWCXr+uXpp5/GG2+8gRs3bqCsrAw/\n//nPcefOHQBAc3MzGhoa0NbWhsrKShQVFeHQoUPaQxMR0eSmtH5R8om4fskZ1y/0RcP1y0Sf+/qF\niIjswVJXRpgOoI3vO1nO5zphOoBVWOpERB5hqSsTmA6gzdhzaX3F+VwXmA5gFZY6EZFHWOrKCNMB\ntPF9J8v5XCdMB7AKS52IyCMsdWUC0wG08X0ny/lcF5gOYBWWOhGRR1jqygjTAbTxfSfL+VwnTAew\nCkudiMgjLHVlAtMBtPF9J8v5XBeYDmAVljoRkUdY6soI0wG08X0ny/lcJ0wHsApLnYjIIyx1ZQLT\nAbTxfSfL+VwXmA5gFZY6EZFHWOrKCNMBtPF9J8v5XCdMB7AKS52IyCMsdWUC0wG08X0ny/lcF5gO\nYBWWOhGRR1jqygjTAbTxfSfL+VwnTAewCkudiMgjLHVlAtMBtPF9J8v5XBeYDmAVljoRkUdY6soI\n0wG08X0ny/lcJ0wHsApLnYjIIyx1ZQLTAbTxfSfL+VwXmA5glayl3t7ejgULFqCqqgovvPDChNuF\nEJg5cyZisRhisRh27typJSgREWWXsdRHRkbw7LPPor29HYlEAkePHsWlS5cmXLd8+XLE43HE43H8\n7Gc/0xbWbsJ0AG1838lyPtcJ0wGskrHUz58/j8rKSlRUVKCgoADr1q3Da6+9NuE6KaW2gERENHUZ\nS72/vx9lZWXpc2lpKfr7+8ddEwqFcPbsWUQiETQ0NCCRSOhJar3AdABtfN/Jcj7XBaYDWCU/042h\nUCjrHSxZsgTJZBKFhYU4efIk1qxZg8uXLysLSEREU5ex1EtKSpBMJtPnZDKJ0tLScdfMmDEj/d9P\nPPEEtm7dips3b2LWrFkT7q+pqQkVFRUAgOLiYkSj0fSjiLG9n0vnnp4eALNHp/sVgCjuPmoQo+/v\n/zwycgtjTM57707Wpt9/zufffHeNnYMpnO/9tdmuh9b8qv5/HT58GADSfZkTmcGdO3fk17/+dXnl\nyhX58ccfy0gkIhOJxLhrBgYGZCqVklJKee7cOfnwww9Pel9ZPpWTdu3aJfPydkhASqBj9L26t6Ki\nMnn16lXTY8qOjg7TEbTifPYAcB9fK7l87bnXQ7lmzvhIPT8/HwcOHMCqVaswMjKCzZs3o7q6GgcP\nHgQANDc348SJE3jxxReRn5+PwsJCHDt2LPe/WbwQmA6gzdijCV9xPtcFpgNYJTT6N4H+TxQKefcs\nmZaWFjz33DBSqRYt919UVI5E4jTKy8u13D+RbT75OZ7OnnCvh3LtTr6iVBlhOoA2E/edfuF8rhOm\nA1iFpU5E5BGWujKB6QDa+L6T5XyuC0wHsApLnYjIIyx1ZYTpANr4vpPlfK4TpgNYhaVOROQRlroy\ngekA2vi+k+V8rgtMB7AKS52IyCMsdWWE6QDa+L6T5XyuE6YDWIWlTkTkEZa6MoHpANr4vpPlfK4L\nTAewCkudiMgjLHVlhOkA2vi+k+V8rhOmA1iFpU5E5BGWujKB6QDa+L6T5XyuC0wHsApLnYjIIyx1\nZYTpANr4vpPlfK4TpgNYhaVOROQRlroygekA2vi+k+V8rgtMB7AKS52IyCMsdWWE6QDa+L6T5Xyu\nE6YDWIWlTkTkEZa6MoHpANr4vpPlfK4LTAewCkudiMgjLHVlhOkA2vi+k+V8rhOmA1iFpU5E5BGW\nujKB6QDa+L6T5XyuC0wHsApLnYjII1lLvb29HQsWLEBVVRVeeOGFSa/Ztm0bqqqqEIlEEI/HlYd0\ngzAdQBvfd7Kcz3XCdACrZCz1kZERPPvss2hvb0cikcDRo0dx6dKlcde0tbWhu7sbXV1deOmll7Bl\nyxatge110XQAbS5e9Hc2gPO5z/f5cpOx1M+fP4/KykpUVFSgoKAA69atw2uvvTbumtbWVmzYsAEA\nUFdXh6GhIQwODupLbK0h0wG0GRrydzaA87nP9/lyk7HU+/v7UVZWlj6Xlpaiv78/6zV9fX2KYxIR\n0VTkZ7oxFApN6U6klPf161yXl5eHgoL/w5e//DY+/DCOwsILSu//ww/fQ16e+Z9l9/b2mo6gFedz\nXa/pAFbJWOolJSVIJpPpczKZRGlpacZr+vr6UFJSMuG+5s2b523Zf/zxOwCA99/vz3Jl7u79Lsik\nV155xXQErTifTe6nJ6Y+n2s9NG/evJyuz1jqS5cuRVdXF3p7e/HVr34Vx48fx9GjR8dd09jYiAMH\nDmDdunXo7OxEcXExwuHwhPvq7u7OKRgREeUuY6nn5+fjwIEDWLVqFUZGRrB582ZUV1fj4MGDAIDm\n5mY0NDSgra0NlZWVKCoqwqFDhz6X4ERENFFIfnohTkREztL6U7jf/e53WLhwIb70pS/hzTffHHfb\n7t27UVVVhQULFuD111/XGUOrqbw4yyWbNm1COBxGbW1t+mM3b95EfX09HnnkETz++ONOP0UumUxi\nxYoVWLhwIRYtWoT9+/cD8GPGW7duoa6uDtFoFDU1NdixYwcAP2a718jICGKxGFavXg3Ar/kqKiqw\nePFixGIxfOMb3wCQ+3xaS722thZ/+MMf8K1vfWvcxxOJBI4fP45EIoH29nZs3boVqVRKZxQtpvLi\nLNds3LgR7e3t4z62Z88e1NfX4/Lly1i5ciX27NljKN2DKygowN69e/GPf/wDnZ2d+PWvf41Lly55\nMeO0adPQ0dGBixcv4q233kJHRwdOnz7txWz32rdvH2pqatI/8PRpvlAoBCEE4vE4zp8/D+A+5pOf\ngyAI5IULF9LnlpYWuWfPnvR51apV8q9//evnEUWps2fPylWrVqXPu3fvlrt37zaYSI0rV67IRYsW\npc/z58+XAwMDUkopr1+/LufPn28qmnLf//735alTp7yb8YMPPpBLly6Vf//7372aLZlMypUrV8o/\n/elP8nvf+56U0q8/nxUVFfLGjRvjPpbrfEaeBH3t2rVxT42c7EVNLpjKi7N8MDg4mH5GUzgc9uYV\nw729vYjH46irq/NmxlQqhWg0inA4nF4z+TIbAPz0pz/FL3/5y3Gv3/BpvlAohO985ztYunQpfvOb\n3wDIfb6Mz36Zivr6egwMDEz4eEtLS3rnNRWuPXcUcDPzgwqFQl7MPTw8jLVr12Lfvn2YMWPGuNtc\nnjEvLw8XL17Ef/7zH6xatQodHR3jbnd5tj/+8Y+YPXs2YrHYZ/4jZS7PBwBnzpzB3Llz8d5776G+\nvh4LFiwYd/tU5nvgUj916lTOv2aqL1iy3VRenOWDcDiMgYEBzJkzB9evX8fs2bNNR3ogd+7cwdq1\na7F+/XqsWbMGgH8zzpw5E08++SQuXLjgzWxnz55Fa2sr2tracOvWLbz//vtYv369N/MBwNy5cwEA\nX/nKV/CDH/wA58+fz3m+z239Iu955mRjYyOOHTuG27dv48qVK+jq6kr/pNcl97446/bt2zh+/Dga\nGxtNx1KusbEx/YrEV155JV2ELpJSYvPmzaipqcH27dvTH/dhxhs3bqSfGfHRRx/h1KlTiMViXswG\nfPLdfzKZxJUrV3Ds2DF8+9vfxm9/+1tv5vvwww/x3//+FwDwwQcf4PXXX0dtbW3u82na90sppfz9\n738vS0tL5bRp02Q4HJbf/e5307ft2rVLzps3T86fP1+2t7frjKFVW1ubfOSRR+S8efNkS0uL6TgP\nbN26dXLu3LmyoKBAlpaWypdffln+61//kitXrpRVVVWyvr5e/vvf/zYd87795S9/kaFQSEYiERmN\nRmU0GpUnT570Ysa33npLxmIxGYlEZG1trfzFL34hpZRezPZpQgi5evVqKaU/8/X09MhIJCIjkYhc\nuHBhuk9ynY8vPiIi8oj5fwKQiIiUYakTEXmEpU5E5BGWOhGRR1jqREQeYakTEXmEpU5E5BGWOhGR\nR/4ftWgwQ400YEMAAAAASUVORK5CYII=\n", | |
"text": [ | |
"<matplotlib.figure.Figure at 0x10c6ff110>" | |
] | |
} | |
], | |
"prompt_number": 38 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Reading and writing\n", | |
"\n", | |
"`csv`, `excel`, `hdf`, `sql`, `json`, `html`, `stata`, `clipboard`, `pickle`,\n", | |
"and experimental (as of Pandas 0.13.1): `msgpack`, `gbq`.\n", | |
"\n", | |
"And of course typical Python objects, like `list`, `dict` or `numpy.array`.\n", | |
"\n", | |
"http://pandas.pydata.org/pandas-docs/stable/io.html" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df = pd.read_csv(\"gimn_egz_2002do2013.csv\",\n", | |
" encoding='utf8')" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 39 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"If you need it, download from:\n", | |
"https://github.com/stared/szkolomat_dane/ (direct link is [here](https://raw.githubusercontent.com/stared/szkolomat_dane/master/sp_spr_2002do2013.csv))." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"len(df)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 40, | |
"text": [ | |
"181463" | |
] | |
} | |
], | |
"prompt_number": 40 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df.head()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>id_szkoly</th>\n", | |
" <th>egz_norm_sr</th>\n", | |
" <th>egz_norm_std</th>\n", | |
" <th>probka</th>\n", | |
" <th>rok</th>\n", | |
" <th>czesc</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td> 13997</td>\n", | |
" <td> 87.235615</td>\n", | |
" <td> 10.182524</td>\n", | |
" <td> 13</td>\n", | |
" <td> 2002</td>\n", | |
" <td> gh</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> 13999</td>\n", | |
" <td> 91.391710</td>\n", | |
" <td> NaN</td>\n", | |
" <td> 1</td>\n", | |
" <td> 2002</td>\n", | |
" <td> gh</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> 14002</td>\n", | |
" <td> 101.964605</td>\n", | |
" <td> 13.317307</td>\n", | |
" <td> 189</td>\n", | |
" <td> 2002</td>\n", | |
" <td> gh</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td> 14004</td>\n", | |
" <td> 100.250970</td>\n", | |
" <td> 14.395403</td>\n", | |
" <td> 70</td>\n", | |
" <td> 2002</td>\n", | |
" <td> gh</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td> 14005</td>\n", | |
" <td> 95.465265</td>\n", | |
" <td> 13.085892</td>\n", | |
" <td> 62</td>\n", | |
" <td> 2002</td>\n", | |
" <td> gh</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>5 rows \u00d7 6 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 41, | |
"text": [ | |
" id_szkoly egz_norm_sr egz_norm_std probka rok czesc\n", | |
"0 13997 87.235615 10.182524 13 2002 gh\n", | |
"1 13999 91.391710 NaN 1 2002 gh\n", | |
"2 14002 101.964605 13.317307 189 2002 gh\n", | |
"3 14004 100.250970 14.395403 70 2002 gh\n", | |
"4 14005 95.465265 13.085892 62 2002 gh\n", | |
"\n", | |
"[5 rows x 6 columns]" | |
] | |
} | |
], | |
"prompt_number": 41 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df = df.rename(columns={'id_szkoly': 'school_id',\n", | |
" 'egz_norm_sr': 'mean_score',\n", | |
" 'egz_norm_std': 'std_dev',\n", | |
" 'probka': 'no_of_students',\n", | |
" 'rok': 'year',\n", | |
" 'czesc': 'exam_part'})" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 42 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df.head()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>school_id</th>\n", | |
" <th>mean_score</th>\n", | |
" <th>std_dev</th>\n", | |
" <th>no_of_students</th>\n", | |
" <th>year</th>\n", | |
" <th>exam_part</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td> 13997</td>\n", | |
" <td> 87.235615</td>\n", | |
" <td> 10.182524</td>\n", | |
" <td> 13</td>\n", | |
" <td> 2002</td>\n", | |
" <td> gh</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> 13999</td>\n", | |
" <td> 91.391710</td>\n", | |
" <td> NaN</td>\n", | |
" <td> 1</td>\n", | |
" <td> 2002</td>\n", | |
" <td> gh</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> 14002</td>\n", | |
" <td> 101.964605</td>\n", | |
" <td> 13.317307</td>\n", | |
" <td> 189</td>\n", | |
" <td> 2002</td>\n", | |
" <td> gh</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td> 14004</td>\n", | |
" <td> 100.250970</td>\n", | |
" <td> 14.395403</td>\n", | |
" <td> 70</td>\n", | |
" <td> 2002</td>\n", | |
" <td> gh</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td> 14005</td>\n", | |
" <td> 95.465265</td>\n", | |
" <td> 13.085892</td>\n", | |
" <td> 62</td>\n", | |
" <td> 2002</td>\n", | |
" <td> gh</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>5 rows \u00d7 6 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 43, | |
"text": [ | |
" school_id mean_score std_dev no_of_students year exam_part\n", | |
"0 13997 87.235615 10.182524 13 2002 gh\n", | |
"1 13999 91.391710 NaN 1 2002 gh\n", | |
"2 14002 101.964605 13.317307 189 2002 gh\n", | |
"3 14004 100.250970 14.395403 70 2002 gh\n", | |
"4 14005 95.465265 13.085892 62 2002 gh\n", | |
"\n", | |
"[5 rows x 6 columns]" | |
] | |
} | |
], | |
"prompt_number": 43 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df['year'].value_counts()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 44, | |
"text": [ | |
"2013 27198\n", | |
"2012 27127\n", | |
"2011 13460\n", | |
"2010 13311\n", | |
"2009 13206\n", | |
"2008 13110\n", | |
"2007 13008\n", | |
"2006 12875\n", | |
"2005 12629\n", | |
"2004 12455\n", | |
"2003 11720\n", | |
"2002 11364\n", | |
"dtype: int64" | |
] | |
} | |
], | |
"prompt_number": 44 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df['exam_part'].value_counts()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 45, | |
"text": [ | |
"gh 63600\n", | |
"gm 63538\n", | |
"gh_h 13582\n", | |
"gh_p 13582\n", | |
"gm_m 13581\n", | |
"gm_p 13580\n", | |
"dtype: int64" | |
] | |
} | |
], | |
"prompt_number": 45 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[df['year'] == 2013].describe()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>school_id</th>\n", | |
" <th>mean_score</th>\n", | |
" <th>std_dev</th>\n", | |
" <th>no_of_students</th>\n", | |
" <th>year</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>count</th>\n", | |
" <td> 27198.000000</td>\n", | |
" <td> 27198.000000</td>\n", | |
" <td> 27125.000000</td>\n", | |
" <td> 27198.000000</td>\n", | |
" <td> 27198</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>mean</th>\n", | |
" <td> 18876.419590</td>\n", | |
" <td> 99.090552</td>\n", | |
" <td> 13.070827</td>\n", | |
" <td> 55.841385</td>\n", | |
" <td> 2013</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>std</th>\n", | |
" <td> 7614.400545</td>\n", | |
" <td> 7.748602</td>\n", | |
" <td> 2.278386</td>\n", | |
" <td> 42.845946</td>\n", | |
" <td> 0</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>min</th>\n", | |
" <td> 14034.000000</td>\n", | |
" <td> 44.116249</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 1.000000</td>\n", | |
" <td> 2013</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>25%</th>\n", | |
" <td> 15985.250000</td>\n", | |
" <td> 95.542765</td>\n", | |
" <td> 11.788384</td>\n", | |
" <td> 24.000000</td>\n", | |
" <td> 2013</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>50%</th>\n", | |
" <td> 17903.000000</td>\n", | |
" <td> 99.100521</td>\n", | |
" <td> 13.214976</td>\n", | |
" <td> 44.000000</td>\n", | |
" <td> 2013</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>75%</th>\n", | |
" <td> 19822.750000</td>\n", | |
" <td> 102.816036</td>\n", | |
" <td> 14.487819</td>\n", | |
" <td> 74.000000</td>\n", | |
" <td> 2013</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>max</th>\n", | |
" <td> 76022.000000</td>\n", | |
" <td> 136.636502</td>\n", | |
" <td> 34.353405</td>\n", | |
" <td> 324.000000</td>\n", | |
" <td> 2013</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>8 rows \u00d7 5 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 46, | |
"text": [ | |
" school_id mean_score std_dev no_of_students year\n", | |
"count 27198.000000 27198.000000 27125.000000 27198.000000 27198\n", | |
"mean 18876.419590 99.090552 13.070827 55.841385 2013\n", | |
"std 7614.400545 7.748602 2.278386 42.845946 0\n", | |
"min 14034.000000 44.116249 0.000000 1.000000 2013\n", | |
"25% 15985.250000 95.542765 11.788384 24.000000 2013\n", | |
"50% 17903.000000 99.100521 13.214976 44.000000 2013\n", | |
"75% 19822.750000 102.816036 14.487819 74.000000 2013\n", | |
"max 76022.000000 136.636502 34.353405 324.000000 2013\n", | |
"\n", | |
"[8 rows x 5 columns]" | |
] | |
} | |
], | |
"prompt_number": 46 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[df['year'] == 2013].mean_score.apply(type).value_counts()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 47, | |
"text": [ | |
"<type 'numpy.float64'> 27198\n", | |
"dtype: int64" | |
] | |
} | |
], | |
"prompt_number": 47 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"best_of_2013 = df[(df['year'] == 2013) & (df['exam_part'] == 'gm_p')]\n", | |
"best_of_2013 = best_of_2013[best_of_2013['mean_score'] > 120.]" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 48 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"best_of_2013.head().values" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 49, | |
"text": [ | |
"array([[14323, 120.39388635816549, 9.856241445653913, 59, 2013, u'gm_p'],\n", | |
" [14459, 120.13272510022158, 12.604354608903089, 34, 2013, u'gm_p'],\n", | |
" [14479, 125.75772057006449, 9.302966477302826, 96, 2013, u'gm_p'],\n", | |
" [14489, 120.9617441796834, 12.391730327041405, 7, 2013, u'gm_p'],\n", | |
" [14490, 120.40927031104006, 10.972039172309728, 37, 2013, u'gm_p']], dtype=object)" | |
] | |
} | |
], | |
"prompt_number": 49 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"best_of_2013.to_csv(\"best_of_2013.csv\")" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 50 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df.groupby('school_id').mean()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>mean_score</th>\n", | |
" <th>std_dev</th>\n", | |
" <th>no_of_students</th>\n", | |
" <th>year</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>school_id</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>13997</th>\n", | |
" <td> 87.168595</td>\n", | |
" <td> 9.749717</td>\n", | |
" <td> 16.000000</td>\n", | |
" <td> 2005.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>13999</th>\n", | |
" <td> 92.727838</td>\n", | |
" <td> NaN</td>\n", | |
" <td> 1.000000</td>\n", | |
" <td> 2002.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14000</th>\n", | |
" <td> 75.605080</td>\n", | |
" <td> 13.670722</td>\n", | |
" <td> 8.333333</td>\n", | |
" <td> 2009.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14001</th>\n", | |
" <td> 89.750454</td>\n", | |
" <td> 6.597126</td>\n", | |
" <td> 1.500000</td>\n", | |
" <td> 2009.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14002</th>\n", | |
" <td> 99.193081</td>\n", | |
" <td> 14.471929</td>\n", | |
" <td> 142.250000</td>\n", | |
" <td> 2004.500000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14003</th>\n", | |
" <td> 99.526737</td>\n", | |
" <td> 13.239124</td>\n", | |
" <td> 85.000000</td>\n", | |
" <td> 2005.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14004</th>\n", | |
" <td> 98.468027</td>\n", | |
" <td> 12.725488</td>\n", | |
" <td> 55.500000</td>\n", | |
" <td> 2004.500000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14005</th>\n", | |
" <td> 94.621077</td>\n", | |
" <td> 13.925231</td>\n", | |
" <td> 58.000000</td>\n", | |
" <td> 2003.500000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14006</th>\n", | |
" <td> 93.633615</td>\n", | |
" <td> 11.154024</td>\n", | |
" <td> 63.500000</td>\n", | |
" <td> 2003.500000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14007</th>\n", | |
" <td> 97.667551</td>\n", | |
" <td> 13.371636</td>\n", | |
" <td> 39.833333</td>\n", | |
" <td> 2004.500000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14008</th>\n", | |
" <td> 94.748298</td>\n", | |
" <td> 14.850266</td>\n", | |
" <td> 62.000000</td>\n", | |
" <td> 2005.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14009</th>\n", | |
" <td> 97.335682</td>\n", | |
" <td> 10.911686</td>\n", | |
" <td> 10.000000</td>\n", | |
" <td> 2004.333333</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14010</th>\n", | |
" <td> 95.231192</td>\n", | |
" <td> 8.960401</td>\n", | |
" <td> 10.000000</td>\n", | |
" <td> 2004.500000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14011</th>\n", | |
" <td> 100.356619</td>\n", | |
" <td> 19.207342</td>\n", | |
" <td> 17.000000</td>\n", | |
" <td> 2005.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14012</th>\n", | |
" <td> 80.561827</td>\n", | |
" <td> 10.335914</td>\n", | |
" <td> 18.000000</td>\n", | |
" <td> 2007.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14013</th>\n", | |
" <td> 98.086374</td>\n", | |
" <td> 12.546801</td>\n", | |
" <td> 38.357143</td>\n", | |
" <td> 2005.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14014</th>\n", | |
" <td> 97.783984</td>\n", | |
" <td> 15.183853</td>\n", | |
" <td> 48.500000</td>\n", | |
" <td> 2003.500000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14015</th>\n", | |
" <td> 97.300123</td>\n", | |
" <td> 14.128982</td>\n", | |
" <td> 116.642857</td>\n", | |
" <td> 2005.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14016</th>\n", | |
" <td> 94.530170</td>\n", | |
" <td> 14.071069</td>\n", | |
" <td> 118.400000</td>\n", | |
" <td> 2004.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14017</th>\n", | |
" <td> 91.811632</td>\n", | |
" <td> 12.742623</td>\n", | |
" <td> 76.666667</td>\n", | |
" <td> 2004.500000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14018</th>\n", | |
" <td> 101.850389</td>\n", | |
" <td> 16.012857</td>\n", | |
" <td> 40.250000</td>\n", | |
" <td> 2003.500000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14019</th>\n", | |
" <td> 95.985254</td>\n", | |
" <td> 12.897286</td>\n", | |
" <td> 34.571429</td>\n", | |
" <td> 2005.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14020</th>\n", | |
" <td> 73.859160</td>\n", | |
" <td> 9.628826</td>\n", | |
" <td> 31.333333</td>\n", | |
" <td> 2005.333333</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14021</th>\n", | |
" <td> 110.169410</td>\n", | |
" <td> 11.434753</td>\n", | |
" <td> 76.000000</td>\n", | |
" <td> 2006.500000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14022</th>\n", | |
" <td> 92.357969</td>\n", | |
" <td> 13.179575</td>\n", | |
" <td> 102.416667</td>\n", | |
" <td> 2004.500000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14024</th>\n", | |
" <td> 97.928798</td>\n", | |
" <td> 11.978615</td>\n", | |
" <td> 52.800000</td>\n", | |
" <td> 2004.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14025</th>\n", | |
" <td> 107.622472</td>\n", | |
" <td> 12.393003</td>\n", | |
" <td> 63.400000</td>\n", | |
" <td> 2004.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14026</th>\n", | |
" <td> 108.361917</td>\n", | |
" <td> 9.744963</td>\n", | |
" <td> 23.000000</td>\n", | |
" <td> 2005.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14027</th>\n", | |
" <td> 105.111340</td>\n", | |
" <td> 13.344660</td>\n", | |
" <td> 79.642857</td>\n", | |
" <td> 2005.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14028</th>\n", | |
" <td> 101.428645</td>\n", | |
" <td> 13.594276</td>\n", | |
" <td> 81.000000</td>\n", | |
" <td> 2005.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14029</th>\n", | |
" <td> 107.236395</td>\n", | |
" <td> 13.434021</td>\n", | |
" <td> 107.357143</td>\n", | |
" <td> 2005.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14030</th>\n", | |
" <td> 82.207932</td>\n", | |
" <td> 12.059516</td>\n", | |
" <td> 5.500000</td>\n", | |
" <td> 2004.833333</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14031</th>\n", | |
" <td> 80.166864</td>\n", | |
" <td> 10.897871</td>\n", | |
" <td> 20.000000</td>\n", | |
" <td> 2005.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14032</th>\n", | |
" <td> 97.507137</td>\n", | |
" <td> 14.603907</td>\n", | |
" <td> 34.500000</td>\n", | |
" <td> 2003.500000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14033</th>\n", | |
" <td> 97.216445</td>\n", | |
" <td> 13.810932</td>\n", | |
" <td> 45.888889</td>\n", | |
" <td> 2006.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14034</th>\n", | |
" <td> 77.809420</td>\n", | |
" <td> 8.784643</td>\n", | |
" <td> 9.576923</td>\n", | |
" <td> 2008.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14035</th>\n", | |
" <td> 74.037372</td>\n", | |
" <td> 8.453088</td>\n", | |
" <td> 21.000000</td>\n", | |
" <td> 2008.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14036</th>\n", | |
" <td> 79.874904</td>\n", | |
" <td> 9.662012</td>\n", | |
" <td> 8.416667</td>\n", | |
" <td> 2007.500000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14037</th>\n", | |
" <td> 112.491791</td>\n", | |
" <td> 5.928938</td>\n", | |
" <td> 8.000000</td>\n", | |
" <td> 2010.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14038</th>\n", | |
" <td> 98.139344</td>\n", | |
" <td> 12.469259</td>\n", | |
" <td> 16.000000</td>\n", | |
" <td> 2009.500000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14039</th>\n", | |
" <td> 103.330678</td>\n", | |
" <td> 14.218774</td>\n", | |
" <td> 55.444444</td>\n", | |
" <td> 2010.555556</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14040</th>\n", | |
" <td> 79.051284</td>\n", | |
" <td> 11.240789</td>\n", | |
" <td> 19.500000</td>\n", | |
" <td> 2012.200000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14041</th>\n", | |
" <td> 108.315297</td>\n", | |
" <td> 11.297068</td>\n", | |
" <td> 17.555556</td>\n", | |
" <td> 2010.555556</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14042</th>\n", | |
" <td> 100.826484</td>\n", | |
" <td> 14.875846</td>\n", | |
" <td> 239.071429</td>\n", | |
" <td> 2008.214286</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14043</th>\n", | |
" <td> 98.743504</td>\n", | |
" <td> 14.350353</td>\n", | |
" <td> 202.571429</td>\n", | |
" <td> 2008.214286</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14044</th>\n", | |
" <td> 96.488546</td>\n", | |
" <td> 13.571148</td>\n", | |
" <td> 125.214286</td>\n", | |
" <td> 2008.214286</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14045</th>\n", | |
" <td> 115.766330</td>\n", | |
" <td> 10.899575</td>\n", | |
" <td> 21.107143</td>\n", | |
" <td> 2008.214286</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14046</th>\n", | |
" <td> 92.666831</td>\n", | |
" <td> 4.827525</td>\n", | |
" <td> 2.000000</td>\n", | |
" <td> 2013.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14047</th>\n", | |
" <td> 96.600733</td>\n", | |
" <td> 14.101432</td>\n", | |
" <td> 74.714286</td>\n", | |
" <td> 2008.214286</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14048</th>\n", | |
" <td> 98.506472</td>\n", | |
" <td> 13.873354</td>\n", | |
" <td> 137.142857</td>\n", | |
" <td> 2008.214286</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14049</th>\n", | |
" <td> 93.349107</td>\n", | |
" <td> 13.056765</td>\n", | |
" <td> 43.714286</td>\n", | |
" <td> 2008.214286</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14050</th>\n", | |
" <td> 96.070019</td>\n", | |
" <td> 13.335813</td>\n", | |
" <td> 35.071429</td>\n", | |
" <td> 2008.214286</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14051</th>\n", | |
" <td> 98.398353</td>\n", | |
" <td> 13.541373</td>\n", | |
" <td> 68.642857</td>\n", | |
" <td> 2008.214286</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14052</th>\n", | |
" <td> 100.590638</td>\n", | |
" <td> 12.995011</td>\n", | |
" <td> 38.785714</td>\n", | |
" <td> 2008.214286</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14053</th>\n", | |
" <td> 93.900236</td>\n", | |
" <td> 14.552012</td>\n", | |
" <td> 101.357143</td>\n", | |
" <td> 2008.214286</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14054</th>\n", | |
" <td> 103.803544</td>\n", | |
" <td> 13.133332</td>\n", | |
" <td> 146.285714</td>\n", | |
" <td> 2008.214286</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14055</th>\n", | |
" <td> 94.186538</td>\n", | |
" <td> 13.612800</td>\n", | |
" <td> 85.214286</td>\n", | |
" <td> 2008.214286</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14057</th>\n", | |
" <td> 98.472108</td>\n", | |
" <td> 13.201418</td>\n", | |
" <td> 77.821429</td>\n", | |
" <td> 2008.214286</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14058</th>\n", | |
" <td> 105.893750</td>\n", | |
" <td> 12.430496</td>\n", | |
" <td> 10.392857</td>\n", | |
" <td> 2008.214286</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14059</th>\n", | |
" <td> 96.923023</td>\n", | |
" <td> 12.644721</td>\n", | |
" <td> 36.000000</td>\n", | |
" <td> 2011.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th></th>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" <td>...</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>7853 rows \u00d7 4 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 51, | |
"text": [ | |
" mean_score std_dev no_of_students year\n", | |
"school_id \n", | |
"13997 87.168595 9.749717 16.000000 2005.000000\n", | |
"13999 92.727838 NaN 1.000000 2002.000000\n", | |
"14000 75.605080 13.670722 8.333333 2009.000000\n", | |
"14001 89.750454 6.597126 1.500000 2009.000000\n", | |
"14002 99.193081 14.471929 142.250000 2004.500000\n", | |
"14003 99.526737 13.239124 85.000000 2005.000000\n", | |
"14004 98.468027 12.725488 55.500000 2004.500000\n", | |
"14005 94.621077 13.925231 58.000000 2003.500000\n", | |
"14006 93.633615 11.154024 63.500000 2003.500000\n", | |
"14007 97.667551 13.371636 39.833333 2004.500000\n", | |
"14008 94.748298 14.850266 62.000000 2005.000000\n", | |
"14009 97.335682 10.911686 10.000000 2004.333333\n", | |
"14010 95.231192 8.960401 10.000000 2004.500000\n", | |
"14011 100.356619 19.207342 17.000000 2005.000000\n", | |
"14012 80.561827 10.335914 18.000000 2007.000000\n", | |
"14013 98.086374 12.546801 38.357143 2005.000000\n", | |
"14014 97.783984 15.183853 48.500000 2003.500000\n", | |
"14015 97.300123 14.128982 116.642857 2005.000000\n", | |
"14016 94.530170 14.071069 118.400000 2004.000000\n", | |
"14017 91.811632 12.742623 76.666667 2004.500000\n", | |
"14018 101.850389 16.012857 40.250000 2003.500000\n", | |
"14019 95.985254 12.897286 34.571429 2005.000000\n", | |
"14020 73.859160 9.628826 31.333333 2005.333333\n", | |
"14021 110.169410 11.434753 76.000000 2006.500000\n", | |
"14022 92.357969 13.179575 102.416667 2004.500000\n", | |
"14024 97.928798 11.978615 52.800000 2004.000000\n", | |
"14025 107.622472 12.393003 63.400000 2004.000000\n", | |
"14026 108.361917 9.744963 23.000000 2005.000000\n", | |
"14027 105.111340 13.344660 79.642857 2005.000000\n", | |
"14028 101.428645 13.594276 81.000000 2005.000000\n", | |
"14029 107.236395 13.434021 107.357143 2005.000000\n", | |
"14030 82.207932 12.059516 5.500000 2004.833333\n", | |
"14031 80.166864 10.897871 20.000000 2005.000000\n", | |
"14032 97.507137 14.603907 34.500000 2003.500000\n", | |
"14033 97.216445 13.810932 45.888889 2006.000000\n", | |
"14034 77.809420 8.784643 9.576923 2008.000000\n", | |
"14035 74.037372 8.453088 21.000000 2008.000000\n", | |
"14036 79.874904 9.662012 8.416667 2007.500000\n", | |
"14037 112.491791 5.928938 8.000000 2010.000000\n", | |
"14038 98.139344 12.469259 16.000000 2009.500000\n", | |
"14039 103.330678 14.218774 55.444444 2010.555556\n", | |
"14040 79.051284 11.240789 19.500000 2012.200000\n", | |
"14041 108.315297 11.297068 17.555556 2010.555556\n", | |
"14042 100.826484 14.875846 239.071429 2008.214286\n", | |
"14043 98.743504 14.350353 202.571429 2008.214286\n", | |
"14044 96.488546 13.571148 125.214286 2008.214286\n", | |
"14045 115.766330 10.899575 21.107143 2008.214286\n", | |
"14046 92.666831 4.827525 2.000000 2013.000000\n", | |
"14047 96.600733 14.101432 74.714286 2008.214286\n", | |
"14048 98.506472 13.873354 137.142857 2008.214286\n", | |
"14049 93.349107 13.056765 43.714286 2008.214286\n", | |
"14050 96.070019 13.335813 35.071429 2008.214286\n", | |
"14051 98.398353 13.541373 68.642857 2008.214286\n", | |
"14052 100.590638 12.995011 38.785714 2008.214286\n", | |
"14053 93.900236 14.552012 101.357143 2008.214286\n", | |
"14054 103.803544 13.133332 146.285714 2008.214286\n", | |
"14055 94.186538 13.612800 85.214286 2008.214286\n", | |
"14057 98.472108 13.201418 77.821429 2008.214286\n", | |
"14058 105.893750 12.430496 10.392857 2008.214286\n", | |
"14059 96.923023 12.644721 36.000000 2011.000000\n", | |
" ... ... ... ...\n", | |
"\n", | |
"[7853 rows x 4 columns]" | |
] | |
} | |
], | |
"prompt_number": 51 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Comments\n", | |
"\n", | |
"When importing data, be careful to remember to distinguish `str`/`uniode` from `int`.\n", | |
"\n", | |
"E.g. `JSON` format can have only strings as keys, and it is easy to get nasty bug.\n", | |
"\n", | |
"Big data warning: if it fits your memory, you are better off doing it locally that doing.\n", | |
"\n", | |
"I also recommend [this tutorial on Pandas](http://manishamde.github.io/blog/2013/03/07/pandas-and-python-top-10/)." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## by Piotr Migda\u0142\n", | |
"\n", | |
"finishing PhD in quantum optics at ICFO - The Institute of Photonic Sciences, Castelldefels (Barcelona) \n", | |
"\n", | |
"pmigdal@gmail.com, http://migdal.wikidot.com\n", | |
"\n", | |
"I freelance in **data analysis** (Python!) and **interactive visualizations** (D3.js). See examples:\n", | |
"\n", | |
"* [Maps of concepts: programming and not only!](https://github.com/stared/tag-graph-map-of-stackexchange/wiki)\n", | |
"* [Book themes visualized](http://stared.github.io/wizualizacja-wolnych-lektur/polish_books_themes.html)\n", | |
"* [TagOverflow](http://stared.github.io/tagoverflow/)\n", | |
"* [Szko\u0142omat - schools in Poland](http://szkolomat.pl/) (in Polish; data analysis mostly in Pandas)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"BTW: If you do quantum stuff, check [QuTiP: Quantum Toolbox in Python](http://qutip.org/). " | |
] | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment