Skip to content

Instantly share code, notes, and snippets.

@TomAugspurger
Created March 23, 2016 01:22
Show Gist options
  • Star 11 You must be signed in to star a gist
  • Fork 10 You must be signed in to fork a gist
  • Save TomAugspurger/83213b5f7b21dbb52002 to your computer and use it in GitHub Desktop.
Save TomAugspurger/83213b5f7b21dbb52002 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"My intented reader is someone who has experience with pandas. This is not an introduction to pandas. There are many great resources for that. In particular, I recommend [Greg Reda](https://twitter.com/gjreda)'s [3-part introduction](http://gregreda.com/2013/10/26/intro-to-pandas-data-structures/), especially if you're familiar with SQL. Of course, there's the pandas [documentation](http://pandas.pydata.org/) itself. I gave [a talk](https://www.youtube.com/watch?v=otCriSKVV_8) at PyData Seattle targeted as an introduction. Wes McKinney's [Python for Data Analysis](http://shop.oreilly.com/product/0636920023784.do) is still the goto book. Jake VanderPlas's [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do), in early release, is great too.\n",
"\n",
"With all those resources (and many more that I've slighted by omitting), why write another? Surely the law of dimenishing returns is kicking in by now.\n",
"Still, I thought there was room for a guide that is update to date (as of March 2016) and emphasizes idiomatic pandas code (code that is *pandorable*).\n",
"\n",
"We'll be working with [flight delay data](http://www.transtats.bts.gov/databases.asp?Mode_ID=1&Mode_Desc=Aviation&Subject_ID2=0) from the BTS (R users can install Hadley's [NYCFlights](https://github.com/hadley/nycflights13) dataset for similar data).\n",
"\n",
"Protip: In the chrome dev-tools go to the Network tab, right-click the request and\n",
"\"Copy as cURL\". Paste that into http://curl.trillworks.com/ to get the request code."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import zipfile\n",
"\n",
"import requests\n",
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"headers = {\n",
" 'Pragma': 'no-cache',\n",
" 'Origin': 'http://www.transtats.bts.gov',\n",
" 'Accept-Encoding': 'gzip, deflate',\n",
" 'Accept-Language': 'en-US,en;q=0.8',\n",
" 'Upgrade-Insecure-Requests': '1',\n",
" 'User-Agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) '\n",
" 'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'),\n",
" 'Content-Type': 'application/x-www-form-urlencoded',\n",
" 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',\n",
" 'Cache-Control': 'no-cache',\n",
" 'Referer': 'http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time',\n",
" 'Connection': 'keep-alive',\n",
" 'DNT': '1',\n",
"}\n",
"# Sorry about the long url\n",
"data = 'UserTableName=On_Time_Performance&DBShortName=On_Time&RawDataTable=T_ONTIME&sqlstr=+SELECT+FL_DATE%2CUNIQUE_CARRIER%2CAIRLINE_ID%2CTAIL_NUM%2CFL_NUM%2CORIGIN_AIRPORT_ID%2CORIGIN_AIRPORT_SEQ_ID%2CORIGIN_CITY_MARKET_ID%2CORIGIN%2CORIGIN_CITY_NAME%2CORIGIN_STATE_NM%2CDEST_AIRPORT_ID%2CDEST_AIRPORT_SEQ_ID%2CDEST_CITY_MARKET_ID%2CDEST%2CDEST_CITY_NAME%2CDEST_STATE_NM%2CCRS_DEP_TIME%2CDEP_TIME%2CDEP_DELAY%2CTAXI_OUT%2CWHEELS_OFF%2CWHEELS_ON%2CTAXI_IN%2CCRS_ARR_TIME%2CARR_TIME%2CARR_DELAY%2CCANCELLED%2CCANCELLATION_CODE%2CDIVERTED%2CDISTANCE%2CCARRIER_DELAY%2CWEATHER_DELAY%2CNAS_DELAY%2CSECURITY_DELAY%2CLATE_AIRCRAFT_DELAY+FROM++T_ONTIME+WHERE+Month+%3D1+AND+YEAR%3D2014&varlist=FL_DATE%2CUNIQUE_CARRIER%2CAIRLINE_ID%2CTAIL_NUM%2CFL_NUM%2CORIGIN_AIRPORT_ID%2CORIGIN_AIRPORT_SEQ_ID%2CORIGIN_CITY_MARKET_ID%2CORIGIN%2CORIGIN_CITY_NAME%2CORIGIN_STATE_NM%2CDEST_AIRPORT_ID%2CDEST_AIRPORT_SEQ_ID%2CDEST_CITY_MARKET_ID%2CDEST%2CDEST_CITY_NAME%2CDEST_STATE_NM%2CCRS_DEP_TIME%2CDEP_TIME%2CDEP_DELAY%2CTAXI_OUT%2CWHEELS_OFF%2CWHEELS_ON%2CTAXI_IN%2CCRS_ARR_TIME%2CARR_TIME%2CARR_DELAY%2CCANCELLED%2CCANCELLATION_CODE%2CDIVERTED%2CDISTANCE%2CCARRIER_DELAY%2CWEATHER_DELAY%2CNAS_DELAY%2CSECURITY_DELAY%2CLATE_AIRCRAFT_DELAY&grouplist=&suml=&sumRegion=&filter1=title%3D&filter2=title%3D&geo=All%A0&time=January&timename=Month&GEOGRAPHY=All&XYEAR=2014&FREQUENCY=1&VarDesc=Year&VarType=Num&VarDesc=Quarter&VarType=Num&VarDesc=Month&VarType=Num&VarDesc=DayofMonth&VarType=Num&VarDesc=DayOfWeek&VarType=Num&VarName=FL_DATE&VarDesc=FlightDate&VarType=Char&VarName=UNIQUE_CARRIER&VarDesc=UniqueCarrier&VarType=Char&VarName=AIRLINE_ID&VarDesc=AirlineID&VarType=Num&VarDesc=Carrier&VarType=Char&VarName=TAIL_NUM&VarDesc=TailNum&VarType=Char&VarName=FL_NUM&VarDesc=FlightNum&VarType=Char&VarName=ORIGIN_AIRPORT_ID&VarDesc=OriginAirportID&VarType=Num&VarName=ORIGIN_AIRPORT_SEQ_ID&VarDesc=OriginAirportSeqID&VarType=Num&VarName=ORIGIN_CITY_MARKET_ID&VarDesc=OriginCityMarketID&VarType=Num&VarName=ORIGIN&VarDesc=Origin&VarType=Char&VarName=ORIGIN_CITY_NAME&VarDesc=OriginCityName&VarType=Char&VarDesc=OriginState&VarType=Char&VarDesc=OriginStateFips&VarType=Char&VarName=ORIGIN_STATE_NM&VarDesc=OriginStateName&VarType=Char&VarDesc=OriginWac&VarType=Num&VarName=DEST_AIRPORT_ID&VarDesc=DestAirportID&VarType=Num&VarName=DEST_AIRPORT_SEQ_ID&VarDesc=DestAirportSeqID&VarType=Num&VarName=DEST_CITY_MARKET_ID&VarDesc=DestCityMarketID&VarType=Num&VarName=DEST&VarDesc=Dest&VarType=Char&VarName=DEST_CITY_NAME&VarDesc=DestCityName&VarType=Char&VarDesc=DestState&VarType=Char&VarDesc=DestStateFips&VarType=Char&VarName=DEST_STATE_NM&VarDesc=DestStateName&VarType=Char&VarDesc=DestWac&VarType=Num&VarName=CRS_DEP_TIME&VarDesc=CRSDepTime&VarType=Char&VarName=DEP_TIME&VarDesc=DepTime&VarType=Char&VarName=DEP_DELAY&VarDesc=DepDelay&VarType=Num&VarDesc=DepDelayMinutes&VarType=Num&VarDesc=DepDel15&VarType=Num&VarDesc=DepartureDelayGroups&VarType=Num&VarDesc=DepTimeBlk&VarType=Char&VarName=TAXI_OUT&VarDesc=TaxiOut&VarType=Num&VarName=WHEELS_OFF&VarDesc=WheelsOff&VarType=Char&VarName=WHEELS_ON&VarDesc=WheelsOn&VarType=Char&VarName=TAXI_IN&VarDesc=TaxiIn&VarType=Num&VarName=CRS_ARR_TIME&VarDesc=CRSArrTime&VarType=Char&VarName=ARR_TIME&VarDesc=ArrTime&VarType=Char&VarName=ARR_DELAY&VarDesc=ArrDelay&VarType=Num&VarDesc=ArrDelayMinutes&VarType=Num&VarDesc=ArrDel15&VarType=Num&VarDesc=ArrivalDelayGroups&VarType=Num&VarDesc=ArrTimeBlk&VarType=Char&VarName=CANCELLED&VarDesc=Cancelled&VarType=Num&VarName=CANCELLATION_CODE&VarDesc=CancellationCode&VarType=Char&VarName=DIVERTED&VarDesc=Diverted&VarType=Num&VarDesc=CRSElapsedTime&VarType=Num&VarDesc=ActualElapsedTime&VarType=Num&VarDesc=AirTime&VarType=Num&VarDesc=Flights&VarType=Num&VarName=DISTANCE&VarDesc=Distance&VarType=Num&VarDesc=DistanceGroup&VarType=Num&VarName=CARRIER_DELAY&VarDesc=CarrierDelay&VarType=Num&VarName=WEATHER_DELAY&VarDesc=WeatherDelay&VarType=Num&VarName=NAS_DELAY&VarDesc=NASDelay&VarType=Num&VarName=SECURITY_DELAY&VarDesc=SecurityDelay&VarType=Num&VarName=LATE_AIRCRAFT_DELAY&VarDesc=LateAircraftDelay&VarType=Num&VarDesc=FirstDepTime&VarType=Char&VarDesc=TotalAddGTime&VarType=Num&VarDesc=LongestAddGTime&VarType=Num&VarDesc=DivAirportLandings&VarType=Num&VarDesc=DivReachedDest&VarType=Num&VarDesc=DivActualElapsedTime&VarType=Num&VarDesc=DivArrDelay&VarType=Num&VarDesc=DivDistance&VarType=Num&VarDesc=Div1Airport&VarType=Char&VarDesc=Div1AirportID&VarType=Num&VarDesc=Div1AirportSeqID&VarType=Num&VarDesc=Div1WheelsOn&VarType=Char&VarDesc=Div1TotalGTime&VarType=Num&VarDesc=Div1LongestGTime&VarType=Num&VarDesc=Div1WheelsOff&VarType=Char&VarDesc=Div1TailNum&VarType=Char&VarDesc=Div2Airport&VarType=Char&VarDesc=Div2AirportID&VarType=Num&VarDesc=Div2AirportSeqID&VarType=Num&VarDesc=Div2WheelsOn&VarType=Char&VarDesc=Div2TotalGTime&VarType=Num&VarDesc=Div2LongestGTime&VarType=Num&VarDesc=Div2WheelsOff&VarType=Char&VarDesc=Div2TailNum&VarType=Char&VarDesc=Div3Airport&VarType=Char&VarDesc=Div3AirportID&VarType=Num&VarDesc=Div3AirportSeqID&VarType=Num&VarDesc=Div3WheelsOn&VarType=Char&VarDesc=Div3TotalGTime&VarType=Num&VarDesc=Div3LongestGTime&VarType=Num&VarDesc=Div3WheelsOff&VarType=Char&VarDesc=Div3TailNum&VarType=Char&VarDesc=Div4Airport&VarType=Char&VarDesc=Div4AirportID&VarType=Num&VarDesc=Div4AirportSeqID&VarType=Num&VarDesc=Div4WheelsOn&VarType=Char&VarDesc=Div4TotalGTime&VarType=Num&VarDesc=Div4LongestGTime&VarType=Num&VarDesc=Div4WheelsOff&VarType=Char&VarDesc=Div4TailNum&VarType=Char&VarDesc=Div5Airport&VarType=Char&VarDesc=Div5AirportID&VarType=Num&VarDesc=Div5AirportSeqID&VarType=Num&VarDesc=Div5WheelsOn&VarType=Char&VarDesc=Div5TotalGTime&VarType=Num&VarDesc=Div5LongestGTime&VarType=Num&VarDesc=Div5WheelsOff&VarType=Char&VarDesc=Div5TailNum&VarType=Char'\n",
"\n",
"r = requests.post('http://www.transtats.bts.gov/DownLoad_Table.asp?Table_ID=236&Has_Group=3&Is_Zipped=0',\n",
" headers=headers, data=data, stream=True)\n",
"\n",
"with open(\"flights.csv\", 'wb') as f:\n",
" for chunk in r.iter_content(chunk_size=1024): \n",
" if chunk:\n",
" f.write(chunk)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"zf = zipfile.ZipFile(\"flights.csv.zip\")\n",
"filename = zf.filelist[0].filename\n",
"fp = zf.extract(filename)\n",
"df = pd.read_csv(fp, parse_dates=\"FL_DATE\").rename(columns=str.lower)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"0\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>fl_date</th>\n",
" <th>unique_carrier</th>\n",
" <th>airline_id</th>\n",
" <th>tail_num</th>\n",
" <th>fl_num</th>\n",
" <th>origin_airport_id</th>\n",
" <th>origin_airport_seq_id</th>\n",
" <th>origin_city_market_id</th>\n",
" <th>origin</th>\n",
" <th>origin_city_name</th>\n",
" <th>...</th>\n",
" <th>cancelled</th>\n",
" <th>cancellation_code</th>\n",
" <th>diverted</th>\n",
" <th>distance</th>\n",
" <th>carrier_delay</th>\n",
" <th>weather_delay</th>\n",
" <th>nas_delay</th>\n",
" <th>security_delay</th>\n",
" <th>late_aircraft_delay</th>\n",
" <th>unnamed: 36</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2014-01-01</td>\n",
" <td>AA</td>\n",
" <td>19805</td>\n",
" <td>N338AA</td>\n",
" <td>1</td>\n",
" <td>12478</td>\n",
" <td>1247802</td>\n",
" <td>31703</td>\n",
" <td>JFK</td>\n",
" <td>New York, NY</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>2475.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2014-01-01</td>\n",
" <td>AA</td>\n",
" <td>19805</td>\n",
" <td>N339AA</td>\n",
" <td>2</td>\n",
" <td>12892</td>\n",
" <td>1289203</td>\n",
" <td>32575</td>\n",
" <td>LAX</td>\n",
" <td>Los Angeles, CA</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>2475.0</td>\n",
" <td>111.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2014-01-01</td>\n",
" <td>AA</td>\n",
" <td>19805</td>\n",
" <td>N335AA</td>\n",
" <td>3</td>\n",
" <td>12478</td>\n",
" <td>1247802</td>\n",
" <td>31703</td>\n",
" <td>JFK</td>\n",
" <td>New York, NY</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>2475.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2014-01-01</td>\n",
" <td>AA</td>\n",
" <td>19805</td>\n",
" <td>N367AA</td>\n",
" <td>5</td>\n",
" <td>11298</td>\n",
" <td>1129803</td>\n",
" <td>30194</td>\n",
" <td>DFW</td>\n",
" <td>Dallas/Fort Worth, TX</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>3784.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2014-01-01</td>\n",
" <td>AA</td>\n",
" <td>19805</td>\n",
" <td>N364AA</td>\n",
" <td>6</td>\n",
" <td>13830</td>\n",
" <td>1383002</td>\n",
" <td>33830</td>\n",
" <td>OGG</td>\n",
" <td>Kahului, HI</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>3711.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 37 columns</p>\n",
"</div>"
],
"text/plain": [
" fl_date unique_carrier airline_id tail_num fl_num origin_airport_id \\\n",
"0 2014-01-01 AA 19805 N338AA 1 12478 \n",
"1 2014-01-01 AA 19805 N339AA 2 12892 \n",
"2 2014-01-01 AA 19805 N335AA 3 12478 \n",
"3 2014-01-01 AA 19805 N367AA 5 11298 \n",
"4 2014-01-01 AA 19805 N364AA 6 13830 \n",
"\n",
" origin_airport_seq_id origin_city_market_id origin origin_city_name \\\n",
"0 1247802 31703 JFK New York, NY \n",
"1 1289203 32575 LAX Los Angeles, CA \n",
"2 1247802 31703 JFK New York, NY \n",
"3 1129803 30194 DFW Dallas/Fort Worth, TX \n",
"4 1383002 33830 OGG Kahului, HI \n",
"\n",
" ... cancelled cancellation_code diverted distance carrier_delay \\\n",
"0 ... 0.0 NaN 0.0 2475.0 NaN \n",
"1 ... 0.0 NaN 0.0 2475.0 111.0 \n",
"2 ... 0.0 NaN 0.0 2475.0 NaN \n",
"3 ... 0.0 NaN 0.0 3784.0 NaN \n",
"4 ... 0.0 NaN 0.0 3711.0 NaN \n",
"\n",
" weather_delay nas_delay security_delay late_aircraft_delay unnamed: 36 \n",
"0 NaN NaN NaN NaN NaN \n",
"1 0.0 0.0 0.0 0.0 NaN \n",
"2 NaN NaN NaN NaN NaN \n",
"3 NaN NaN NaN NaN NaN \n",
"4 NaN NaN NaN NaN NaN \n",
"\n",
"[5 rows x 37 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Slicing\n",
"\n",
"Or, *explicit is better than implicit*.\n",
"\n",
"By my count, 7 of the top-15 voted pandas questions on [Stackoverflow](http://stackoverflow.com/questions/tagged/pandas?sort=votes&pageSize=15) are about slicing. This seems as good a place as any to start.\n",
"\n",
"Brief history digression: For years the preferred method for row and/or column selection was `.ix`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"0\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>fl_date</th>\n",
" <th>tail_num</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>2014-01-01</td>\n",
" <td>N3LGAA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>2014-01-01</td>\n",
" <td>N368AA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>2014-01-01</td>\n",
" <td>N3DDAA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>2014-01-01</td>\n",
" <td>N332AA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>2014-01-01</td>\n",
" <td>N327AA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>2014-01-01</td>\n",
" <td>N3LBAA</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" fl_date tail_num\n",
"10 2014-01-01 N3LGAA\n",
"11 2014-01-01 N368AA\n",
"12 2014-01-01 N3DDAA\n",
"13 2014-01-01 N332AA\n",
"14 2014-01-01 N327AA\n",
"15 2014-01-01 N3LBAA"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.ix[10:15, ['fl_date', 'tail_num']]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However this simple little operation hides some complexity. What if, rather than our default `range(n)` index, we had an Integer index like"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"0\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>fl_date</th>\n",
" <th>unique_carrier</th>\n",
" </tr>\n",
" <tr>\n",
" <th>airline_id</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>19393</th>\n",
" <td>2014-01-01</td>\n",
" <td>WN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19690</th>\n",
" <td>2014-01-01</td>\n",
" <td>HA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19790</th>\n",
" <td>2014-01-01</td>\n",
" <td>DL</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19805</th>\n",
" <td>2014-01-01</td>\n",
" <td>AA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19930</th>\n",
" <td>2014-01-01</td>\n",
" <td>AS</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" fl_date unique_carrier\n",
"airline_id \n",
"19393 2014-01-01 WN\n",
"19690 2014-01-01 HA\n",
"19790 2014-01-01 DL\n",
"19805 2014-01-01 AA\n",
"19930 2014-01-01 AS"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first = df.groupby('airline_id')[['fl_date', 'unique_carrier']].first()\n",
"first.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Can you predict ahead of time what our slice from above will give when passed to `.ix`?"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"0\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>fl_date</th>\n",
" <th>tail_num</th>\n",
" </tr>\n",
" <tr>\n",
" <th>airline_id</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Empty DataFrame\n",
"Columns: [fl_date, tail_num]\n",
"Index: []"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first.ix[10:15, ['fl_date', 'tail_num']]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We had an integer index, so the call to `.ix` used it's label-based mode. It was looking for integer *labels* between 10:15 (inclusive). It didn't find any. Since we sliced a range it returned an empty DataFrame, rather than raising a KeyError.\n",
"\n",
"By way of contrast, suppose we had a string index, rather than integers."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"0\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>fl_date</th>\n",
" <th>tail_num</th>\n",
" </tr>\n",
" <tr>\n",
" <th>unique_carrier</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>UA</th>\n",
" <td>2014-01-01</td>\n",
" <td>N14214</td>\n",
" </tr>\n",
" <tr>\n",
" <th>US</th>\n",
" <td>2014-01-01</td>\n",
" <td>N650AW</td>\n",
" </tr>\n",
" <tr>\n",
" <th>VX</th>\n",
" <td>2014-01-01</td>\n",
" <td>N637VA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>WN</th>\n",
" <td>2014-01-01</td>\n",
" <td>N412WN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" fl_date tail_num\n",
"unique_carrier \n",
"UA 2014-01-01 N14214\n",
"US 2014-01-01 N650AW\n",
"VX 2014-01-01 N637VA\n",
"WN 2014-01-01 N412WN"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first = df.groupby('unique_carrier').first()\n",
"first.ix[10:15, ['fl_date', 'tail_num']]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And it works again! Now that we had a string index, `.ix` used it's positional-mode. It looked for *rows* 10-15 (exclusive on the right).\n",
"\n",
"But you can't reliably predict what the outcome of the slice will be ahead of time. It's on the *reader* of the code (probably your future self) to know the dtypes so you can reckon whether `.ix` will use positional indexing or label indexing (like the last example).\n",
"\n",
"Since pandas 0.12, these tasks have been cleanly separated into two methods:\n",
"\n",
"1. `.loc` for label-based indexing\n",
"2. `.iloc` for positional indexing"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"0\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>fl_date</th>\n",
" <th>tail_num</th>\n",
" </tr>\n",
" <tr>\n",
" <th>unique_carrier</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>AA</th>\n",
" <td>2014-01-01</td>\n",
" <td>N338AA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>AS</th>\n",
" <td>2014-01-01</td>\n",
" <td>N524AS</td>\n",
" </tr>\n",
" <tr>\n",
" <th>DL</th>\n",
" <td>2014-01-01</td>\n",
" <td>N911DL</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" fl_date tail_num\n",
"unique_carrier \n",
"AA 2014-01-01 N338AA\n",
"AS 2014-01-01 N524AS\n",
"DL 2014-01-01 N911DL"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first.loc[['AA', 'AS', 'DL'], ['fl_date', 'tail_num']]"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"0\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>fl_date</th>\n",
" <th>airline_id</th>\n",
" </tr>\n",
" <tr>\n",
" <th>unique_carrier</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>AA</th>\n",
" <td>2014-01-01</td>\n",
" <td>19805</td>\n",
" </tr>\n",
" <tr>\n",
" <th>AS</th>\n",
" <td>2014-01-01</td>\n",
" <td>19930</td>\n",
" </tr>\n",
" <tr>\n",
" <th>DL</th>\n",
" <td>2014-01-01</td>\n",
" <td>19790</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" fl_date airline_id\n",
"unique_carrier \n",
"AA 2014-01-01 19805\n",
"AS 2014-01-01 19930\n",
"DL 2014-01-01 19790"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first.iloc[[0, 1, 3], [0, 1]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`.ix` is still around, and isn't being deprecated any time soon. Occasionally it's useful. But if you've been using `.ix` out of habit, or if you didn't know any better, maybe give `.loc` and `.iloc` a shot. For the intrepid reader, Joris Van den Bossche (a core pandas dev) [compiled a great overview](https://github.com/pydata/pandas/issues/9595) of the pandas `__getitem__` API. To quote him:\n",
"\n",
"> Conclusion: it is mess :-)\n",
"\n",
"There are unfortunately edge cases I omitted here (many related to `.ix`, which you're done using, right?). This is a surprising difficult topic to get right, and it's so fundamental that breaking API isn't really practical. But if you dilligently use `.loc` and `.iloc`, you'll avoid most of the edge cases."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## SettingWithCopy\n",
"\n",
"*The warning we all love to hate.*\n",
"\n",
"Pandas used to get *a lot* of questions about assignments seemingly not working. We'll take [this StackOverflow](http://stackoverflow.com/q/16553298/1889400) question as a representative question."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"0\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>50</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b\n",
"0 1 10\n",
"1 2 20\n",
"2 3 30\n",
"3 4 40\n",
"4 5 50"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f = pd.DataFrame({'a':[1,2,3,4,5], 'b':[10,20,30,40,50]})\n",
"f"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The user wanted to take the rows of `b` where `a` was 3 or less, and set them equal to `b / 10`"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"0\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>50</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b\n",
"0 1 10\n",
"1 2 20\n",
"2 3 30\n",
"3 4 40\n",
"4 5 50"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# ignore the context manager for now\n",
"with pd.option_context('mode.chained_assignment', None):\n",
" f[f['a'] <= 3]['b'] = f[f['a'] <= 3 ]['b'] / 10\n",
"f"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And nothing happend. Well, something did happen, but nobody witnessed it. If an object without any references is modified, does it make a sound?\n",
"\n",
"The warning I silenced above links to [an explanation](http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy) that's quite helpful. I'll summarize the high points here.\n",
"\n",
"This comes down to what's called *chained indexing*. Above we made two calls on the left-hand side, one `__getitem__` and one `__setitem__`.\n",
"\n",
"1. `f[f['a'] <= 3]`\n",
"2. `_['b']` # using `_` to represent the result of 1.\n",
"\n",
"In general, pandas can't guarantee whether that first `__getitem__` returns a view or a copy of the underlying data. And so we can't be sure that we actually have a reference to the object having `__setitem__` called on it.\n",
"\n",
"Done properly, you would write"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"0\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>3.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>40.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>50.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b\n",
"0 1 1.0\n",
"1 2 2.0\n",
"2 3 3.0\n",
"3 4 40.0\n",
"4 5 50.0"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f.loc[f['a'] <= 3, 'b'] = f.loc[f['a'] <= 3, 'b'] / 10\n",
"f"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now this is all in a single call to `__setitem__` and pandas can ensure that the assignment happens properly.\n",
"\n",
"The rough rule is any time you see back-to-back square brackets, `][`, you're in asking for trouble. Replace that with a `.loc[..., ...]` and you'll be set.\n",
"\n",
"The other bit of advice is that a SettingWithCopy warning is raised when the *assignment* is made. The potential copy could be made earlier in your code."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Vectorization\n",
"\n",
"DataFrame appends are relatively expensive. Keep in mind that indexes are immutable, so each time you append pandas has to create an entirely new index.\n",
"\n",
"In the next section, we'll download a bunch of weather files, one per state, writing each to a separate CSV. One could imagine coming back later to read them in, using the following code."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The idiomatic python way\n",
"\n",
"```python\n",
"files = glob.glob('weather/*.csv')\n",
"columns = ['station', 'date', 'tmpf', 'relh', 'sped', 'mslp',\n",
" 'p01i', 'vsby', 'gust_mph', 'skyc1', 'skyc2', 'skyc3']\n",
"\n",
"# init empty DataFrame, like you might for a list\n",
"weather = pd.DataFrame(columns=columns)\n",
"\n",
"for fp in files:\n",
" city = pd.read_csv(fp, columns=columns)\n",
" weather.append(df)\n",
"```\n",
"\n",
"The only nitpick is that you'd probably use a list-comprehension if you were just making a list. But pandas doesn't have access to that syntax for DataFrame-comprehension (if only), so you'd fall back to the \"intitilize empty container, append to said container\" pattern.\n",
"\n",
"But, there's a better, pandorable, way\n",
"\n",
"```python\n",
"files = glob.glob('weather/*.csv')\n",
"weather_dfs = [pd.read_csv(fp, names=columns) for fp in files]\n",
"weather = pd.concat(weather_dfs)\n",
"```\n",
"\n",
"Subjectively this is cleaner and more beautiful.\n",
"There's fewer lines of code, and the initializtion of an empty data structure says more about *how* to achieve the real task of building a data structure.\n",
"Objectively the pandorable way is faster, as we'll test next.\n",
"We'll define two functions for building an identical DataFrame. The first `append_df`, creates an empty dataframe and appends to it. The second, `concat_df`, creates many DataFrames, and concatenates them at the end. We also write a short decorator that runs the functions a handful of times and records the results."
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import time\n",
"\n",
"size_per = 1000\n",
"N = 100\n",
"cols = list('abcd')\n",
"\n",
"def timed(n=30):\n",
" '''\n",
" Running a microbenchmark. Not especially important.\n",
" '''\n",
" def deco(func):\n",
" def wrapper(*args, **kwargs):\n",
" timings = []\n",
" for i in range(n):\n",
" t0 = time.time()\n",
" func(*args, **kwargs)\n",
" t1 = time.time()\n",
" timings.append(t1 - t0)\n",
" return timings\n",
" return wrapper\n",
" return deco\n",
" \n",
"@timed(30)\n",
"def append_df():\n",
" '''\n",
" The pythonic (bad) way\n",
" '''\n",
" df = pd.DataFrame(columns=cols)\n",
" for _ in range(N):\n",
" df.append(pd.DataFrame(np.random.randn(size_per, 4), columns=cols))\n",
" return df\n",
"\n",
"@timed(30)\n",
"def concat_df():\n",
" '''\n",
" The pandorabe (good) way\n",
" '''\n",
" dfs = [pd.DataFrame(np.random.randn(size_per, 4), columns=cols)\n",
" for _ in range(N)]\n",
" return pd.concat(dfs, ignore_index=True)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"t_append = append_df()\n",
"t_concat = concat_df()\n",
"\n",
"timings = pd.DataFrame({\"append (🐍)\": t_append, \"concat (🐼)\": t_concat})"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"sns.set_style('white')\n",
"sns.set_context('talk')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAuEAAAH7CAYAAABi5UHyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xlc1VX+x/H35cJlEQRBxAVFlknLFdww0TRqnMo0UrMx\nmyWbin41NY0tTjVSTovNlNMytjjTlA3+ytHyl5ZWOqVZ4ppL4g6UGC4IisiFy3J/f5jUHRYvCOci\nvp5/yed77vl+bo/AN8dzz9fidDqdAgAAAGCMl6cbAAAAAC40hHAAAADAMEI4AAAAYBghHAAAADCM\nEA4AAAAYRggHAAAADDMewjMzMzVx4kTFx8crJSVFW7durXXc888/r+HDh2vAgAH65S9/qX379jV4\nDgAAAKAlMhrCHQ6HUlNTNWHCBG3cuFFTpkxRamqq7Ha7y7h///vf+uSTT/Tuu+9q06ZNGjBggB54\n4IEGzQEAAAC0VEZDeEZGhqxWqyZNmiSr1arx48crLCxMq1atchk3ceJELVy4UOHh4SouLlZRUZFC\nQ0MlSWvXrnVrDgAAAKCl8jZ5s6ysLMXGxrrUoqOjlZWVVWOsn5+f3nvvPf3hD39QUFCQXn/9dUlS\ndna223MAAAAALZHRlXC73S5/f3+Xmr+/v0pLS2sdP2bMGG3fvl133HGHpk6dqqKiogbPAQAAALQ0\nRkN4bWHZbrcrICCg1vE+Pj7y9vbWLbfcojZt2mj9+vUNnuO/VVRUKDc3VxUVFY17EwAAAMA5MhrC\nY2JilJ2d7VLLzs5WXFycS+3FF1/U7Nmzq792Op0qLy9XUFCQYmJiamw9qW2Ouhw6dEjJyck6dOhQ\nI98FAAAAcG6MhvDExEQ5HA6lp6eroqJCCxcuVEFBgZKSklzG9evXT2+//bb27Nmj8vJyvfTSSwoK\nClJ8fLwSExNVXl5+1jkAAACAlspoCLfZbJo7d66WLFmiIUOGaP78+Xr55Zfl5+enGTNmKC0tTZI0\nYsQI/f73v9edd96ppKQk7dixQ3//+99ls9nqnQMAAAA4H1icTqfT002YlJubq+TkZK1cuVKRkZGe\nbgcAAAAXIB5bDwAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI4QAAAIBhhHAAAADAMEI4AAAAYBgh\nHAAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEAAACAYYRwAAAAwDBCOAAAAGAYIRwA\nAAAwjBAOAAAAGEYIBwAAAAwjhAMAAACGEcIBAAAAwwjhAAAAgGGEcAAAAMAwQjgAAABgGCEcAAAA\nMIwQDgAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI4QAAAIBhhHAAAADAMEI4AAAAYBghHAAAADCM\nEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEAAACAYYRwoAWxl1UoJ69IxfZyT7cCAACakben\nGwBwWvryXfq/1ftkL6uUzdtLo4d219Rre8lq5XdlAABaG/52B1qAD77I1tuf7Ja9rFKS5Kio0pLP\ns/S/n+z2cGcAAKA5EMKBFuCDL7JrrS/7MkdOp9NwNwAAoLkRwoEWoKCotNZ60SmHKioJ4QAAtDaE\ncKAFuCQ6tNZ6XNcQ+XjzbQoAQGvD3+5ACzB5dE/52awuNW+rRb+46mIPdQQAAJoTp6MALUBcZIie\nu/cyvffZPuXkFalLeKDGjYhVXNcQT7cGAACaASEcaCG6RgTpt5PiPd0GAAAwgO0oAAAAgGGEcAAA\nAMAwQjgAAABgGCEcAAAAMIwQDgAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI4QAAAIBhhHAAAADA\nMEI4AAAAYBghHAAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEAAACAYYRwAAAAwDBC\nOAAAAGCY8RCemZmpiRMnKj4+XikpKdq6dWut4xYsWKDRo0dr4MCBmjhxojZu3Fh97fXXX1fv3r2V\nkJCg+Ph4JSQkaNOmTabeAgAAAHBOjIZwh8Oh1NRUTZgwQRs3btSUKVOUmpoqu93uMm7dunWaPXu2\nXnjhBW3cuFE33XSTUlNTdeLECUmng/y0adO0efNmffXVV9q8ebMGDBhg8q0AAAAAjWY0hGdkZMhq\ntWrSpEmyWq0aP368wsLCtGrVKpdxhw4d0q233qoePXpIkq677jp5eXlp7969kqSdO3dWXwMAAADO\nN94mb5aVlaXY2FiXWnR0tLKyslxq48aNc/l606ZNKikp0U9+8hOVlpYqOztb8+bN0/3336/g4GDd\ncsstGj9+fLP3DwAAADQFoyHcbrfL39/fpebv76/S0tI6X7Nv3z7dc889uueeexQcHKzc3FwNGDBA\nkydP1tChQ7VlyxalpqaqQ4cOGj58eHO/BQAAAOCcGd2OUlvgttvtCggIqHX8mjVrNHnyZN188826\n9dZbJUmRkZF66623NHz4cHl7e2vgwIEaN26cVqxY0ez9AwAAAE3BaAiPiYlRdna2Sy07O1txcXE1\nxi5atEj33nuv0tLSdPvtt1fXd+zYoddee81lbFlZmXx9fZunaVxwKquc+np/vr7afUSO8kpPtwMA\nAFoho9tREhMT5XA4lJ6erkmTJmnx4sUqKChQUlKSy7i1a9fq8ccf1+uvv17j1JPAwEDNmTNH3bt3\n15VXXqmMjAx9+OGHSk9PN/lW0Ert+bZQs+Zt0JHC0yf2BAX46H8m9tewvp093BkAAGhNLE6n02ny\nhnv27NEf//hH7d27V1FRUUpLS1Pfvn01Y8YMWSwWpaWlaerUqcrIyKhe3XY6nbJYLHrhhReUlJSk\nVatW6dlnn9WBAwfUqVMn3Xfffbriiivcun9ubq6Sk5O1cuVKRUZGNudbxXnGUV6pqX/6RMeLy1zq\n3laLXn4wWR3D2nioMwAA0NoYD+GeRghHXb7Y+p2enreh1muTR/fUz3/aMo/FLDxZqrc+3Km12/Nk\ntVo0vH8XTfnZxWrj7+Pp1gAAQB2MbkcBWrJie3md107Vc82TyisqNf1vX+jg0eLq2tI12dqfe0Kz\n7kqSxWLxYHcAAKAuxh9bD7RU/S8Kl1cdmTW+R7jZZtz0xdbvXAL4GTtzCrRtX74HOgIAAO4ghAPf\niwgN0ITki2rUh/bppIQeHTzQ0dl9c+hk3dfyigx2AgAAGoLtKMCP3HzVxeoVE6ZVm3NVUVGlIb07\nali/Li12W0eX8MA6r0V2CDLYCQAAaAhCOPBfEnp0aLEr3/9teHwX/e8nu3WkoMSlHhsZrP4Xtcwt\nNAAAgO0owHnN18eqp+4cpmH9OsvbapHNx6rkQV312G+GyquuDe4AAMDjWAkHznMd2gXooV8M0pnT\nRlvq1hkAAPADQjjQShC+AQA4f7AdBQAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEA\nAACAYYRwAAAAwDDOCQfOoryiUl9uy9OBIycVFdFWiX06yceb318BAEDjEcKBehw7YdfDL3+hg0dP\nVde6RgTqidRhahfk58HOAADA+YzlPKAebyzNdAngknTgcLHe+nCnhzoCAACtASEcqMeX2/Nqr2/7\nznAnAACgNSGEA/XwstRet1jquAAAAOAGQjhQj2H9OtdaH96/i+FOAABAa0IIB+rx6zG9FN25rUst\nLjJYN199sYc6AgAArQGnowD1CA701ezfjdSmXYeVe/ikunVsq4QeHeRV1z4VAAAANxDCgbOwelk0\n+JKOGnxJR0+3AgAAWgm2owAAAACGEcIBAAAAwwjhAAAAgGGEcAAAAMAwPpgJnCe+yy/Wjv3HFBzk\nqwE9Oshq5XdoAADOV4Rw4Dzw6nvb9MEX2XI6T3/dITRAabcmqmtEkGcbAwAAjcJSGtDCrf4qV0vX\n/BDAJelIQYmenb/Jc00BAIBzQggHWrjPNufWWt+fe0IHDp803A0AAGgKhHCghSsvr6rzmqO80mAn\nAACgqRDCgRZucK/an9TZITRA0Z2DDXcDAACaAiEcaOFGJ0apT2x7l5rNx6r/mdBPXl4WD3UFAADO\nBaejAC2czceqmXdcqnVf52n7/nyFBPkqeWA3tQ/x93RrAACgkQjhwHnA6mXRpX0769K+nT3dCgAA\naAJsRwEAAAAMI4QDAAAAhhHCAQAAAMMI4QAAAIBhhHAAAADAMEI4AAAAYBghHAAAADCMEA4AAAAY\nRggHAAAADCOEAwAAAIYRwgEAAADDCOEAAACAYYRwAAAAwDBCOAAAAGAYIRwAAAAwjBAOAAAAGEYI\nBwAAAAwjhAMAAACGEcIBAAAAwwjhAAAAgGGEcAAAAMAwQjgAAABgmLenGwBai2/yirRg5R7t/fa4\n2of469rhMRrap5On2wIAAC0QIRxoAt8cKtL9L66WvaxSkpR37JS278/XXRP7aXRid882BwAAWhy2\nowBNYOF/9lYH8B+b/9FuVVZWeaAjAADQkhHCgSaw99vjtdYLikp17ESp4W4AAEBLx3YUoAlEhAbo\n4NHiGnU/m1VtA23nPP/RQrsWr96n3TmFCg3205ikaPWNCz/neQEAgGcYXwnPzMzUxIkTFR8fr5SU\nFG3durXWcQsWLNDo0aM1cOBATZw4URs3bmzwHIAp1w6PqbU+OrG7/Gzn9rvukYIS3ffXVXp/dZZ2\nf1uotdvz9MgrX+rTTQfOaV4AAOA5RkO4w+FQamqqJkyYoI0bN2rKlClKTU2V3W53Gbdu3TrNnj1b\nL7zwgjZu3KibbrpJqampOnHihNtzACYNvDhCv/t5gjq085ck+ft667rLYvWrMZec89yLPt2r48Vl\nLjWnU5r3QaYqq5znPD8AADDPaAjPyMiQ1WrVpEmTZLVaNX78eIWFhWnVqlUu4w4dOqRbb71VPXr0\nkCRdd9118vLy0t69e92eAzDt8oFdNfcPV+rNGaP1r8d+pqlje8vbeu7fYpnZBbXW80+U6nDBqbO+\n/mSJQ59vOah1X+epvKLmh0cBAIB5RveEZ2VlKTY21qUWHR2trKwsl9q4ceNcvt60aZNKSkoUFxen\nxYsXuzUH4AleXhaFtvVr0jlDg/2Uk1dUo+5ttahtQP37zZevzdHcxdvlqDh9QktwoE3TfzlYvWLC\nmrRHAADQMEZXwu12u/z9/V1q/v7+Ki2t+/SIffv26Z577tE999yjkJCQRs0BnA+OnyzTv1fu0fNv\nf6X/W71fxfZySdLVQ7vXOn5EfKQC6wnh3xwq0pxFW6sDuCSdKHboyTfWy1HOijgAAJ5kdCW8trBs\nt9sVEBBQ6/g1a9bovvvu09SpU3Xrrbc2ag7gfJCTV6Q/zPlCJ0sc1bV3Ptmtv9wzQkN6d9IdKX2U\n/tFunSxxyOpl0fD+XZR6fd965/xsU66ctWwZLzrl0KZdhzW0T+emfhsAAMBNRkN4TEyM0tPTXWrZ\n2dkaO3ZsjbGLFi3SU089pccff1xXX311o+YAzhf/eP9rlwAuSSdLyvXQS2v0j0d+qmuSYnTlkCgd\nPFqsdkF+CgnyPeucpY6KOq/V9mAhAABgjtHtKImJiXI4HEpPT1dFRYUWLlyogoICJSUluYxbu3at\nHn/8cb366qsuAbwhcwDni/KKKm3Zc7TWa4Uny7Rq8+mjCG0+VkV3DnYrgEvSoIs71lr3tnopvgdn\njAMA4ElGQ7jNZtPcuXO1ZMkSDRkyRPPnz9fLL78sPz8/zZgxQ2lpaZKkv//976qoqNBvfvMbJSQk\nKD4+XgkJCVqzZk29cwDnIy8vS72nqGzcdaRR88b3CNfIhMga9V+NuUTtgvh+AQDAkyxOZ227Rluv\n3NxcJScna+XKlYqMrBlQAE94/O8Z2rDzcK3XrhzcTb+dFN+oeZ1OpzbtOqJ1Ow7J18eqkQmRiusa\nci6tAgCAJsBj64EW4L7JCfrV4x+rrJZTS5IHdWv0vBaLRQMvjtDAiyPOpT0AANDEjD+2HkBNgQE2\nPXfvCIWH/HD8ps3Hqqlje3GmNwAArRAr4UADlZZVyNdmlcViadJ5u3Vsq78/fKV2ZB9TcUm5eseG\nKegsD+MBAADnJ0I44KbVX+Vq/ke7dPDoKbUL8tW4EbG6flRck4ZxLy+L+sS2b7L5AABAy0QIB9yw\nPvOQ/vyvTdVfF54s0xsfZKrK6dTE5Is82BkAADgfsScccMO7n+6rtf7+6ixVVl1QBwwBAIAmwEo4\n4Ia8/FO11o8Xl8leWq5AD+3drqysUsaOQ8o6eEKdwgKU1L+L/Gx8WwMA0NLxtzXghujObVVQVFqj\n3qGdvwL8fDzQkVRc4tDDr3yprIMnqmvpy3fpiTuHqXP7QI/0BAAA3MN2FMANN1xxkWp7qOWkK3vI\ny6tpT0lx19uf7HEJ4JKUf6JUcxd/7ZF+AACA+wjhwFlUVTn11e4jklzD9pVDuumnQ6I805SkL7d/\nV2t9867DKnVUGO4GAAA0BCEcqMeRwhKlPvMfvf3JnhofwFy5/lsdPFrsoc4kr7qORrRYmvwMcwAA\n0LQI4UA9XlywRd/VEbSrnNLnWw4a7ugHw/t3qbU+pFdH+fpYDXcDAAAagg9mAnUoLCrVlj1H6x1T\nXlHVrD2s+zpPy9bm6HhxmXrFhOn6kXEKCz79aPuUy2L1wRdZspdVurymZ/d2zdoTAAA4d4RwoA4O\nNwJ2Yu+OjZ7/2Am73v1sn77ef0whgb762dAoDe3Tufr6+6v3a+7//fAhy/25J/TF1u/03L2XKbSt\nn+Yt31kjgEvSgk/26Oqh0fLz5dsbAICWir+lge9VVjm1MfOQduYUKDTYTyMTuqp7p7bKySuqdXzK\nyDj9pGvDV50rKqt0+NgpPfLKl8o/8cOxh5t3H9Et1/ZSysg4lToqNP/j3TVee+xEqZZ8nqUxSdH6\nOOObWuc/VVqhXd8UqP9FHRrcGwAAMIMQDkgqK69U2ty1+nr/sera/OW79KsxvfT6kh2yl/1w2khI\nkK+m3TRA/X4S3qB7lFdU6o0PMvXJum9d5vuxdz7ZrauGdteBIyd1yl5e65idOQUKDrTJWc+DOtv4\n++hEcZlKSisU3s5fe789LotFuqhbO48dqQgAAH5ACAckfbAmyyWAS6dXlD/4IluvPJSslRu+1dHj\ndvWMCtXw/p3l493wDz6+vGibPln/bb1jTpVW6NvDJxUW7Ccvy+kPf/63kECb7KV1H0EYEmjTghV7\ntH7HIVU5T5+iUvV9Ym8X5KsHfzFIvWLCGtw/AABoOpyOAkhauz2v1npOXpH+9Po6hQX7adDFEdqX\ne1wLVuzVd/kNO5rwRHGZPt104KzjLBapXZCfwoL9ldinU61jNuw8LKu17tXskCBfZXx9qDrAV/1o\nybzwZJkeffVLlZTWvsoOAADMYCUckGSt7XGY39t74Lhm/+9XLrV/r9yj+yYnaER8pFvzHy20q6Ky\nnv0j30vs3Unh7U6ffnLPpHg5nTV/QXCUV+l/P96tUQMi9emmXJdrVwzqqhUb6g/75RVVWvTpXt18\n1SVu9Q4AAJoeIRyQNCK+i3ZkHTv7wO9VVjk1Z9E2De7VUX62s38bdQ5vIz+bVaWOmqeZSJLVy6Jh\nfTvrfyb2q64F+PkooUeHWlfpKyqdiu4crJEDuurLbd/Jy2LR8P5d5KioPGsIl07vKwcAAJ5DCAck\njU7srvdX79fBo6fcfs0pe7m27cvXKXu5NmQelq+PVZcP7Ko+ce1rjA3w89F1l8Xp7U9cTzwJ9PfR\nQ78cpOjOwWrbxlbjdWXltYd2SSp1VCqhRwcl9PjhFJTjJ8vkbfVSRWX9xyt2bh94trcHAACaESEc\n0OmV6DFJMXr1ve0Net2CFXu0+5vC6q9XbPhWU67qqUlX9JAk7cop0ObdR9TG30dXD+uu9iF++vDL\nHB0/WareMe114097qGtEUJ3zD7w4Qv94/+taT0IZfElEjVpIkK9SRsbq3yv31tv3uBGxbr5DAADQ\nHAjhwPdGJkTqtfe26+w7t09r28bmEsDPePvj3bpyUDe9tWyXVmz44TSUeR9kavqvBuv5+0a63VOX\n8EDdkHyR3lmxx6U+dniMYiNDan3NL66+RF0jgvTJum91uOCUjh63V4d4i0X6zbje9QZ/AADQ/CxO\nZ32nDbc+ubm5Sk5O1sqVKxUZ6d6H6nDheG7+phofdpSkoAAfFdvLq8NsuyBf9YwK1dqvaz9VZdyI\nGP3f6qwa9eBAm/756E8bfMRhZvYxrdn6naqqnLq0byf1jXP/jPKTJQ6t+zpP5ZVODb4kovqx9wAA\nwHNYCQd+5M7x/eQor9KX27+T0yl5Wy26cnA3pY7vp2MnSrVlzxG18bdp4MUdlL58V53zZB08UWv9\nRLFDX+8/pvgeDXua5SXRYbokunFnewcF2HTF4KhGvRYAADQPQjjwI36+3nrol4N0pKBERwpL1K1j\n2+oPTLYP8XcJs8mDuum9VftV9V9P1Gkf7KewkLpXm3liJQAA4GE9QC06hAaod2z7Wk8sOaNrRJB+\nd2O82vj7VNc6hgXo0amJGtG/S62vCW3ry9MqAQAAK+HAuRg5oKsS+3RSZlaBfG1WXdw9VF5eFsV0\nCda1w2O05PMf9oUH+Hnr9zcNkHc9DwYCAAAXBkI4cI78bN5K6Flzj/dt1/XR6MQofbX7iNr4+ejS\nvp1dVs0BAMCFixAONKOojm0V1bGtp9sAAAAtDP8uDgAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI\n4QAAAIBhhHAAAADAMEI4AAAAYBghHAAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEA\nAACAYYRwAAAAwDBCOAAAAGAYIRwAAAAwzPtsA+x2uxYvXqzVq1dr+/btOn78uCwWi8LCwtS7d29d\ndtlluuaaaxQQEGCiXwAAAOC8V2cIr6io0Kuvvqo333xT3bp1U1JSkq655hqFhISosrJShYWF2r17\ntxYsWKC//OUvuvnmm3XbbbfJZrOZ7B8AAAA479QZwm+44QYlJSVpyZIlioiIqHeSnJwcvf3227rh\nhhu0ePHiJm8SAAAAaE0sTqfTWduFo0ePKjw8vEGTHTlyRB06dGiSxppLbm6ukpOTtXLlSkVGRnq6\nHQAAAFyA6vxg5tkCeF5eniorK11qLT2AAwAAAC2BW6ejHD58WHfffbd27NihsrIyTZ48WaNGjdKo\nUaO0a9eu5u4RAAAAaFXcCuGPPfaYjh8/rnbt2um9997T3r179c477+iKK67QE0880dw9AgAAAK3K\nWY8olKSMjAwtXLhQnTt31ooVKzRq1Cj169dPoaGhuvbaa5u7RwAAAKBVcWsl3MfHR5WVlTp16pTW\nr1+vyy67TNLpD2JyPjgAAADQMG6thF966aX6wx/+IH9/f9lsNo0cOVKrV6/WE088UR3IAQAAALjH\nrZXwmTNnqn///goMDNScOXPUpk0bZWZmavDgwXrkkUeau0cAAACgVanznHCn0ymLxdKgyRrzGtM4\nJxwAAACeVudK+IQJE7RixQq3JqmqqtKHH36o8ePHN1ljAAAAQGtV557wv/71r5o5c6aeeOIJJScn\na9iwYYqLi1O7du3kdDpVWFioXbt2acOGDVq2bJl69uypv/71ryZ7BwAAAM5LdW5HOWPbtm1KT0/X\n6tWrVVhY6LLdJCwsTCNGjNCNN96ovn37NnuzTYHtKAAAAPC0s56O0rdv3+qAffDgQR07dkxeXl4K\nDw9XREREszcIAAAAtDZuHVF4RpcuXdSlS5fm6gUAAAC4ILh1RCEAAACApmM8hGdmZmrixImKj49X\nSkqKtm7dWu/4N954Q7/97W9daq+//rp69+6thIQExcfHKyEhQZs2bWrOtgEAAIAmYzSEOxwOpaam\nasKECdq4caOmTJmi1NRU2e32GmPtdrueeeYZzZo1q8bZ45mZmZo2bZo2b96sr776Sps3b9aAAQNM\nvQ0AAADgnDQohB8+fFgZGRkqLS1Vfn5+g2+WkZEhq9WqSZMmyWq1avz48QoLC9OqVatqjL3rrrt0\n4MAB3XjjjTWu7dy5Uz169Gjw/QEAAICWwK0QXlJSot/97ne67LLLdMstt+jo0aP64x//qMmTJ6ug\noMDtm2VlZSk2NtalFh0draysrBpjn376ab344osKCwtzqZeWlio7O1vz5s1TUlKSrrnmGi1atMjt\nHgAAAABPcyuE//nPf9ahQ4e0bNky+fr6SpKmTZum8vJyPfnkk27fzG63y9/f36Xm7++v0tLSGmPD\nw8NrnSM/P18DBgzQ5MmT9dlnn+mxxx7T008/rc8//9ztPgAAAABPciuEr1y5UtOnT1d0dHR1LSYm\nRmlpaQ0Kv7UFbrvdroCAALfniIyM1FtvvaXhw4fL29tbAwcO1Lhx47RixQq35wAAAAA8ya0QXlxc\nrMDAwBp1i8WiiooKt28WExOj7Oxsl1p2drbi4uLcnmPHjh167bXXXGplZWXVK/QAAABAS+dWCE9K\nStIrr7yiysrK6lphYaH+/Oc/a9iwYW7fLDExUQ6HQ+np6aqoqNDChQtVUFCgpKQkt+cIDAzUnDlz\n9PHHH8vpdGrt2rX68MMPdf3117s9BwAAAOBJboXwRx55RDk5ORo6dKhKS0t16623atSoUTpx4oQe\nfvhht29ms9k0d+5cLVmyREOGDNH8+fP18ssvy8/PTzNmzFBaWtpZ54iKitLzzz+vl156SQkJCZo5\nc6ZmzZqlnj17ut0HAAAA4EkWp9PpdHfw2rVrlZWVpYqKCsXGxmrYsGE1zvBu6XJzc5WcnKyVK1cq\nMjLS0+0AAADgAuTdkMGJiYkuD8UpLy+XdHqFGwAAAIB73Arh69ev12OPPaacnBxVVVVV151OpywW\ni3bu3NlsDQIAAACtjVsh/NFHH1VcXJwefPBB+fn5NXdPAAAAQKvmVgg/cuSIXnnlFZdzwgEAAAA0\njluno1x55ZVatWpVc/cCAAAAXBDcWgm/7777NHbsWC1dulRdu3aVl5drdn/22WebpTkAAACgNXIr\nhD/88MOyWCyKjIxkTzgAAABwjtwK4Rs3btS//vUv9enTp7n7AQAAAFo9t/aER0VFyeFwNHcvAAAA\nwAXBrZVTrsPIAAAgAElEQVTw1NRUPfTQQ7r55pvVrVs3eXu7viwpKalZmgMAAABaI7ceW9+zZ8+6\nJzjPHtbDY+sBAADgaW6thO/atau5+wAAAAAuGHWGcIfDIZvNVv3n+pwZBwAAAODs6gzh/fr105o1\naxQWFqa+ffvKYrHUGON0Os+77SgAAACAp9UZwt98800FBwdLkubNm2esIQAAAKC1qzOEr1+/Xn36\n9JG3t7cGDx5ssicAAACgVavznPC//e1vKikpMdkLAAAAcEGoM4S7cXIhAAAAgEao94jC8vJyt56U\nyekoAAAAgPvqDeGjRo1yaxJORwEAAADcV28If+GFF6pPSAEAAADQNOoM4RaLRQkJCQoLCzPZDwAA\nANDq8cFMAAAAwLA6Q3hKSop8fX1N9gIAAABcEOrcjvLUU0+Z7AMAAAC4YNS5Eg4AAACgeRDCAQAA\nAMMI4QAAAIBhhHAAAADAMEI4AAAAYBghHAAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADD\nCOEAAACAYYRwAAAAwDBCOAAAAGAYIRwAAAAwjBAOAAAAGEYIBwAAAAwjhAMAAACGEcIBAAAAwwjh\nAAAAgGGEcAAAAMAwQjgAAABgGCEcAAAAMIwQDgAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI4QAA\nAIBhhHAAAADAMEI4AAAAYBghHAAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEAAACA\nYYRwAAAAwDBCOAAAAGAYIRwAAAAwjBAOAAAAGEYIBwAAAAwzHsIzMzM1ceJExcfHKyUlRVu3bq13\n/BtvvKHf/va35zQHAAAA0JIYDeEOh0OpqamaMGGCNm7cqClTpig1NVV2u73GWLvdrmeeeUazZs2S\nxWJp1BwAAABAS2Q0hGdkZMhqtWrSpEmyWq0aP368wsLCtGrVqhpj77rrLh04cEA33nhjo+cAAAAA\nWiKjITwrK0uxsbEutejoaGVlZdUY+/TTT+vFF19UWFhYo+cAAAAAWiKjIdxut8vf39+l5u/vr9LS\n0hpjw8PDz3kOAAAAoCUyGsJrC8t2u10BAQFG5wAAAAA8yWgIj4mJUXZ2tkstOztbcXFxRucAAAAA\nPMloCE9MTJTD4VB6eroqKiq0cOFCFRQUKCkpyegcAAAAgCcZDeE2m01z587VkiVLNGTIEM2fP18v\nv/yy/Pz8NGPGDKWlpZ3THAAAAMD5wOJ0Op2ebsKk3NxcJScna+XKlYqMjPR0OwAAALgA8dh6AAAA\nwDBCOAAAAGAYIRwAAAAwjBAOAAAAGEYIBwAAAAwjhAMAAACGEcIBAAAAwwjhAAAAgGGEcAAAAMAw\nQjgAAABgGCEcAAAAMIwQDgAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI4QAAAIBhhHAAAADAMEI4\nAAAAYBghHAAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEAAACAYYRwAAAAwDBCOAAA\nAGAYIRwAAAAwjBAOAAAAGEYIBwAAAAwjhAMAAACGEcIBAAAAwwjhAAAAgGGEcAAAAMAwQjgAAABg\nGCEcAAAAMIwQDgAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI4QAAAIBhhHAAAADAMEI4AAAAYBgh\nHAAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEAAACAYYRwAAAAwDBCOAAAAGAYIRwA\nAAAwjBAOAAAAGEYIBwAAAAwjhAMAAACGEcIBAAAAwwjhAAAAgGGEcAAAAMAwQjgAAABgGCEcAAAA\nMIwQDgAAABhGCAcAAAAM8/Z0AwAAAM2poOS45m9brPUHt8jqZdWwbgM1uc91CrD5e7o1XMAI4QAA\noNVyVDiU9ulzOlR8tLr28b7VyinM1czkabJYLB7sDhcytqMAAIBW68sDm1wC+Bl7jmVpx5HdHugI\nOM14CM/MzNTEiRMVHx+vlJQUbd26tdZxS5cu1RVXXKH4+HjdcccdOnbsWPW1119/Xb1791ZCQoLi\n4+OVkJCgTZs2mXoLAADgPJFblFfntQMn6r4GNDejIdzhcCg1NVUTJkzQxo0bNWXKFKWmpsput7uM\n27Vrl9LS0jR79mytW7dO7du31/Tp06uvZ2Zmatq0adq8ebO++uorbd68WQMGDDD5VgAAwHkgsm2n\nOq91advRYCeAK6MhPCMjQ1arVZMmTZLVatX48eMVFhamVatWuYw7swrep08f2Ww2TZs2TZ9//rkK\nCgokSTt37lSPHj1Mtg4AAM5Dl3YdoA5twmrU40K7q09ETw90BJxmNIRnZWUpNjbWpRYdHa2srKx6\nx4WEhCg4OFhZWVkqLS1Vdna25s2bp6SkJF1zzTVatGiRkf4BAMD5xeZtU9rl92lYt4HysfrIz9tX\nyTFJ+sNld8kppxbv/Eh3LnlYk/99t2Z+9lftPZbt6ZZxgTB6Oordbpe/v+txQP7+/iotLXV7XH5+\nvgYMGKDJkydr6NCh2rJli1JTU9WhQwcNHz682d8DAAA4v7QPCNU9Q6fWqL/51UJ9sGdl9dfbD+/W\nnvy/6qkrH1JkcN3bWICmYHQlvK7AHRAQ4FLz8/Orc1xkZKTeeustDR8+XN7e3ho4cKDGjRunFStW\nNHv/AACgdSh2nNLH+1fXqJdVOvThnv94oCNcaIyG8JiYGGVnu/4zT3Z2tuLi4lxqsbGxLuMKCgpU\nVFSk2NhY7dixQ6+99prL+LKyMvn6+jZf4wAAoF4ny4pVWl569oEtxJHifJVXltd67UA9J6oATcVo\nCE9MTJTD4VB6eroqKiq0cOFCFRQUKCkpyWXcmDFj9PHHH2vz5s0qKyvTc889pxEjRig4OFiBgYGa\nM2eOPv74YzmdTq1du1Yffvihrr/+epNvBQAASNp3LEd/+GSWpi6+X79+7/d67su5Kior9nRbZ9Uh\nsL18rD61XqvvRBWgqRgN4TabTXPnztWSJUs0ZMgQzZ8/Xy+//LL8/Pw0Y8YMpaWlSZJ69uypmTNn\navr06Ro2bJjy8/P15JNPSpKioqL0/PPP66WXXlJCQoJmzpypWbNmqWdPPuEMAIBJBfbjmrnqee0r\nyJEkVTqrlHFgs575/GXPNuaGQFsbXRmTVKNus/romosu90BHuNBYnE6n09NNmJSbm6vk5GStXLlS\nkZGRnm4HAIDz1qIdH+qdr5fUeu1PyffrovYxhjtqmKqqKi3e9ZE+2fe5jpee0MXhP9GNfca2+L7R\nOhg9HQUAALQeR04dq/daSw+zXl5euv6Sq3T9JVd5uhVcgIw/th4AALQOsaHdaq1bZFFMHdcAnEYI\nBwAAjTIiaog6BXWoUR/efbA6B0V4oCPg/MF2FAAA0Ch+Pn567PLf673M5dqc97X8rDaN6J7IBxsB\nNxDCAQBAo4X4tdWvE27Qr3WDp1sBzitsRwEAAC1e7ok8bTu0U8VlpzzdCtAkWAkHAAAtVlHpSc1e\n+3ftOLJHkuRj9VHKxaM1odc1Hu4MODeEcAAA0GLN2fBWdQCXpPLKci34eqki23ZSYteEZr+/o8Kh\nYkeJQvzbysviJafTqa2HdmrTd9tks/poeNRgdW/Xtdn7QOtDCAcAAOesrMKh7MJvFeQbqC5tOzbJ\nnIX2E/rqu69rvfafrC+aNIRnHtmj9Qe3ytvLqmHdBinIN1ALti/R2gObVFbpUFhAO93Qa4x2HN2j\n1Tnrql+3dPdK/TJ+gq7mw6hoIEI4AAA4J5/s+1zp295TSbldktSzfazuvfRWhfqH6HDxUe0v+Eah\n/u3UMzy2QfNmHt0rp2p/sPe+ghx9uOc/GhV9qfx9/Nya7+ipY/rfbf+nTd9tr17FvqHPtZq/dbGW\n7/usetz7uz6p8dpjJYV6ecNbNepOOfWvre9pWLeBCvZr694bA8Rj6z3dDgAA57XMI3uV9ulzNeo9\n28cqMrizVu5fUx2kY9p104PD71Q7/2CXsVXOKm3Jy9S+ghy1DwjV0Mh4/fOrf+uznLVnvX+Xth01\n8/JpCvRtU++4knK7pi3/k/JLClzqse2itL/wm7Pe52zuGvIrjeg+5JznwYWDlXAAANBo/8n6otb6\nrvz92pW/36WWVfitfr/scV3UPlbJsZdqUJf+Kqtw6MnVL2nn0b3V4+ZtWVi9qn42B4sOaemelbqx\nz9h6x63OWVcjgEtqkgAuSb7etiaZBxcOQjgAAGi0orKTDRpfXF6izXnbtTlvu9r7t1N5VYVO/Ncc\n7gbwM7bmZVaH8NU567Rs76c6VlKoi8JiNL7X1eoUGK5lez5r0JwNEWRro/iOvZptfrROhHAAANBo\nvTr00JZDmY16bb69sEl6CLD5Szr9Icl5WxZW19cf3KKth3cqOqSr8ooPN8m9/luQb6B+f+lvZGMl\nHA3Ew3oAAECjdQwMN3YvH6/a1w5HRV+q8spyvbdzeY1rZRVl2pW/r9l6un3gTbqkw0XNNj9aL0I4\nAABotL0F2XVe87I0Xczw9rLqt4m36Cdh0dU1q5dV43r+VElRg3TMflwny4qb7H7uOnrqmPF7onVg\nOwoAAGi0tr6BdV7rG3Gxthza0ST3uf6SqzSka7yGdI1XVsG3KrAfV1xolEK+P2klxK+tfL19VVZR\n1uh7BNkCddLRsCAfFcJJa2gcVsIBAECjDY8aIpvVp0Y9xK+t7hh0kyLbdnJ7Lm8vb/lZfWvUbVYf\nJcckVX8dE9pNA7v0rQ7gkuTn7asrY4fXeK3FYlGv8LNvF7k4PE6/6D++zuvWWlb1Lw7/iXpH9Djr\n3EBtrGlpaWmebsKkoqIizZs3T7/85S/Vti2H6gMAcC78ffwUGxqlzKN7ZS8vlSR1CeqoaUm3q3Pb\nCI2MHqrwNqHyslhVUVmuskqHvCxe6tCmvTq0CdfxsiJJUmxolH5/6W0aGTNUO47sVrGjRJIU6h+i\n3yZOVbQbj4bv3aGHqpxVOlD0ncory9WlbUfdNvAmTep9rSwWi/JOHlFlVaWCfYPk5+2n0ooy+Xh5\na1T0UP3PkF8pLqy7yqsqtL/gG1U5qyRJF4XF6M7BN+vnfcbJUVmuY/ZCtfEJ0BWxSbp94E3ytrKp\nAI3Dw3oAAMA5q6qq0v7Cb+Tt5e1WYD7jlKNE5VUVCvnR0yadTqf2F3yjSmel4kK7y+plbVAvlVWV\nKqtwVJ+aUt84L4uXLBaLS72o9KRyjucqLKCdurTt2KB7A+7i1zcAAHDOvLy8XD406a42toAaNYvF\noriw7o3uxeplPWsAPzOuNm39gtS348WNvj/gDvaEAwAAAIYRwgEAAADDCOEAAACAYYRwAAAAwDBC\nOAAAAGAYIRwAAAAwjBAOAAAAGEYIBwAAAAwjhAMAAACGEcIBAAAAwwjhAAAAgGGEcAAAAMAwQjgA\nAABgGCEcAAAAMIwQDgAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI4QAAAIBhhHAAAADAMEI4AAAA\nYBghHAAAADCMEA4AAAAYRggHAAAADCOEAwAAAIYRwgEAAADDCOEAAACAYYRwAAAAwDBCOAAAAGAY\nIRwAAAAwjBAOAAAAGEYIBwAAAAwjhAMAAACGEcIBAAAAwwjhAAAAgGGEcAAAAMAwQjgAAABgGCEc\nAAAAMIwQDgAAABhGCAcAAAAMI4QDAAAAhhHCAQAAAMMI4QAAAIBhhHAAAADAMOMhPDMzUxMnTlR8\nfLxSUlK0devWWsctXbpUV1xxheLj43XHHXfo2LFjDZ4DAAAAaImMhnCHw6HU1FRNmDBBGzdu1JQp\nU5Samiq73e4ybteuXUpLS9Ps2bO1bt06tW/fXtOnT2/QHAAAAEBLZTSEZ2RkyGq1atKkSbJarRo/\nfrzCwsK0atUql3FnVsH79Okjm82madOm6fPPP1dBQYHWrl3r1hwAAABAS2U0hGdlZSk2NtalFh0d\nraysrHrHhYSEKCQkRFlZWcrOznZrDgAAAKCl8jZ5M7vdLn9/f5eav7+/SktLzzrOz89PpaWlbs9R\nl8rKSknSoUOHGto+AAAA0CAdO3aUt3fNyG00hNcVuAMCAlxqZwJ3bePcnaMuR48elSTddNNNDW0f\nAAAAaJCVK1cqMjKyRt1oCI+JiVF6erpLLTs7W2PHjnWpxcbGKjs7u/rrgoICFRUVKTY2VsXFxW7N\nUZfevXsrPT1d4eHhslqtjXwnAAAAwNl17Nix1rrREJ6YmCiHw6H09HRNmjRJixcvVkFBgZKSklzG\njRkzRjfffLPGjx+vXr166bnnntOIESMUHBzs9hx18fPz08CBA5vj7QEAAABusTidTqfJG+7Zs0d/\n/OMftXfvXkVFRSktLU19+/bVjBkzZLFYlJaWJklavny5Zs+erWPHjmngwIF68sknFRoaWu8cAAAA\nwPnAeAgHAAAALnQ8th4AAAAwjBAOAAAAGEYIBwAAAAwjhAMAAACGEcIBAECrlJub6+kWgDoZPScc\nwLl57bXXlJ2draeeeqrW67m5uXrkkUf0xhtvSJL++c9/qqioqPq60+mUxWJRr169dMUVV0iS7rjj\nDj344IOKjo5u9v4BwJSdO3fqN7/5jdasWVPnmOXLl+vrr7/WtGnTdPLkSf3jH/+QxWKpvn7mZ+bo\n0aPVs2dPFRcX6/bbb9c///lP2Ww2E28DrRghHGhFZsyYobvvvrv668GDB6tXr141xu3YsaP6z/fe\ne68eeeSRGk+iBYDzWVFRkSorK+u8XlxcrOeff14LFy6sHv+rX/1KISEhNcae+ZkZGBion/3sZ5oz\nZ47uvffe5mkcFwy2owA/8sEHH+j666/XkCFDNGTIEM2YMaP62uWXX665c+dqxIgRGjx4sGbMmKHy\n8nJJ0vTp0zVr1iyNHTtWCQkJmjp1qvLy8qpf+/HHH+vaa6/V4MGD9etf/1o5OTmSpIMHD2rQoEGa\nO3eukpKSNGzYMJdV7ry8PN1yyy1KSEjQ9ddfr+zs7Dp737Jliw4fPqwBAwZU1+p6DMCP6z179lRl\nZaUyMjIa9h8LAGqxfv16TZgwQfHx8br22mv1xRdfSJJKSkr02GOPKSkpSUlJSXrkkUdUXFwsSXrp\npZd0//3364477lB8fLzGjBlT/TpJ+uijjzRmzBglJCTohhtuqA7FBw4cUGpqqkaOHKn+/fvr5z//\nubKzs1VQUKDbbrtNhYWFSkhI0IkTJ2r0OX/+fA0dOlRt2rRp0PtLSUnRO++8U9070FiEcOB7Bw8e\n1KOPPqrHH39c69at0/z587V06VKXcLp8+XK9++67WrZsmbZv364XX3yx+tq7776rP/3pT8rIyFBE\nRET1Ksm2bdv08MMPa+bMmVq7dq1GjRql22+/vXqF5uTJkzp48KA+/fRTzZkzR/Pnz9fWrVslSffc\nc4+ioqK0bt06zZw5U59++mmd/S9atEijR492qf34n1Xrq//0pz+tXg0CgMYqKChQamqqpkyZos2b\nN+u+++7T3XffreLiYj366KPKycnR0qVLtWzZMuXn57ssdCxfvly//vWvtWHDBg0fPlx/+tOfJJ1+\nSvYDDzyg6dOna/PmzRo3bpzuvvtuOZ1OPfroo4qLi9Onn36qjIwMtWvXTq+88opCQ0M1d+5ctWvX\nTps3b1ZwcHCNXmv7memOwMBA9evXT8uWLWv8fyhAhHCgWkREhJYuXarevXvr+PHjKiwsVHBwsA4f\nPlw95p577lH79u0VFhamO++8U0uXLq2+NmHCBPXt21c2m00PPfSQtm3bpoMHD2rRokVKSUlR//79\nZbVa9Ytf/EIVFRVat25d9Wtvu+02+fj4qF+/foqJiVFOTo5yc3O1fft2TZs2TT4+PurVq5cmTpxY\nZ/8bNmxQ3759G/Xee/furQ0bNjTqtQBwxmeffaaoqChdd911slgsGjVqlN588035+Pjoo48+0v33\n36+QkBAFBQXpwQcf1LJly+RwOCRJ/fv315AhQ+Tt7a2xY8fqm2++kXR6FXzEiBEaNmyYJOmmm27S\n7Nmz5XQ6NWvWLN11110qLy9Xbm6uQkJCXH5m1+Xo0aP69ttv1adPn0a9z969e2v9+vWNei1wBnvC\nge9ZrVa98847WrRokdq0aaNLLrlEFRUVLls3unXrVv3njh07Kj8/v/rrqKio6j+3bdtWfn5+ys/P\nV15entavX6/FixdLOr0VpKKiQt99952ioqJksVjUrl276td6e3vL6XTq6NGjCggIcPmn0sjISJd7\n/tjhw4cVHh7uUnNnO4okhYeHKz8/v/pDSADQGPn5+YqIiHCp9enTR0ePHlVlZaU6d+5cXe/SpYuc\nTmd1aA4NDa2+dubnYF1z9uvXT5K0b98+/eUvf9GRI0cUFxcni8Wiqqqqs/Z56NAhBQQEKCAgoFHv\nMzw83GUhBWgMVsKB733wwQdavny53n//fX300UeaPXt2jU+//3iF5eDBg+rUqVOt1woLC1VaWqqO\nHTsqPDxcU6dO1fr167V+/Xpt2LBB77//vsaOHVtvPxERESopKXE53aS+FR6LxVLjQ0jubkepqqqS\nxWIhgAM4JxERETV+Tr322ms6deqUbDabvvvuu+r6gQMH5OXl5bIIUdecR44ccak9++yzOnbsmO6+\n+27deeed+uKLL/Tmm29q0KBBbvXp5eXlVlivS2Vlpby8iFA4N/wfBHyvuLhY3t7e8vb2lsPh0Ny5\nc3Xw4MHqD19K0t/+9jcdP35chw8f1quvvqqUlJTqawsWLNC+fftUWlqqZ555RkOHDlVERISuu+46\nLViwQJmZmZKkFStW6Oqrr67+y6iu1erOnTtr8ODBevrpp1VWVqY9e/ZowYIFdfbfqVMnHT161KXm\n7kr4kSNH1KFDh3r+6wDA2V122WU6ePCglixZoqqqKv3nP//RP/7xD7Vr107XXnutnn32WRUWFurE\niRP685//rJEjRyowMLDWuc78nLrqqqu0Zs0aZWRkyOl0Kj09XR9++KF8fX3lcDjk5+cn6fSH0995\n5x1VVFRIkmw2m8rKylx+hp/RqVMnlZaW1vhwZUN+Zv54EQZoDLajAN9LSUnR2rVrdfnllyssLExX\nXXWVJk6cqKysrOoxcXFxuv7662W323XDDTfotttuq742cOBAPfDAAzpw4IAuvfRSPfvss5KkQYMG\nafr06XrggQeUl5enzp07/3979xcS1bbAcfzbMQ1TUoRAS1Or09gfZMZAMYXAcYoM+4MMQVRQBvkQ\nRS/aqElJmQ+SCSYx5YuBWIqlIKWlEkSC1cMYiIRpghpGgVCGmM7ch2jumWvdo/d6ppx+H5iHvdZe\na6+9H2Z+LtesoaKigpiYGEZGRmbNPv/1+MqVKxQWFrJt2zbCw8OxWCxMTk5+d/zJyck4HA7S0tLc\nZS0tLbS3t886d+XKlWzZssV97HA4SE5OnucTExHxFBoait1up6SkhOLiYiIjI6mqqiIkJASbzUZZ\nWRmZmZl8+fIFs9lMfn7+D/v69l4YGxtLeXk5JSUljI6OYjAYsNvtBAcHc/78eQoKCpiZmSE+Pp68\nvDxKSkpwOp0YDAbWr19PUlISTU1NREVFufsOCwvjzz//xOFwuNeaw9dZ+2+hHv69T7jJZPIYW09P\nD3v37l2oxya/qSWuH/3ZJyIe0tLSKCsrIyEhYVadzWYjOjqanJycnzCyr3p6erDZbLS0tMy7bVZW\nFrm5uSQlJf0DIxMR+fXcuHGD4eFhLly4MK924+Pj7N69m9bW1h/O4ovMhZajiPiI+Ph4Vq1aRVdX\n17za9fT0EBgYqAAuIr+VgwcP0tXVNe/9vhsaGjhw4IACuPzfFMJF5mgxfGmxqKiI69evz6vNtWvX\n5j0TJCKy2AUFBXHmzBmqqqrm3Objx4+0t7f/1P96iu/QchQRERERES/TTLiIiIiIiJcphIuIiIiI\neJlCuIiIiIiIlymEi4iIiIh4mUK4iIgPiYuLIy4ujv7+/ll1L1++JC4ujiNHjsy5v7a2NvdPht+9\ne5fU1NQFGyvA0aNHqaysXNA+RUQWA4VwEREf4+/vz6NHj2aVt7W18ccfc3/bHx0d5dSpU0xMTCzk\n8EREBIVwERGfk5iY+MMQbjQa59yP0+lcFPvji4gsRgrhIiI+xmKx0Nvby9jYmLvs1atXTExMkJCQ\n4HHuwMAA2dnZGI1G0tPTqaioYGZmBoD09HQAMjIyuHfvnruN3W4nNTUVk8mEzWZjamrKXffkyROs\nVitGoxGLxUJdXZ3H9RobG0lPT8dkMlFcXOy+lojI70YhXETEx0RGRmIwGDxmw9va2rBYLB4z21NT\nUxw/fhyDwUBzczMlJSW0trZSXl4OQH19PS6Xi9raWjIyMgB4//49PT091NTUUFlZyf3796mvrwfg\n2bNn5OTksGvXLpqamsjJyaG0tJQHDx4A8PTpU4qKijhx4gSNjY04nU66u7u99VhERH4pCuEiIj7I\nbDbT3t7uPn748CE7duzwOKe5uZmgoCByc3NZs2YNiYmJFBQUUFNTg8vlIiwsDIDQ0FACAgIA8PPz\no7S0lLVr15KSkkJKSgq9vb0A3Lp1i7S0NI4dO0Z0dDRZWVkcOnQIu90OwO3bt9m5cydWq5XY2FjO\nnTtHRESENx6HiMgvRyFcRMQHWSwWuru7+fTpE0NDQ4yNjZGUlORxzuvXrxkYGMBkMrlfJ0+eZHp6\nmuHh4e/2GxISQnBwsPt4xYoVTE5OuvuLj4/3OD8hIYHBwUF3/ebNm911fn5+bNy4cUHuV0RksVn6\nswcgIiILz2AwsHr1ajo7O3n37h1ms3nWzigzMzNs3bqVS5cuzWofERHhsab8m/+2u8qyZctmlTmd\nTqanpwFYsmQJLpfLo37pUn0MicjvSTPhIiI+ymw209HR8d2lKADr1q1jaGiI8PBwoqKiiIqK4u3b\nt5SVlf1PO6PExsbicDg8yl68eEFMTAwAGzZs8Kh3uVz09fXN/8ZERHyAQriIiI+yWCw8fvyYwcFB\nUhs7kTIAAAEpSURBVFJSZtXv2bMHgLNnz9Lf38/z588pLCzE39+fgIAAli9fDkBfXx+fP3/+2+tl\nZ2fT2dnJzZs3GRoaoqGhgbq6OvePAx0+fJiOjg5qamp48+YNly9fZmRkZAHvWERk8VAIFxHxIX+d\nvTYajQQFBbF9+/bvLvsIDAykurqa8fFxrFYrp0+fJiUlhYsXLwJfv5C5f/9+8vLyuHPnzt9ee9Om\nTVy9epWmpiYyMzOprq4mPz8fq9XqHk95eTm1tbXs27ePDx8+uLdBFBH53Sxx/ecCPRERERER+Udp\nJlxERERExMsUwkVEREREvEwhXERERETEyxTCRURERES8TCFcRERERMTLFMJFRERERLxMIVxERERE\nxMsUwkVEREREvEwhXERERETEy/4F4gCxbpu4ClYAAAAASUVORK5CYII=\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x10940a9b0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(12, 8))\n",
"sns.stripplot(x='Method', y='Time (s)',\n",
" data=timings.stack().reset_index().rename(columns={0: 'Time (s)',\n",
" 'level_1': 'Method'}),\n",
" jitter=True, size=7)\n",
"sns.despine()\n",
"plt.savefig('concat-append.svg', transparent=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That closes out part one. We've seen that..."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
@Paul-Yuchao-Dong
Copy link

Great gist! This is really helpful to ppl who finished Wes' great book and want to catch up the further improvement on pandas. I cannot believe I am the first one to leave a message here.

However, I did come asking for help. I'm not sure how whether the post request still works. The 3rd cell gave me a trace back.
BadZipfile: File is not a zip file
Or is it a python 2/3 issue? I'm running anaconda=4.0 with python 2.7.

Actually it would be helpful if you can show how should I download the zip file manually.

Thank you again for the great post explaining the recent development.

EDIT: I think I understand the problem now. It is indeed a python 2/3 problem, I think Py2 didn't wait until the request was complete for some reason. I separated the 3rd cell and got it to run smoothly.

thanks!

@andportnoy
Copy link

Hi, Tom.

I might be wrong, but in the second cell

with open("flights.csv", 'wb') as f:

should be replaced with

with open("flights.csv.zip", 'wb') as f:

since that's what you are then unzipping in the following cell.

P.S. Great post series, and I can't wait to see the second edition of Wes's book!

@andportnoy
Copy link

As of pandas 0.18.1:
read_csv will now raise a TypeError if parse_dates is neither a boolean, list, or dictionary

@matias-pizarro
Copy link

matias-pizarro commented Sep 18, 2016

@andportnoy, replace
df = pd.read_csv(fp, parse_dates="FL_DATE").rename(columns=str.lower)
with
df = pd.read_csv(fp, parse_dates=["FL_DATE"]).rename(columns=str.lower)

@TomAugspurger, thanks for this great resource

@sbraden
Copy link

sbraden commented Jun 14, 2017

I was not able to use cells 1 through 3 to download the data. I downloaded the data manually and it appears that the format has changed a bit. "FL_DATE" is now "FlightDate" for example. Thank you for writing these "not exactly for beginners" tutorials.

@lidgen
Copy link

lidgen commented Jun 19, 2017

Hi @sbraden , You can open this link https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time in your browser, choose year to 2014, tick the items in the following list, and then click the 'Download' button on the right. You will get a zip file that should satisfy you.

FL_DATE
UNIQUE_CARRIER
AIRLINE_ID
TAIL_NUM
FL_NUM
ORIGIN_AIRPORT_ID
ORIGIN_AIRPORT_SEQ_ID
ORIGIN_CITY_MARKET_ID
ORIGIN
ORIGIN_CITY_NAME
ORIGIN_STATE_NM
DEST_AIRPORT_ID
DEST_AIRPORT_SEQ_ID
DEST_CITY_MARKET_ID
DEST
DEST_CITY_NAME
DEST_STATE_NM
CRS_DEP_TIME
DEP_TIME
DEP_DELAY
TAXI_OUT
WHEELS_OFF
WHEELS_ON
TAXI_IN
CRS_ARR_TIME
ARR_TIME
ARR_DELAY
CANCELLED
CANCELLATION_CODE
DIVERTED
DISTANCE
CARRIER_DELAY
WEATHER_DELAY
NAS_DELAY
SECURITY_DELAY
LATE_AIRCRAFT_DELAY

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment