@TomAugspurger
Created March 23, 2016 01:22
@Paul-Yuchao-Dong

Great gist! This is really helpful to people who have finished Wes's great book and want to catch up on the later improvements to pandas. I can't believe I am the first one to leave a message here.

However, I did come to ask for help. I'm not sure whether the POST request still works. The 3rd cell gave me a traceback:
BadZipfile: File is not a zip file
Or is it a Python 2/3 issue? I'm running anaconda=4.0 with Python 2.7.

Actually, it would be helpful if you could show how to download the zip file manually.

Thank you again for the great post explaining the recent developments.

EDIT: I think I understand the problem now. It is indeed a Python 2/3 problem: I think Py2 didn't wait until the request was complete, for some reason. I separated the 3rd cell and got it to run smoothly.

thanks!
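
For anyone else who hits BadZipfile, one way to narrow it down is to check whether the downloaded bytes are a zip archive at all before unzipping. This is only a minimal sketch: `looks_like_zip` is a hypothetical helper, and both payloads are made up here to stand in for a good and a bad download. It does not reproduce the notebook's actual request.

```python
import io
import zipfile


def looks_like_zip(payload: bytes) -> bool:
    """Return True if the bytes form a readable zip archive."""
    return zipfile.is_zipfile(io.BytesIO(payload))


# Build a small in-memory archive to stand in for a successful download.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("flights.csv", "FL_DATE,ORIGIN\n2014-01-01,JFK\n")
good_payload = buf.getvalue()

# Made-up stand-in for a failed download, e.g. an HTML error page.
bad_payload = b"<html><body>Error</body></html>"

print(looks_like_zip(good_payload))
print(looks_like_zip(bad_payload))
```

If the check fails on the real response, inspecting the first few hundred bytes of the payload usually shows what the server sent instead.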

@andportnoy

Hi, Tom.

I might be wrong, but in the second cell

with open("flights.csv", 'wb') as f:

should be replaced with

with open("flights.csv.zip", 'wb') as f:

since that's what you are then unzipping in the following cell.

P.S. Great post series, and I can't wait to see the second edition of Wes's book!

@andportnoy

As of pandas 0.18.1:
read_csv will now raise a TypeError if parse_dates is neither a boolean, list, or dictionary

@matias-pizarro

matias-pizarro commented Sep 18, 2016

@andportnoy, replace
df = pd.read_csv(fp, parse_dates="FL_DATE").rename(columns=str.lower)
with
df = pd.read_csv(fp, parse_dates=["FL_DATE"]).rename(columns=str.lower)

@TomAugspurger, thanks for this great resource
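
A small self-contained check of the list form of parse_dates, for anyone who wants to try it without the full download. The CSV text below is made up for illustration (only the FL_DATE column name comes from the gist), so treat it as a sketch rather than the notebook's actual data.

```python
import io

import pandas as pd

# Made-up two-row sample; the real file has many more columns.
csv_text = "FL_DATE,ORIGIN,DEP_DELAY\n2014-01-01,JFK,5.0\n2014-01-02,LAX,-3.0\n"

# parse_dates takes a list of column names, not a bare string.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["FL_DATE"]).rename(columns=str.lower)

print(df.dtypes)
```

The fl_date column should come back with a datetime dtype, while the other columns keep their inferred types.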

@sbraden

sbraden commented Jun 14, 2017

I was not able to use cells 1 through 3 to download the data. I downloaded the data manually, and it appears that the format has changed a bit: "FL_DATE" is now "FlightDate", for example. Thank you for writing these "not exactly for beginners" tutorials.
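
If the manually downloaded file uses the newer header names, one option is to rename them back before running the rest of the notebook. This is a rough sketch: only the FlightDate to FL_DATE pair is taken from the observation above, `RENAMES` is a hypothetical mapping that would need extending after inspecting the real file, and the sample CSV (including SomeColumn) is made up.

```python
import io

import pandas as pd

# Only this one pair is known from the comment; add others after checking df.columns.
RENAMES = {"FlightDate": "FL_DATE"}

# Made-up sample standing in for the manually downloaded file.
csv_text = "FlightDate,SomeColumn\n2014-01-01,1\n"

# Columns not listed in RENAMES are left unchanged.
df = pd.read_csv(io.StringIO(csv_text)).rename(columns=RENAMES)

print(list(df.columns))
```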

@lidgen

lidgen commented Jun 19, 2017

Hi @sbraden, you can open this link https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time in your browser, set the year to 2014, tick the fields in the following list, and then click the 'Download' button on the right. You will get a zip file with the columns the notebook expects.

FL_DATE
UNIQUE_CARRIER
AIRLINE_ID
TAIL_NUM
FL_NUM
ORIGIN_AIRPORT_ID
ORIGIN_AIRPORT_SEQ_ID
ORIGIN_CITY_MARKET_ID
ORIGIN
ORIGIN_CITY_NAME
ORIGIN_STATE_NM
DEST_AIRPORT_ID
DEST_AIRPORT_SEQ_ID
DEST_CITY_MARKET_ID
DEST
DEST_CITY_NAME
DEST_STATE_NM
CRS_DEP_TIME
DEP_TIME
DEP_DELAY
TAXI_OUT
WHEELS_OFF
WHEELS_ON
TAXI_IN
CRS_ARR_TIME
ARR_TIME
ARR_DELAY
CANCELLED
CANCELLATION_CODE
DIVERTED
DISTANCE
CARRIER_DELAY
WEATHER_DELAY
NAS_DELAY
SECURITY_DELAY
LATE_AIRCRAFT_DELAY
