Great gist! This is really helpful for people who have finished Wes's great book and want to catch up on the latest improvements to pandas. I can't believe I'm the first one to leave a message here.

However, I did come asking for help. I'm not sure whether the POST request still works. The third cell gave me a traceback:
BadZipfile: File is not a zip file
Or is it a Python 2/3 issue? I'm running Anaconda 4.0 with Python 2.7.

Actually, it would be helpful if you could show how to download the zip file manually.

Thank you again for the great post explaining the recent development.

EDIT: I think I understand the problem now. It is indeed a Python 2/3 problem: I think, for some reason, Py2 didn't wait until the request was complete. I split the third cell into separate cells and got it to run smoothly.

thanks!
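A BadZipfile error means the bytes on disk do not form a valid zip archive, for example because the server answered with an HTML page or the download was cut short. A minimal sketch that checks what was actually received before unzipping, assuming you have the downloaded bytes in hand (e.g. `r.content` from a `requests` response); `describe_download` is a hypothetical helper, not part of the original notebook:

```python
import io
import zipfile


def describe_download(data: bytes) -> str:
    """Classify downloaded bytes before trying to unzip them (hypothetical helper)."""
    if not data:
        # Nothing was written: the request may not have finished or returned no body.
        return "empty"
    if zipfile.is_zipfile(io.BytesIO(data)):
        # Safe to pass to zipfile.ZipFile.
        return "zip"
    # Peek at the start of the payload to see whether the server sent HTML instead.
    head = data[:100].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

If this returns `"html"` or `"empty"`, the problem is in the request step rather than in the unzip step.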

Hi, Tom.

I might be wrong, but in the second cell

with open("flights.csv", 'wb') as f:

should be replaced with

with open("flights.csv.zip", 'wb') as f:

since that's what you are then unzipping in the following cell.
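With that fix, the save-then-unzip step can be sketched as follows. This assumes the response body is already available as bytes; `save_and_extract` is a hypothetical helper, not the notebook's own code:

```python
import zipfile
from pathlib import Path


def save_and_extract(content, zip_path="flights.csv.zip", out_dir="."):
    """Write downloaded bytes to a .zip file, then extract every member.

    Returns the names of the extracted members.
    """
    zip_path = Path(zip_path)
    # Save the raw archive under a .zip name, so the file on disk is the
    # same one the next step opens as a zip.
    with open(zip_path, "wb") as f:
        f.write(content)
    # Opening a file that is not a zip archive here raises zipfile.BadZipFile.
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
        return zf.namelist()
```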

P.S. Great post series, and I can't wait to see the second edition of Wes's book!

As of pandas 0.18.1, read_csv raises a TypeError if parse_dates is not a boolean, list, or dictionary.

rebost commented Sep 18, 2016

@andportnoy, replace
df = pd.read_csv(fp, parse_dates="FL_DATE").rename(columns=str.lower)
with
df = pd.read_csv(fp, parse_dates=["FL_DATE"]).rename(columns=str.lower)
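A self-contained check of the corrected call, using a few made-up rows in place of the real flights.csv (the sample data is illustrative only):

```python
import io

import pandas as pd

# Tiny stand-in for flights.csv (made-up rows, not real BTS data).
csv_text = io.StringIO(
    "FL_DATE,UNIQUE_CARRIER,DEP_DELAY\n"
    "2014-01-01,AA,5.0\n"
    "2014-01-02,DL,-3.0\n"
)

# parse_dates takes a list of column names; a bare string such as
# parse_dates="FL_DATE" is rejected with a TypeError.
df = pd.read_csv(csv_text, parse_dates=["FL_DATE"]).rename(columns=str.lower)
print(df.dtypes)
```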

@TomAugspurger, thanks for this great resource

sbraden commented Jun 14, 2017

I was not able to use cells 1 through 3 to download the data. I downloaded the data manually, and it appears that the format has changed a bit; for example, "FL_DATE" is now "FlightDate". Thank you for writing these "not exactly for beginners" tutorials.
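If you work from the newer download, one option is to rename the columns back to the names the tutorial uses. A minimal sketch, assuming only the `FlightDate` to `FL_DATE` change mentioned above; any other renamed columns would need their own entries, and `NEW_TO_OLD` and `to_old_names` are hypothetical names:

```python
import pandas as pd

# Only FlightDate -> FL_DATE is confirmed above; add further (hypothetical)
# entries after checking the header of the file you actually downloaded.
NEW_TO_OLD = {"FlightDate": "FL_DATE"}


def to_old_names(df: pd.DataFrame) -> pd.DataFrame:
    """Rename new-style headers back to the names used in the tutorial."""
    # Columns not in the mapping are left unchanged.
    return df.rename(columns=NEW_TO_OLD)
```

Because the rename happens after reading, date parsing has to use the new name, e.g. `to_old_names(pd.read_csv(path, parse_dates=["FlightDate"]))`.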

lidgen commented Jun 19, 2017

Hi @sbraden, you can open this link https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time in your browser, set the year to 2014, tick the fields in the following list, and then click the 'Download' button on the right. You will get a zip file that should work with the tutorial.

FL_DATE
UNIQUE_CARRIER
AIRLINE_ID
TAIL_NUM
FL_NUM
ORIGIN_AIRPORT_ID
ORIGIN_AIRPORT_SEQ_ID
ORIGIN_CITY_MARKET_ID
ORIGIN
ORIGIN_CITY_NAME
ORIGIN_STATE_NM
DEST_AIRPORT_ID
DEST_AIRPORT_SEQ_ID
DEST_CITY_MARKET_ID
DEST
DEST_CITY_NAME
DEST_STATE_NM
CRS_DEP_TIME
DEP_TIME
DEP_DELAY
TAXI_OUT
WHEELS_OFF
WHEELS_ON
TAXI_IN
CRS_ARR_TIME
ARR_TIME
ARR_DELAY
CANCELLED
CANCELLATION_CODE
DIVERTED
DISTANCE
CARRIER_DELAY
WEATHER_DELAY
NAS_DELAY
SECURITY_DELAY
LATE_AIRCRAFT_DELAY
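Once you have the zip from that page, pandas can read it directly without unzipping it first, as long as the archive holds a single CSV file. A sketch, assuming a hypothetical file name `On_Time_2014.zip` and using a subset of the fields above:

```python
import pandas as pd

# Hypothetical file name: use whatever the "Download" button actually saved.
ZIP_PATH = "On_Time_2014.zip"

# A subset of the fields ticked in the list above.
COLUMNS = ["FL_DATE", "UNIQUE_CARRIER", "ORIGIN", "DEST", "DEP_DELAY", "ARR_DELAY"]


def load_flights(path=ZIP_PATH, columns=COLUMNS):
    # pandas infers zip compression from the ".zip" extension; this only
    # works when the archive contains exactly one data file.
    df = pd.read_csv(path, usecols=columns, parse_dates=["FL_DATE"])
    return df.rename(columns=str.lower)
```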
