Hi, Tom.
I might be wrong, but in the second cell, `with open("flights.csv", 'wb') as f:` should be replaced with `with open("flights.csv.zip", 'wb') as f:`, since that's the file you unzip in the following cell.
P.S. Great post series, and I can't wait to see the second edition of Wes's book!
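Here is a minimal sketch of the fix. It assumes the response bytes are already in hand, so an in-memory zip built with `zipfile` stands in for the payload that the notebook downloads with a POST request. The point is that the bytes are saved under a `.zip` name, and that same path is the one that gets unzipped.

```python
import io
import os
import tempfile
import zipfile

# Stand-in for the downloaded response body (assumption: the real code
# receives these bytes from the BTS server). Build a tiny zip in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("flights.csv", "FL_DATE,ORIGIN\n2014-01-01,JFK\n")
payload = buf.getvalue()

data_dir = tempfile.mkdtemp()
zip_path = os.path.join(data_dir, "flights.csv.zip")

# Save the raw bytes under the .zip name...
with open(zip_path, "wb") as f:
    f.write(payload)

# ...so the next step unzips the same file that was just written.
with zipfile.ZipFile(zip_path) as zf:
    extracted = zf.extract(zf.namelist()[0], path=data_dir)

print(os.path.basename(extracted))  # flights.csv
```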
As of pandas 0.18.1, `read_csv` raises a `TypeError` if `parse_dates` is not a boolean, list, or dictionary.
@andportnoy, replace `df = pd.read_csv(fp, parse_dates="FL_DATE").rename(columns=str.lower)` with `df = pd.read_csv(fp, parse_dates=["FL_DATE"]).rename(columns=str.lower)`.
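To see the difference, you can run both forms on a tiny in-memory CSV. The two-row sample below is made up and only stands in for the extracted flights file. The list form parses the column as datetimes. The bare string is rejected with a `TypeError`.

```python
import io

import pandas as pd

# A two-row stand-in for the extracted flights CSV (made-up sample values).
csv_text = "FL_DATE,UNIQUE_CARRIER\n2014-01-01,AA\n2014-01-02,DL\n"

# parse_dates takes a list of column names, not a bare string.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["FL_DATE"]).rename(columns=str.lower)
print(pd.api.types.is_datetime64_any_dtype(df["fl_date"]))  # True

# Passing the column name as a plain string is rejected.
try:
    pd.read_csv(io.StringIO(csv_text), parse_dates="FL_DATE")
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```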
@TomAugspurger, thanks for this great resource.
I wasn't able to use cells 1 through 3 to download the data, so I downloaded it manually, and it looks like the format has changed a bit. For example, "FL_DATE" is now "FlightDate". Thank you for writing these "not exactly for beginners" tutorials.
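If you work with the newer export, one possible workaround is to rename its columns back to the names the notebook uses. The only pair confirmed here is `FlightDate` → `FL_DATE`. Any other renamed columns would need their own entries in the `RENAMES` dict, which is a hypothetical helper. The one-column sample is made up.

```python
import io

import pandas as pd

# Map new-format column names to the ones the notebook expects.
# Only FlightDate -> FL_DATE is known; add other pairs as needed (assumption).
RENAMES = {"FlightDate": "FL_DATE"}

# Tiny made-up sample in the newer format.
csv_text = "FlightDate\n2014-01-01\n2014-01-02\n"

df = pd.read_csv(io.StringIO(csv_text)).rename(columns=RENAMES)
df["FL_DATE"] = pd.to_datetime(df["FL_DATE"])  # parse after renaming
df = df.rename(columns=str.lower)              # same lower-casing as the notebook
print(df.columns.tolist())  # ['fl_date']
```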
Hi @sbraden, you can open this link https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time in your browser, set the year to 2014, tick the fields in the following list, and then click the 'Download' button on the right. You will get a zip file that should work with the notebook.
- FL_DATE
- UNIQUE_CARRIER
- AIRLINE_ID
- TAIL_NUM
- FL_NUM
- ORIGIN_AIRPORT_ID
- ORIGIN_AIRPORT_SEQ_ID
- ORIGIN_CITY_MARKET_ID
- ORIGIN
- ORIGIN_CITY_NAME
- ORIGIN_STATE_NM
- DEST_AIRPORT_ID
- DEST_AIRPORT_SEQ_ID
- DEST_CITY_MARKET_ID
- DEST
- DEST_CITY_NAME
- DEST_STATE_NM
- CRS_DEP_TIME
- DEP_TIME
- DEP_DELAY
- TAXI_OUT
- WHEELS_OFF
- WHEELS_ON
- TAXI_IN
- CRS_ARR_TIME
- ARR_TIME
- ARR_DELAY
- CANCELLED
- CANCELLATION_CODE
- DIVERTED
- DISTANCE
- CARRIER_DELAY
- WEATHER_DELAY
- NAS_DELAY
- SECURITY_DELAY
- LATE_AIRCRAFT_DELAY
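After downloading manually, a quick check can confirm that the zip actually contains the fields ticked from the list above. The sketch below makes a few assumptions. `load_bts_zip` is a hypothetical helper. The archive is assumed to hold the data as a `.csv` member, and any other members are ignored. The demo archive is made up and contains only two of the columns.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Fields ticked on the BTS download page (the list above).
EXPECTED_COLUMNS = [
    "FL_DATE", "UNIQUE_CARRIER", "AIRLINE_ID", "TAIL_NUM", "FL_NUM",
    "ORIGIN_AIRPORT_ID", "ORIGIN_AIRPORT_SEQ_ID", "ORIGIN_CITY_MARKET_ID",
    "ORIGIN", "ORIGIN_CITY_NAME", "ORIGIN_STATE_NM",
    "DEST_AIRPORT_ID", "DEST_AIRPORT_SEQ_ID", "DEST_CITY_MARKET_ID",
    "DEST", "DEST_CITY_NAME", "DEST_STATE_NM",
    "CRS_DEP_TIME", "DEP_TIME", "DEP_DELAY", "TAXI_OUT", "WHEELS_OFF",
    "WHEELS_ON", "TAXI_IN", "CRS_ARR_TIME", "ARR_TIME", "ARR_DELAY",
    "CANCELLED", "CANCELLATION_CODE", "DIVERTED", "DISTANCE",
    "CARRIER_DELAY", "WEATHER_DELAY", "NAS_DELAY", "SECURITY_DELAY",
    "LATE_AIRCRAFT_DELAY",
]


def load_bts_zip(zip_path):
    """Read the first .csv member of a downloaded zip (hypothetical helper).

    Returns the DataFrame with lower-cased column names, plus the list of
    expected columns that were not found in the file.
    """
    with zipfile.ZipFile(zip_path) as zf:
        csv_name = next(n for n in zf.namelist() if n.lower().endswith(".csv"))
        with zf.open(csv_name) as fh:
            df = pd.read_csv(fh)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if "FL_DATE" in df.columns:
        df["FL_DATE"] = pd.to_datetime(df["FL_DATE"])
    return df.rename(columns=str.lower), missing


# Demo with a tiny made-up archive holding only two of the columns.
demo_path = os.path.join(tempfile.mkdtemp(), "sample.zip")
with zipfile.ZipFile(demo_path, "w") as zf:
    zf.writestr("sample.csv", "FL_DATE,ORIGIN\n2014-01-01,JFK\n")

df, missing = load_bts_zip(demo_path)
print(len(missing))  # 34
```

With a real download, an empty `missing` list means every ticked field made it into the file.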
Great gist! This is really helpful for people who have finished Wes's great book and want to catch up on the later improvements to pandas. I can't believe I'm the first one to leave a message here.
However, I did come asking for help. I'm not sure whether the POST request still works. The third cell gave me a traceback:
`BadZipfile: File is not a zip file`
Or is it a Python 2/3 issue? I'm running Anaconda 4.0 with Python 2.7.
It would also help if you could show how to download the zip file manually.
Thank you again for the great post explaining the recent development.
EDIT: I think I understand the problem now. It is indeed a Python 2/3 issue. I think that, for some reason, Python 2 didn't wait until the request was complete. I split the third cell into separate cells and got it to run smoothly.
Thanks!
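A `BadZipfile` error like the one above usually means that the file on disk was not a complete zip archive when the unzip step ran. One possible guard is to check the file with `zipfile.is_zipfile` before extracting. A truncated download, or a non-zip response saved under a `.zip` name, then fails with a clearer message. This sketch is written for Python 3. The helper `extract_first_member` and the sample files are hypothetical.

```python
import os
import tempfile
import zipfile


def extract_first_member(zip_path, dest_dir):
    """Extract the first archive member, failing early if zip_path is not a zip."""
    if not zipfile.is_zipfile(zip_path):
        size = os.path.getsize(zip_path)
        raise ValueError(
            f"{zip_path} is not a valid zip ({size} bytes); "
            "the download may be incomplete or not a zip at all."
        )
    with zipfile.ZipFile(zip_path) as zf:
        return zf.extract(zf.namelist()[0], path=dest_dir)


tmp = tempfile.mkdtemp()

# A file that is not a zip (e.g. an HTML page saved under a .zip name).
bad_path = os.path.join(tmp, "bad.zip")
with open(bad_path, "wb") as f:
    f.write(b"<html>not a zip</html>")

try:
    extract_first_member(bad_path, tmp)
    bad_rejected = False
except ValueError as exc:
    bad_rejected = True
    print(exc)

# A real (tiny) zip extracts normally.
good_path = os.path.join(tmp, "good.zip")
with zipfile.ZipFile(good_path, "w") as zf:
    zf.writestr("flights.csv", "FL_DATE\n2014-01-01\n")
good_extracted = extract_first_member(good_path, tmp)
print(os.path.basename(good_extracted))  # flights.csv
```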