GRANT ALL PRIVILEGES ON divvy_2013.* TO 'divvy@localhost';
NOTE, LOAD DATA INFILE requires that the csv file be in the same directory as the database. strange. very strange.
- Steps to pre-process Divvy Data
- There are some instances of #N/A in the station ids - search for all and replace with -1 (about 1900 of them)
- In the trip duration column, there are occasionally values with commas and quotes - e.g. "1,123" representing 1123.
- These are converted to a single number in excel.
- Remove the text name columns for the station names. we'll use the databased to join that data on the other table.
- All dates are converted to mysql format using the format yyyy-mm-dd hh:mm:ss
- Remember to export from excel using windows csv, rather than normal on a mac, b/c it will export \r line endings not \r\n or \n which is obnoxious.
- As a note, these fixes reduce the file size by 30 MB. Every little character counts in an 800K row doc.
LOAD DATA INFILE 'Divvy_Trips_2013.csv' INTO TABLE Divvy_Trips_2013 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' (trip_id,@starttime_raw,@stoptime_raw,bikeid,tripduration,from_station_id,from_station_name,to_station_id,to_station_name,usertype,gender,birthyear) SET starttime = STR_TO_DATE(@starttime_raw, '%m/%d/%Y %k:%i'), stoptime = STR_TO_DATE(@stoptime_raw, '%m/%d/%Y %k:%i')