Skip to content

Instantly share code, notes, and snippets.

@bakercp
Last active August 29, 2015 13:56
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save bakercp/8969581 to your computer and use it in GitHub Desktop.
Save bakercp/8969581 to your computer and use it in GitHub Desktop.
divvy php interface

GRANT ALL PRIVILEGES ON divvy_2013.* TO 'divvy@localhost';

NOTE, LOAD DATA INFILE requires that the csv file be in the same directory as the database. strange. very strange.

  • Steps to pre-process Divvy Data
  • There are some instances of #N/A in the station ids - search for all and replace with -1 (about 1900 of them)
  • In the trip duration column, there are occasionally values with commas and quotes - e.g. "1,123" representing 1123.
    • These are converted to a single number in excel.
  • Remove the text name columns for the station names. we'll use the databased to join that data on the other table.
  • All dates are converted to mysql format using the format yyyy-mm-dd hh:mm:ss
  • Remember to export from excel using windows csv, rather than normal on a mac, b/c it will export \r line endings not \r\n or \n which is obnoxious.
  • As a note, these fixes reduce the file size by 30 MB. Every little character counts in an 800K row doc.

LOAD DATA INFILE 'Divvy_Trips_2013.csv' INTO TABLE Divvy_Trips_2013 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' (trip_id,@starttime_raw,@stoptime_raw,bikeid,tripduration,from_station_id,from_station_name,to_station_id,to_station_name,usertype,gender,birthyear) SET starttime = STR_TO_DATE(@starttime_raw, '%m/%d/%Y %k:%i'), stoptime = STR_TO_DATE(@stoptime_raw, '%m/%d/%Y %k:%i')

CREATE TABLE `Divvy_Trips_2013` (
`trip_id` int(11) NOT NULL,
`starttime` datetime NOT NULL,
`stoptime` datetime NOT NULL,
`bikeid` int(11) NOT NULL,
`tripduration` varchar(255) NOT NULL,
`from_station_id` int(11) NOT NULL,
`from_station_name` varchar(255) NOT NULL,
`to_station_id` int(11) NOT NULL,
`to_station_name` varchar(255) NOT NULL,
`usertype` varchar(255) NOT NULL,
`gender` varchar(255) DEFAULT NULL,
`birthyear` int(11) DEFAULT NULL,
PRIMARY KEY (`trip_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment