In this post, I demonstrate the steps taken to make sense of a set of parking violation data: getting a sense of what the raw data look like, cleaning and transforming the data, and finally visualizing the geolocation data on a map. I used Python and Smartdown for these purposes.
This block is best viewed in non-iframe mode via the 'Open' link above.
Use the 'Previous', 'Next', and 'Home' buttons at the bottom of a post to navigate.
Data cleaning and getting locations geocoded (Python, pandas)
The main issue with the Location data is that it often consists of two or three street names, for example 'KINCAID ST 11TH AVE 12TH'. This probably means the violation occurred on Kincaid Street, between 11th and 12th Avenue. While this format is human-readable and we can figure out what it means, it is not precise enough for the Google geocoding API to decipher.
Therefore I decided that in this situation (a 'Location' made up of multiple street names), I'll just take the first two street names and get the geolocation coordinates for where those two streets intersect from Google Maps.
Below is a Python script to transform the 'Location' column from our raw data into a more Google-readable 'address' column.
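The transformation script itself isn't reproduced in this excerpt, but the idea can be sketched as follows. This is a rough heuristic in modern Python; the `location_to_address` helper and the street-suffix list are my own assumptions, not the original code:

```python
import re
import pandas as pd

# Common street-type suffixes (an assumed list; extend as needed).
SUFFIX = r'(?:ST|AVE|BLVD|RD|DR|LN|WAY|PL|ALY)'

def location_to_address(location):
    """Keep the first two street names and join them with '&',
    so the geocoder reads them as an intersection."""
    # Each street name: one or more words ending in a known suffix.
    streets = re.findall(r'\S+(?:\s\S+)*?\s' + SUFFIX + r'\b', location)
    if len(streets) >= 2:
        return streets[0] + ' & ' + streets[1]
    return location  # leave rows we can't parse unchanged

parking = pd.DataFrame({'Location': ['KINCAID ST 11TH AVE 12TH']})
parking['address'] = parking['Location'].apply(location_to_address)
```

With this heuristic, 'KINCAID ST 11TH AVE 12TH' becomes 'KINCAID ST & 11TH AVE', an intersection the geocoder can resolve.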
Now we can compare the 'Location' and the new 'address' column to get an idea of what we just did:
// compare the 'Location' column to the 'address' column of select rows
const originalParkingCSV = 'https://raw.githubusercontent.com/LanGuo/parkingSmartdown/master/parking2007_2008_w_address.csv';
const myDiv = this.div;
let headerRow = '|Location|Address|';
let extraLine = '\n|:---|:---|';
let rowsToShow = [10,15];
d3.csv(originalParkingCSV).then(
function(data) {
const dataToShow = data.slice(rowsToShow[0], rowsToShow[1]);
let tableRows = dataToShow.map(function(row) {
let oneRow = '\n|'+row.Location+'|'+row.address+'|'
return oneRow;
});
let mdTable =
`
${headerRow}${extraLine}${tableRows.join('')}
`;
// console.log(tableRows.join(''));
let sdContent =
`
#### Compare the 'Location' in raw data to the transformed 'address' for geocoding:
${mdTable}
`;
smartdown.setVariable('ComparisonData', sdContent, 'markdown');
});
Hit up Google geocoding API and get some coordinates (Python, geopy)
Because the Google geocoding API enforces a quota per requesting IP, I decided to find out which addresses were the most frequently ticketed locations and get the coordinates for only those addresses.
#!/usr/bin/python
import pandas as pd
import re
from geopy.geocoders import GoogleV3
import pdb
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt

# -- Sort addresses by ticket count to find the most frequently ticketed locations -- #
amountsByLocation = parkingData[['address', 'Amount Due']]
statsByLocation = amountsByLocation.groupby('address').describe().unstack()
addressByTicketCounts = statsByLocation['Amount Due'].reset_index().sort('count', ascending=False)
addressByTicketCounts.to_csv('./address_by_ticket_counts_2007_2008.csv')

# -- Get the top N addresses geocoded -- #
addressByTicketCounts = pd.read_csv('./address_by_ticket_counts_2007_2008.csv')
geolocator = GoogleV3(api_key='AIzaSyDImvv3i9XUZLf8oDd6Of51_plddaJ9iC4', timeout=60)

# Get geolocations for the highest-ranked 100 addresses by ticket number
numOfTopAddress = 100
topAddresses = addressByTicketCounts.address[:numOfTopAddress]
topAddressCounts = addressByTicketCounts['count'][:numOfTopAddress]
topAddressMax = addressByTicketCounts['max'][:numOfTopAddress]
latitudeCol = np.empty(numOfTopAddress, dtype=float)
longitudeCol = np.empty(numOfTopAddress, dtype=float)

for ind, address in topAddresses.iteritems():
    fullAddress = address + ', Eugene, OR'
    print 'Geocoding {}'.format(fullAddress)
    results = geolocator.geocode(fullAddress, exactly_one=True)
    if results is not None:
        fullAddress, (latitude, longitude) = results
        latitudeCol[ind] = latitude
        longitudeCol[ind] = longitude
    else:
        latitudeCol[ind] = np.NaN
        longitudeCol[ind] = np.NaN

outputDf = pd.DataFrame({'address': topAddresses, 'count': topAddressCounts, 'max': topAddressMax,
                         'latitude': latitudeCol, 'longitude': longitudeCol})
outputDf.to_csv('./top_address_geocoded_2007_2008.csv')
For each unique address, we now have the 'latitude' and 'longitude' information we can use to locate it on a map!
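Before mapping, the geocoded CSV can be filtered down to usable marker records, since addresses the geocoder could not resolve were stored with NaN coordinates. A minimal sketch; the rows below are made-up placeholders standing in for `top_address_geocoded_2007_2008.csv`, not real data from the post:

```python
import io
import pandas as pd

# Made-up placeholder rows in the same shape as the geocoded output CSV.
csv_text = """address,count,max,latitude,longitude
KINCAID ST & 13TH AVE,500,100.0,44.04,-123.08
BROADWAY & OAK ST,10,20.0,,
"""
geocoded = pd.read_csv(io.StringIO(csv_text))

# Drop rows with missing coordinates before handing records to the map layer.
markers = geocoded.dropna(subset=['latitude', 'longitude']).to_dict('records')
```

Each remaining record carries the address, ticket count, maximum fine, and a latitude/longitude pair, which is exactly what a map marker needs.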
Data wrangling and visualization with Python and Smartdown
Parking violation data of Eugene, OR, 2007-2008
This Notebook was written before Smartdown's reactivity mechanism was complete, so it behaves more like a traditional Jupyter Notebook, requiring that each step be executed before the next. We are working on a revised version of this notebook that utilizes Smartdown's full reactivity.
1. Raw Data
2. Clean And Transform Data
3. Geocoding Addresses
4. Map with marker
5. Map with trendline 1
6. Map with trendline 2
Let's make a map and put some markers on it (leaflet.js)
Here is a simple Leaflet map centered on Eugene, OR. Markers indicate the top 100 most-ticketed locations. Clicking a marker opens a pop-up showing the address, the total number of tickets over this period, and the maximum fine amount.
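Each pop-up combines three fields from a marker's record. A minimal Python sketch of the pop-up text; the field names are assumed to match the geocoded CSV, and the `<br>` separators are my own choice:

```python
def popup_text(record):
    """Build the pop-up string shown when a marker is clicked:
    address, ticket count, and maximum fine."""
    return '{address}<br>tickets: {count:.0f}<br>max fine: ${max:.2f}'.format(**record)

popup_text({'address': 'KINCAID ST & 13TH AVE', 'count': 500, 'max': 100.0})
```

In the Smartdown notebook itself, the equivalent string is bound to each Leaflet marker's pop-up.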
The original data was given to us as an Excel file made up of 3 separate sheets, each containing data from a different time period. Putting them together, we get the parking violation data from July 2007 to June 2008 in CSV format: csv data. Each record is stored as a row of this CSV table. Let's load up this CSV file and look at the first few lines of the table:
// generate Markdown table using js to visualize raw data
const originalParkingCSV = 'https://raw.githubusercontent.com/LanGuo/parkingSmartdown/master/parking2007_2008_raw.csv';
const myDiv = this.div;
let headerRow = '|';
let extraLine = '\n|';
let numRowsToShow = 10;
d3.csv(originalParkingCSV).then(
function(data) {
for (const key of d3.keys(data[0])) {
headerRow += key+'|';
extraLine += ':---|';
}
// console.log(headerRow, extraLine);
const dataToShow = data.slice(1,numRowsToShow+1);
let tableRows = dataToShow.map(function(row) {
let oneRow = '\n|'
for (const value of d3.values(row)) {
oneRow += value+'|';
}
return oneRow;
});
let mdTable =
`
${headerRow}${extraLine}${tableRows.join('')}
`;
// console.log(mdTable);
let sdContent =
`
#### Raw data in csv format:
${mdTable}
`;
smartdown.setVariable('RawData', sdContent, 'markdown');
});
Raw Data
To visualize this data, we will display the locations of parking tickets on a map, and ideally provide some statistics for the locations where parking tickets were most frequently issued.