@DoctorBud
Forked from LanGuo/.block
Last active November 2, 2019 05:47
Eugene parking visualization with Python and Smartdown
height: 800
scrolling: yes
border: no

Parking violations in the city of Eugene from 2007 to 2008

Forked from LanGuo's original Block to account for Smartdown changes.

Based on data released by the city to the 2017 Hack For A Cause event teams.

About this post

In this post, I demonstrate the steps I took to make sense of a set of parking violation data: getting a sense of what the raw data look like, cleaning and transforming the data, geocoding the locations, and finally visualizing the geolocation data on a map. I used Python and Smartdown for these purposes.

  • This block is best viewed in non-iframe mode via the Open link above.

  • Use the 'Previous', 'Next', and 'Home' buttons at the bottom of a post to navigate.

Data cleaning and getting locations geocoded (Python, pandas)

The main issue with the Location data is that it often consists of two or three street names. For example, 'KINCAID ST 11TH AVE 12TH' probably means the violation occurred on Kincaid Street, between 11th and 12th Avenues. While this format is human-readable, it is not precise enough for the Google geocoding API to decipher. So whenever at least two street names can be found in a 'Location', I take only the first two and ask Google Maps for the coordinates of the intersection of those two streets. Below is a Python script that transforms the 'Location' column of the raw data into a more Google-readable 'address' column.

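#!/usr/bin/env python

import re

import pandas as pd

parkingData = pd.read_csv('./parking2007_2008_raw.csv')

# A street name is a run of capitals/digits immediately followed by " AVE" or " ST".
# Raw string, so that \s reaches the regex engine unchanged.
streetPattern = re.compile(r'[A-Z0-9]+(?=\sAVE|\sST)')

def locationToAddress(location):
    """Reduce a raw 'Location' to 'STREET1 & STREET2' when possible."""
    location = str(location)  # missing values (NaN) become the string 'nan'
    streetNames = streetPattern.findall(location)
    if len(streetNames) >= 2:
        # Keep the first two names; sort them so the same intersection
        # always produces the same address string.
        return ' & '.join(sorted(streetNames[:2]))
    else:
        # Fewer than two street names found: keep the original text.
        return location

parkingData['address'] = parkingData['Location'].apply(locationToAddress)

parkingData.to_csv('./parking2007_2008_w_address.csv')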
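To see what the regular expression actually extracts, here is the same logic restated as a standalone function (renamed `location_to_address` purely for illustration) and applied to the example from the text plus two made-up locations. The second example hints at why the script sorts the two names: it appears to be so that the same intersection maps to the same 'address' regardless of word order.

```python
import re

# Same pattern as the script above: a run of capitals/digits immediately
# followed by " AVE" or " ST" (the suffix itself is not captured).
STREET_PATTERN = re.compile(r'[A-Z0-9]+(?=\sAVE|\sST)')


def location_to_address(location):
    """Reduce a raw 'Location' string to 'STREET1 & STREET2' when possible."""
    location = str(location)
    street_names = STREET_PATTERN.findall(location)
    if len(street_names) >= 2:
        # Sorted, so both street orders give the same address string.
        return ' & '.join(sorted(street_names[:2]))
    return location


for raw in ['KINCAID ST 11TH AVE 12TH', '11TH AVE KINCAID ST', 'OAK ST']:
    print(f'{raw!r} -> {location_to_address(raw)!r}')
# 'KINCAID ST 11TH AVE 12TH' -> '11TH & KINCAID'
# '11TH AVE KINCAID ST' -> '11TH & KINCAID'
# 'OAK ST' -> 'OAK ST'
```

Note that '12TH' is dropped in the first example because it is not followed by " AVE" or " ST", and a location with a single street name is passed through unchanged.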

Now we can compare the 'Location' and the new 'address' columns to see what the transformation did:

// Compare the 'Location' column to the 'address' column for a few selected rows
const originalParkingCSV = 'https://raw.githubusercontent.com/LanGuo/parkingSmartdown/master/parking2007_2008_w_address.csv';
const headerRow = '|Location|Address|';
const extraLine = '\n|:---|:---|';
const rowsToShow = [10, 15]; // start (inclusive) and end (exclusive) row indices

d3.csv(originalParkingCSV).then(
  function(data) {
    const dataToShow = data.slice(rowsToShow[0], rowsToShow[1]);

    // One Markdown table row per record
    const tableRows = dataToShow.map(function(row) {
      return '\n|' + row.Location + '|' + row.address + '|';
    });

    const mdTable =
`
${headerRow}${extraLine}${tableRows.join('')}
`;
    const sdContent =
`
#### Compare the 'Location' in raw data to the transformed 'address' for geocoding:
${mdTable}
`;
    smartdown.setVariable('ComparisonData', sdContent);
  });

Raw-vs-Refined Comparison

Previous Next Back to Home

Hit up the Google geocoding API and get some coordinates (Python, geopy)

Because the Google geocoding API limits how many requests can be made (a quota per requesting IP), I first ranked addresses by how often they were ticketed and geocoded only the most frequently ticketed ones.
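Stripped of pandas, the ranking step amounts to counting tickets per address, tracking the largest fine, and keeping only the top N. A minimal stdlib sketch, assuming each record is a dict with the 'address' and 'Amount Due' fields used in the script below and that 'Amount Due' holds numbers; the sample records and amounts are made up for illustration:

```python
from collections import defaultdict


def rank_addresses(records, top_n):
    """Return (address, ticket_count, max_amount) tuples, most-ticketed first."""
    counts = defaultdict(int)
    max_due = {}
    for rec in records:
        addr = rec['address']
        amount = float(rec['Amount Due'])
        counts[addr] += 1
        max_due[addr] = max(amount, max_due.get(addr, amount))
    # Highest ticket count first; ties broken by address so the order is stable.
    ranked = sorted(counts, key=lambda a: (-counts[a], a))
    return [(a, counts[a], max_due[a]) for a in ranked[:top_n]]


sample = [
    {'address': '11TH & KINCAID', 'Amount Due': 16},
    {'address': '11TH & KINCAID', 'Amount Due': 25},
    {'address': '13TH & ALDER', 'Amount Due': 16},
]
print(rank_addresses(sample, top_n=2))
# [('11TH & KINCAID', 2, 25.0), ('13TH & ALDER', 1, 16.0)]
```

Cutting the list at `top_n` before any geocoding happens is what keeps the number of API requests bounded.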

#!/usr/bin/env python

import os

import numpy as np
import pandas as pd
from geopy.geocoders import GoogleV3

# Output of the previous script (raw data plus the 'address' column)
parkingData = pd.read_csv('./parking2007_2008_w_address.csv')

# -- Sort addresses by ticket count to find the most frequently ticketed locations -- #
amountsByLocation = parkingData[['address', 'Amount Due']]

# One row per address: count, mean, std, min, quartiles and max of 'Amount Due'
statsByLocation = amountsByLocation.groupby('address')['Amount Due'].describe()

addressByTicketCounts = statsByLocation.reset_index().sort_values('count', ascending=False)

addressByTicketCounts.to_csv('./address_by_ticket_counts_2007_2008.csv')

# -- Get the top N addresses geocoded -- #
addressByTicketCounts = pd.read_csv('./address_by_ticket_counts_2007_2008.csv')

# Read the API key from the environment instead of hard-coding it in the script
# (the variable name GOOGLE_API_KEY is our own choice).
geolocator = GoogleV3(api_key=os.environ['GOOGLE_API_KEY'], timeout=60)

# Get geolocation for the 100 highest-ranked addresses by ticket count
numOfTopAddress = 100
topAddresses = addressByTicketCounts['address'][:numOfTopAddress]
topAddressCounts = addressByTicketCounts['count'][:numOfTopAddress]
topAddressMax = addressByTicketCounts['max'][:numOfTopAddress]

# Start with NaN everywhere; successful lookups overwrite it
latitudeCol = np.full(len(topAddresses), np.nan)
longitudeCol = np.full(len(topAddresses), np.nan)

for ind, address in topAddresses.items():
    fullAddress = '{}, Eugene, OR'.format(address)
    print('Geocoding {}'.format(fullAddress))
    location = geolocator.geocode(fullAddress, exactly_one=True)
    if location is not None:
        latitudeCol[ind] = location.latitude
        longitudeCol[ind] = location.longitude

outputDf = pd.DataFrame({'address': topAddresses, 'count': topAddressCounts, 'max': topAddressMax, 'latitude': latitudeCol, 'longitude': longitudeCol})

outputDf.to_csv('./top_address_geocoded_2007_2008.csv')

For each of the top 100 addresses, we now have 'latitude' and 'longitude' values we can use to place it on a map (addresses that could not be geocoded keep NaN)!
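The geocoding loop can also be written so that each distinct address is requested only once and failed lookups are recorded as NaN instead of stopping the run. In this sketch the geocoder is any callable that takes a query string and returns a (latitude, longitude) pair or None; `geocode_all` and `fake_geocode` are hypothetical names, and `fake_geocode` with its made-up coordinates stands in for geopy's `GoogleV3(...).geocode` so the example runs offline. A real geopy call returns a Location object, whose `.latitude` and `.longitude` would need to be packed into a pair.

```python
import math


def geocode_all(addresses, geocode, city_suffix=', Eugene, OR'):
    """Return {address: (lat, lon)}; failed lookups map to (nan, nan)."""
    cache = {}
    for address in addresses:
        if address in cache:
            continue  # each distinct address is requested only once
        result = geocode(address + city_suffix)
        cache[address] = result if result is not None else (math.nan, math.nan)
    return cache


# Hypothetical stand-in for a real geocoding call (no network access).
def fake_geocode(query):
    known = {'11TH & KINCAID, Eugene, OR': (44.045, -123.078)}
    return known.get(query)


coords = geocode_all(['11TH & KINCAID', 'NOWHERE ST', '11TH & KINCAID'], fake_geocode)
print(coords)
```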

Previous Next Back to Home

Data wrangling and visualization with Python and Smartdown

Parking violation data of Eugene, OR, 2007-2008

This notebook was written before Smartdown's reactivity mechanism was complete, so it behaves more like a traditional Jupyter Notebook: each step must be executed before the next. We are working on a revised version of this notebook that uses Smartdown's full reactivity.

1. Raw Data
2. Clean And Transform Data
3. Geocoding Addresses
4. Map with marker
5. Map with trendline 1
6. Map with trendline 2

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="user-scalable=no, width=device-width, initial-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<base href="https://smartdown.github.io/smartdown/lib/">
<title>Eugene Parking Lot Violation Exploration</title>
<link rel=stylesheet href="fonts.css">
<link rel=stylesheet href="smartdown.css">
<script src="smartdown.js"></script>
</head>
<body
style="margin:0;padding:0;background:white;">
<div
class="container-fluid"
style="margin:0;padding:5px;">
<div
class="smartdown-container"
id="smartdown-output">
</div>
</div>
<script src="https://smartdown.site/lib/starter.js"></script>
<script>
window.smartdownBaseURL = 'https://smartdown.site/';
window.smartdownResourceURL = window.smartdownBaseURL + 'gallery/resources/';
// window.smartdownDefaultHome = 'YouTube';
window.smartdownStarter();
</script>
</body>
</html>

Let's make a map and put some markers on it (leaflet.js)

Here is a simple Leaflet map centered on Eugene, OR. Markers indicate the top 100 most-ticketed locations. Clicking a marker opens a pop-up showing the address, the total number of tickets over this period, and the maximum fine.

//smartdown.import=d3
const mymap = L.map(this.div.id).setView([44.0489713, -123.0944854], 12);
L.tileLayer('https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
    attribution: 'Map data &copy; <a href="https://www.openstreetmap.org/copyright">OpenStreetMap</a> contributors',
}).addTo(mymap);

// Geocoding was done beforehand in Python; use the resulting latitude/longitude data for the Leaflet map.
const parkingCSV = 'https://raw.githubusercontent.com/LanGuo/parkingSmartdown/master/parking_aggregate_2007_2008_geocoded.csv';

d3.csv(parkingCSV, function(d) {
  return {
    address: d.address,
    latitude: +d.latitude,
    longitude: +d.longitude,
    count: +d.count,
    min_ticket: +d.min,
    max_ticket: +d.max
  };
}).then(
function(data) {
  data.forEach(function(d) {
    // d3.csv yields '' for empty cells and unary + turns '' into 0
    // (or NaN for text like 'nan'); both mean "not geocoded", so skip them.
    if (d.latitude && d.longitude) {
      const marker = L.marker([d.latitude, d.longitude]).addTo(mymap);
      const popupContent =
`
<b>${d.address}</b>
<br>
Number of tickets in 2007-2008: ${d.count}.
<br>
Maximum fine: ${d.max_ticket}
`;
      marker.bindPopup(popupContent);
    }
  });
});

return mymap;
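In the map code above, coordinates are parsed with unary `+`. d3.csv returns an empty string for an empty cell, and `+''` is 0 in JavaScript, which is why addresses that were never geocoded arrive with 0 coordinates. The sketch below does the same filtering in Python but treats an empty (or NaN) cell as missing instead of relying on 0; it assumes the 'address', 'latitude' and 'longitude' column names used above, and the inline CSV text is made up.

```python
import csv
import io
import math


def parse_coordinate(text):
    """Return a float, or None when the cell is empty, not a number, or NaN."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(value) else value


def rows_with_coordinates(csv_text):
    """Yield (address, lat, lon) only for rows whose coordinates are present."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = parse_coordinate(row['latitude'])
        lon = parse_coordinate(row['longitude'])
        if lat is not None and lon is not None:
            yield row['address'], lat, lon


sample_csv = """address,latitude,longitude
11TH & KINCAID,44.045,-123.078
NOWHERE ST,,
"""
print(list(rows_with_coordinates(sample_csv)))
```

Checking for a missing value rather than for 0 keeps the filter correct in general, even though a real 0 latitude or longitude cannot occur for Eugene.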

Previous Next Back to Home

What about adding some trend lines to show the number of tickets issued in each month?
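Both approaches below need, for each address, the number of tickets in each month. A stdlib sketch of that aggregation, assuming each record has an 'address' and a hypothetical 'Date' field in MM/DD/YYYY format (the real column name and date format in the raw data may differ). Months without tickets are filled with 0 so every address gets a series of the same length, July 2007 through June 2008; records outside that range are simply not reported.

```python
from collections import Counter
from datetime import datetime


def month_range(start, end):
    """List 'YYYY-MM' strings from start to end inclusive (both 'YYYY-MM')."""
    year, month = map(int, start.split('-'))
    end_year, end_month = map(int, end.split('-'))
    months = []
    while (year, month) <= (end_year, end_month):
        months.append(f'{year:04d}-{month:02d}')
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return months


def monthly_counts(records, start='2007-07', end='2008-06'):
    """Return {address: [ticket count per month]} over a fixed month range."""
    per_key = Counter()
    for rec in records:
        when = datetime.strptime(rec['Date'], '%m/%d/%Y')  # 'Date' is hypothetical
        per_key[(rec['address'], when.strftime('%Y-%m'))] += 1
    months = month_range(start, end)
    addresses = sorted({addr for addr, _ in per_key})
    # Counter returns 0 for (address, month) pairs that never occurred
    return {a: [per_key[(a, m)] for m in months] for a in addresses}


sample = [
    {'address': '11TH & KINCAID', 'Date': '07/03/2007'},
    {'address': '11TH & KINCAID', 'Date': '07/20/2007'},
    {'address': '11TH & KINCAID', 'Date': '09/01/2007'},
]
print(monthly_counts(sample))
```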

Approach 1: Loading premade figures (Python matplotlib)

//smartdown.import=d3

const mymap = L.map(this.div.id).setView([44.0489713, -123.0944854], 12);
L.tileLayer('https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
    attribution: 'Map data &copy; <a href="https://www.openstreetmap.org/copyright">OpenStreetMap</a> contributors',
}).addTo(mymap);

// Geocoding was done beforehand in Python; use the resulting latitude/longitude data for the Leaflet map.
const parkingCSV = 'https://raw.githubusercontent.com/LanGuo/parkingSmartdown/master/top_address_geocoded_2007_2008.csv';

d3.csv(parkingCSV, function(d) {
  return {
    address: d.address,
    latitude: +d.latitude,
    longitude: +d.longitude,
    count: +d.count,
    max_ticket: +d.max
  };
}).then(
function(data) {
  data.forEach(function(d) {
    // Empty cells become 0 via unary + (NaN for text like 'nan'); skip both.
    if (d.latitude && d.longitude) {
      const marker = L.marker([d.latitude, d.longitude]).addTo(mymap);
      // Each address has a premade matplotlib figure named '<address>.png'
      const imgName = encodeURI(d.address);
      const imgUrl = `https://raw.githubusercontent.com/LanGuo/parkingSmartdown/master/figures/${imgName}.png`;
      const popupContent =
`
<img src="${imgUrl}">
<b>${d.address}</b>
<br>
Number of tickets in 2007-2008: ${d.count}.
<br>
Maximum fine: ${d.max_ticket}
`;
      marker.bindPopup(popupContent);
    }
  });
});

return mymap;

Previous Next Back to Home

Approach 2: Loading the data dynamically and drawing the figure when the user clicks a marker (d3.js)


The final map!

Click a marker on the map to see the number of tickets over time:

//smartdown.import=d3
const eugeneLat = 44.0489713;
const eugeneLong = -123.0944854;

/* global L */
const mymap = L.map(this.div.id).setView([eugeneLat, eugeneLong], 12);

L.tileLayer('https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
    attribution: 'Map data &copy; <a href="https://www.openstreetmap.org/copyright">OpenStreetMap</a> contributors',
}).addTo(mymap);

// Geocoding and monthly aggregation were done beforehand in Python; use the resulting data for the Leaflet map.
const parkingCSV = 'https://raw.githubusercontent.com/LanGuo/parkingSmartdown/master/top_addresses_parking_latlong_n_monthly_stats.csv';

/* global d3 */
function loadCsvParkingData(row) {
  // Monthly count columns contain 'count' after their first character;
  // the plain 'count' column (total tickets) starts with it and is excluded.
  const countsByMonthColNames = Object.keys(row).filter(function(key) {
    return key.indexOf('count') > 0;
  });
  const countsByMonth = countsByMonthColNames.map(function(name) {
    return +row[name];
  });

  return {
    address: row.address,
    latitude: +row.latitude,
    longitude: +row.longitude,
    totalCount: +row.count,
    maxTicket: +row.max,
    countsByMonth: countsByMonth
  };
}

function scaleData(domain, range) {
  return d3.scaleLinear().domain(domain).range(range);
}

function renderDataOnMap(data) {
  data.forEach(function(d, i) {
    // Empty cells become 0 via unary + (NaN for text like 'nan'); skip both.
    if (d.latitude && d.longitude) {
      const marker = L.marker([d.latitude, d.longitude]).addTo(mymap);
      const dataToPlot = d.countsByMonth;

      // x: month index 0..n-1 -> pixels; y: count -> pixels (inverted, SVG y points down)
      const xScale = scaleData([0, dataToPlot.length - 1], [40, 260]);
      const yScale = scaleData(d3.extent(dataToPlot), [100, 10]);

      // Line generator that turns the monthly counts into an SVG path
      const trendline = d3.line()
                          .x((v, j) => xScale(j))
                          .y(v => yScale(v));

      // marker.on() needs a function to call later. Writing
      // marker.on('click', showParkingPopup(...)) would call it immediately and
      // register its return value instead, so the handler is defined inline.
      marker.on('click', function showParkingPopup() {
        const popupContent =
`
<svg id='mysvg${i}'></svg>
<br>
<b>${d.address}</b>
<br>
Number of tickets in 2007-2008: ${d.totalCount}.
<br>
Maximum fine: ${d.maxTicket}
`;

        marker.bindPopup(popupContent);
        marker.openPopup();

        // The popup's <svg> is now in the DOM, so d3 can draw into it.
        const margin = {top: 20, right: 40, bottom: 10, left: 40};
        const width = 300 - margin.left - margin.right;
        const height = 150 - margin.top - margin.bottom;
        const svg = d3.select(`#mysvg${i}`)
                      .attr("width", width + margin.left + margin.right)
                      .attr("height", height + margin.top + margin.bottom);

        svg.append("path")
          .attr("class", "line")
          .attr("d", trendline(dataToPlot))
          .attr("stroke-width", "2")
          .attr("stroke", "black")
          .attr("fill", "none");

        // One dot per month on top of the line
        svg.selectAll("circle")
            .data(dataToPlot)
            .enter().append("circle")
              .attr("r", 3.5)
              .attr("cx", (v, j) => xScale(j))
              .attr("cy", v => yScale(v));
      });
    }
  });
}

d3.csv(parkingCSV, loadCsvParkingData).then(function(data) {
  renderDataOnMap(data);
});

return mymap;
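`xScale` and `yScale` above are d3 linear scales, and `d3.line()` joins the scaled points with straight segments. To make that arithmetic explicit, here is a rough Python equivalent: `linear_scale` computes the same linear mapping for values inside the domain, and `line_path` builds an SVG path of `M`/`L` commands, with the x domain running from index 0 to index n-1. The pixel ranges [40, 260] and [100, 10] are taken from the code above; the exact number formatting d3 emits may differ, and the mid-range fallback for a zero-width domain is this sketch's own choice.

```python
def linear_scale(domain, rng):
    """Map domain -> range linearly (no clamping), like a d3 linear scale."""
    (d0, d1), (r0, r1) = domain, rng
    span = d1 - d0

    def scale(x):
        if span == 0:
            return (r0 + r1) / 2  # zero-width domain: put everything mid-range
        return r0 + (x - d0) / span * (r1 - r0)

    return scale


def line_path(values, x_range=(40, 260), y_range=(100, 10)):
    """SVG path data for a series plotted against its index."""
    x = linear_scale((0, len(values) - 1), x_range)
    y = linear_scale((min(values), max(values)), y_range)
    points = [f'{x(i):g},{y(v):g}' for i, v in enumerate(values)]
    return 'M' + 'L'.join(points)


print(line_path([2, 0, 1]))
# M40,10L150,100L260,55
```

The y range runs from 100 down to 10 because SVG's y axis points down, so the largest count ends up nearest the top of the chart.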

Previous Back to Home

What's the raw data like?

The original data were given to us as an Excel file with 3 separate sheets, each covering a different time period. Combining them gives the parking violation data from July 2007 to June 2008 as a single CSV file (csv data). Each row of this CSV table is one violation record. Let's load the file and look at its first few rows:

// Generate a Markdown table with JS to preview the raw data
const originalParkingCSV = 'https://raw.githubusercontent.com/LanGuo/parkingSmartdown/master/parking2007_2008_raw.csv';
const numRowsToShow = 10;

d3.csv(originalParkingCSV).then(
  function(data) {
    // d3.csv keeps the header names, in file order, in data.columns
    let headerRow = '|';
    let extraLine = '\n|';
    for (const key of data.columns) {
      headerRow += key + '|';
      extraLine += ':---|';
    }

    // data[0] is already the first record (the header is not part of the array)
    const dataToShow = data.slice(0, numRowsToShow);

    const tableRows = dataToShow.map(function(row) {
      let oneRow = '\n|';
      for (const key of data.columns) {
        oneRow += row[key] + '|';
      }
      return oneRow;
    });

    const mdTable =
`
${headerRow}${extraLine}${tableRows.join('')}
`;
    const sdContent =
`
#### Raw data in csv format:
${mdTable}
`;

    smartdown.setVariable('RawData', sdContent, 'markdown');
  });

Raw Data
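The preview table is plain Markdown: a header row, an alignment row of `:---` cells, then one row per record. A Python sketch of the same construction, assuming the CSV text is already in memory (the two-column sample and its amounts are made up); unlike the JS version, it also escapes `|` inside cell values, which would otherwise split a cell in two.

```python
import csv
import io


def csv_to_markdown(csv_text, num_rows=10):
    """Render the header and the first num_rows records as a Markdown table."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)

    def fmt(cells):
        # Escape pipes so they are not read as column separators
        return '|' + '|'.join(c.replace('|', r'\|') for c in cells) + '|'

    lines = [fmt(header), '|' + ':---|' * len(header)]
    for i, row in enumerate(reader):
        if i >= num_rows:
            break
        lines.append(fmt(row))
    return '\n'.join(lines)


sample = 'Location,Amount Due\nKINCAID ST 11TH AVE 12TH,16\nOAK ST,25\n'
print(csv_to_markdown(sample, num_rows=1))
# |Location|Amount Due|
# |:---|:---|
# |KINCAID ST 11TH AVE 12TH|16|
```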

To visualize this data, we will plot the locations of parking tickets on a map and, ideally, show some statistics for the locations where tickets were issued most frequently.

Next Back to Home
