Last active
March 4, 2024 14:59
notebook.ipynb
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "HDWmn7Lt_487" | |
}, | |
"source": [ | |
"![banner](https://learn.responsibly.ai/assets/banner.jpg)\n", | |
"\n", | |
"# Class 5 - Privacy: NYC Taxi Data Demo\n", | |
"\n", | |
"https://learn.responsibly.ai" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "V_Ab_xKn180X" | |
}, | |
"source": [ | |
"## 1. Setup" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "iDXfeRu5jcuh" | |
}, | |
"outputs": [], | |
"source": [ | |
"# https://databank.illinois.edu/datasets/IDB-9610843\n", | |
"!wget -q https://stash.responsibly.ai/5-privacy/nyc-tlc-tax-trip-data-2013-7.zip -O data.zip" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "I7PuUIDUkEw_" | |
}, | |
"outputs": [], | |
"source": [ | |
"%pip install -qqq git+https://github.com/ResponsiblyAI/railib.git" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "G1276QHCkZEh" | |
}, | |
"outputs": [], | |
"source": [ | |
"from railib.privacy import (read_taxi_data,\n", | |
" plot_hourly,\n", | |
" plot_heatmap,\n", | |
" build_duration_table_viz,\n", | |
" plot_closest_rides,\n", | |
" plot_grid_map)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "b3mKmmZWnC-B" | |
}, | |
"source": [ | |
"## 2. Data\n", | |
"\n", | |
"Reference: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "vG6tDy-olEsh" | |
}, | |
"outputs": [], | |
"source": [ | |
"rides_df = read_taxi_data(\"data.zip\", \"old_yellow\")\n", | |
"\n", | |
"rides_df.info()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "0_Oj-apA4u1S" | |
}, | |
"source": [ | |
"| # | Feature | Explanation |\n", | |
"|----|---------------|------------------------------------------|\n", | |
"| 1 | pickup_dt | Time record of passenger/s boarding taxi |\n", | |
"| 2 | dropoff_dt | Time record of passenger/s exiting taxi |\n", | |
"| 3 | n_passangers | Number of passengers in taxi |\n", | |
"| 4 | pickup_lng | Longitude of pickup point |\n", | |
"| 5 | pickup_lat | Latitude of pickup point |\n", | |
"| 6 | dropoff_lng | Longitude of dropoff point |\n", | |
"| 7 | dropoff_lat | Latitude of dropoff point |\n", | |
"| 8 | type_ | Type of taxi (yellow/green) |\n", | |
"| 9 | trip_distance | Length of trip |\n", | |
"| 10 | fare_amount | Fare |\n", | |
"| 11 | Extra | Extra charges |\n", | |
"| 12 | mta_tax | Taxes on trip |\n", | |
"| 13 | tip_amount | Tip |\n", | |
"| 14 | tolls_amount | Tolls paid during the trip |\n", | |
"| 15 | total_amount | Total cost of the trip |\n", | |
"| 16 | payment_type | Payment method |" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "L3_OAXrTJ6yg" | |
}, | |
"outputs": [], | |
"source": [ | |
"non_missing_location_mask = (rides_df[['pickup_lng', 'pickup_lat', 'dropoff_lng', 'dropoff_lat']] != 0).all(axis=1)\n", | |
"print('% of rides with all four coordinates present:', 100 * non_missing_location_mask.sum() / len(rides_df))\n", | |
"\n", | |
"rides_df = rides_df[non_missing_location_mask]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "rw40UDAJWb7i" | |
}, | |
"outputs": [], | |
"source": [ | |
"rides_df.sample(10)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "_0hXiLTH123G" | |
}, | |
"source": [ | |
"## 3. Part I: Utility\n", | |
"\n", | |
"Reference: https://toddwschneider.com/posts/analyzing-1-1-billion-nyc-taxi-and-uber-trips-with-a-vengeance/#taxi-maps" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "CZWEfD2_2DQ2" | |
}, | |
"source": [ | |
"### 3.1. Where Do All the Cabs Go in the Late Afternoon? ([NYTimes](https://www.nytimes.com/2011/01/12/nyregion/12taxi.html))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "127zGilY_Lk5" | |
}, | |
"outputs": [], | |
"source": [ | |
"plot_hourly(rides_df['pickup_dt']);" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "xRuFcmzOFyRj" | |
}, | |
"source": [ | |
"### 3.2. Where to place a [taxi stand](https://en.wikipedia.org/wiki/Taxicab_stand)?\n", | |
"\n", | |
"![](https://upload.wikimedia.org/wikipedia/commons/9/98/Taxi_Stand_620_12th_Av_48_St_jeh.jpg)\n", | |
"\n", | |
"Source: [Wikicommons](https://commons.wikimedia.org/wiki/File:Taxi_Stand_620_12th_Av_48_St_jeh.jpg)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "-cFAKo2Shflo", | |
"scrolled": true | |
}, | |
"outputs": [], | |
"source": [ | |
"pickup_sample_df = rides_df[['pickup_dt', 'pickup_lat', 'pickup_lng']].sample(10**5)\n", | |
"\n", | |
"plot_heatmap(pickup_sample_df[['pickup_lat', 'pickup_lng']])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "3KZG6D0F2k1P" | |
}, | |
"source": [ | |
"### 3.3. Taxi average travel time (minutes) from Midtown to JFK Airport" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "VyGkNJp0_49j" | |
}, | |
"outputs": [], | |
"source": [ | |
"duration_table, between_map = build_duration_table_viz(rides_df,\n", | |
" 'Midtown, New York City-Manhattan, NYC',\n", | |
" 'JFK Airport, Queens, NYC')\n", | |
"\n", | |
"between_map" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "C1j3880vjN-m" | |
}, | |
"outputs": [], | |
"source": [ | |
"duration_table" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "Ce8zE4lYbDW0" | |
}, | |
"source": [ | |
"## 4. Part II: Privacy Concerns" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "fOZaAXSd_49o" | |
}, | |
"source": [ | |
"### 4.1. Tool: Given an address, find rides that had a pickup or dropoff there" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "mAuzcs1sEWAU" | |
}, | |
"outputs": [], | |
"source": [ | |
"plot_closest_rides(rides_df.sample(10000), 'Port Authority, NYC', radius=50)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "XHlB9PTK_49v" | |
}, | |
"source": [ | |
"### 4.2. Where is Bradley Cooper?\n", | |
"\n", | |
"![](https://cdn.justjared.com/wp-content/uploads/headlines/2013/07/bradley-cooper-nyc-hotel-exit-after-wimbledon-finals.jpg)\n", | |
"\n", | |
"http://www.justjared.com/2013/07/09/bradley-cooper-nyc-hotel-exit-after-wimbledon-finals/\n", | |
"\n", | |
"https://web.archive.org/web/20200310042615/https://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset/\n", | |
"\n", | |
"https://gawker.com/the-public-nyc-taxicab-database-that-accidentally-track-1646724546\n", | |
"\n", | |
"https://chriswhong.com/open-data/foil_nyc_taxi/" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "79KYHfsB6K8X" | |
}, | |
"outputs": [], | |
"source": [ | |
"photo_time_rides_df = rides_df[(rides_df['pickup_dt'] >= '2013-07-08 19:33')\n", | |
" & (rides_df['pickup_dt'] < '2013-07-08 19:38')]\n", | |
"\n", | |
"len(photo_time_rides_df)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "HM8cTc_d96ka" | |
}, | |
"outputs": [], | |
"source": [ | |
"# Greenwich Hotel\n", | |
"\n", | |
"plot_closest_rides(photo_time_rides_df, '377 Greenwich St, Manhattan, NYC', 'pickup', radius=50)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "nkccgK2U_CbL" | |
}, | |
"source": [ | |
"Restaurant @ 13, Bank Street, West Village" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "nng4rD0Y_sHD" | |
}, | |
"outputs": [], | |
"source": [ | |
"# Brooklyn Abortion Clinic\n", | |
"\n", | |
"plot_closest_rides(rides_df, '4 Dekalb Ave, Brooklyn, NYC', radius=15)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "5jKUGYP6U7tJ" | |
}, | |
"source": [ | |
"## 5. Part III: \"Pseudo Anonymization\"" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Generalization" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "-WfYrIWae6Hu" | |
}, | |
"outputs": [], | |
"source": [ | |
"plot_grid_map(0.002)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "vKWQdEQwd02H" | |
}, | |
"outputs": [], | |
"source": [ | |
"plot_grid_map(0.003)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "tTGEekKiaTQC" | |
}, | |
"outputs": [], | |
"source": [ | |
"plot_grid_map(0.01)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "LMnlxl4jezCK" | |
}, | |
"outputs": [], | |
"source": [ | |
"plot_grid_map(0.05)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "Mojy8jBGWLMJ" | |
}, | |
"outputs": [], | |
"source": [ | |
"plot_grid_map(0.1)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "VKXobv4wqJLn" | |
}, | |
"source": [ | |
"![](https://www.nyc.gov/assets/tlc/images/content/pages/about/taxi_zone_map_manhattan.jpg)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Generalization + Noise\n", | |
"\n", | |
"https://taxi-heatmap.open-diffix.org" | |
] | |
} | |
], | |
"metadata": { | |
"colab": { | |
"provenance": [] | |
}, | |
"kernel_info": { | |
"name": "python3" | |
}, | |
"kernelspec": { | |
"display_name": "Python 3 (ipykernel)", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.11.3" | |
}, | |
"nteract": { | |
"version": "0.12.3" | |
}, | |
"vscode": { | |
"interpreter": { | |
"hash": "55bbdba5d2159c30191d9b81156a2ec7ece345201aa1fcd9b85bbc484276dddb" | |
} | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 1 | |
} |