@shlomihod
Last active March 4, 2024 14:59
notebook.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "HDWmn7Lt_487"
},
"source": [
"![banner](https://learn.responsibly.ai/assets/banner.jpg)\n",
"\n",
"# Class 5 - Privacy: NYC Taxi Data Demo\n",
"\n",
"https://learn.responsibly.ai"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "V_Ab_xKn180X"
},
"source": [
"## 1. Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "iDXfeRu5jcuh"
},
"outputs": [],
"source": [
"# https://databank.illinois.edu/datasets/IDB-9610843\n",
"!wget -q https://stash.responsibly.ai/5-privacy/nyc-tlc-tax-trip-data-2013-7.zip -O data.zip"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "I7PuUIDUkEw_"
},
"outputs": [],
"source": [
"%pip install -qqq git+https://github.com/ResponsiblyAI/railib.git"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "G1276QHCkZEh"
},
"outputs": [],
"source": [
"from railib.privacy import (read_taxi_data,\n",
" plot_hourly,\n",
" plot_heatmap,\n",
" build_duration_table_viz,\n",
" plot_closest_rides,\n",
" plot_grid_map)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "b3mKmmZWnC-B"
},
"source": [
"## 2. Data\n",
"\n",
"Reference: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "vG6tDy-olEsh"
},
"outputs": [],
"source": [
"rides_df = read_taxi_data(\"data.zip\", \"old_yellow\")\n",
"\n",
"rides_df.info()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0_Oj-apA4u1S"
},
"source": [
"| # | Feature | Explanation |\n",
"|----|---------------|------------------------------------------|\n",
"| 1 | pickup_dt | Time record of passenger/s boarding taxi |\n",
"| 2 | dropoff_dt | Time record of passenger/s exiting taxi |\n",
"| 3 | n_passangers | Number of passengers in taxi |\n",
"| 4 | pickup_lng | Longitude of pickup point |\n",
"| 5 | pickup_lat | Latitude of pickup point |\n",
"| 6 | dropoff_lng | Longitude of dropoff point |\n",
"| 7 | dropoff_lat | Latitude of dropoff point |\n",
"| 8 | type_ | Type of taxi (yellow/green) |\n",
"| 9 | trip_distance | Length of trip |\n",
"| 10 | fare_amount | Fare |\n",
"| 11 | Extra | Extra charges |\n",
"| 12 | mta_tax | Taxes on trip |\n",
"| 13 | tip_amount | Tip |\n",
"| 14 | tolls_amount | Total amount of tolls paid on the trip |\n",
"| 15 | total_amount | Total cost of the trip |\n",
"| 16 | payment_type | Payment method |"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "L3_OAXrTJ6yg"
},
"outputs": [],
"source": [
"non_missing_location_mask = (rides_df[['pickup_lng', 'pickup_lat', 'dropoff_lng', 'dropoff_lat']] != 0).all(axis=1)\n",
"print('% rides with all four coordinates non-zero:', 100 * non_missing_location_mask.sum() / len(rides_df))\n",
"\n",
"rides_df = rides_df[non_missing_location_mask]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "rw40UDAJWb7i"
},
"outputs": [],
"source": [
"rides_df.sample(10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_0hXiLTH123G"
},
"source": [
"## 3. Part I: Utility\n",
"\n",
"Reference: https://toddwschneider.com/posts/analyzing-1-1-billion-nyc-taxi-and-uber-trips-with-a-vengeance/#taxi-maps"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CZWEfD2_2DQ2"
},
"source": [
"### 3.1. Where Do All the Cabs Go in the Late Afternoon? ([NYTimes](https://www.nytimes.com/2011/01/12/nyregion/12taxi.html))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "127zGilY_Lk5"
},
"outputs": [],
"source": [
"plot_hourly(rides_df['pickup_dt']);"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xRuFcmzOFyRj"
},
"source": [
"### 3.2. Where to place a [taxi stand](https://en.wikipedia.org/wiki/Taxicab_stand)?\n",
"\n",
"![](https://upload.wikimedia.org/wikipedia/commons/9/98/Taxi_Stand_620_12th_Av_48_St_jeh.jpg)\n",
"\n",
"Source: [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Taxi_Stand_620_12th_Av_48_St_jeh.jpg)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "-cFAKo2Shflo",
"scrolled": true
},
"outputs": [],
"source": [
"pickup_sample_df = rides_df[['pickup_dt', 'pickup_lat', 'pickup_lng']].sample(10**5)\n",
"\n",
"plot_heatmap(pickup_sample_df[['pickup_lat', 'pickup_lng']])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3KZG6D0F2k1P"
},
"source": [
"### 3.3. Average taxi travel time (minutes) from Midtown to JFK Airport"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "VyGkNJp0_49j"
},
"outputs": [],
"source": [
"duration_table, between_map = build_duration_table_viz(rides_df,\n",
" 'Midtown, New York City-Manhattan, NYC',\n",
" 'JFK Airport, Queens, NYC')\n",
"\n",
"between_map"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "C1j3880vjN-m"
},
"outputs": [],
"source": [
"duration_table"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Ce8zE4lYbDW0"
},
"source": [
"## 4. Part II: Privacy Concerns"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fOZaAXSd_49o"
},
"source": [
"### 4.1. Tool: Given an address, find rides that had a pickup or dropoff there"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "mAuzcs1sEWAU"
},
"outputs": [],
"source": [
"plot_closest_rides(rides_df.sample(10000), 'Port Authority, NYC', radius=50)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XHlB9PTK_49v"
},
"source": [
"### 4.2. Where is Bradley Cooper?\n",
"\n",
"![](https://cdn.justjared.com/wp-content/uploads/headlines/2013/07/bradley-cooper-nyc-hotel-exit-after-wimbledon-finals.jpg)\n",
"\n",
"http://www.justjared.com/2013/07/09/bradley-cooper-nyc-hotel-exit-after-wimbledon-finals/\n",
"\n",
"https://web.archive.org/web/20200310042615/https://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset/\n",
"\n",
"https://gawker.com/the-public-nyc-taxicab-database-that-accidentally-track-1646724546\n",
"\n",
"https://chriswhong.com/open-data/foil_nyc_taxi/"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "79KYHfsB6K8X"
},
"outputs": [],
"source": [
"photo_time_rides_df = rides_df[(rides_df['pickup_dt'] >= '2013-07-08 19:33')\n",
" & (rides_df['pickup_dt'] < '2013-07-08 19:38')]\n",
"\n",
"len(photo_time_rides_df)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "HM8cTc_d96ka"
},
"outputs": [],
"source": [
"# Greenwich Hotel\n",
"\n",
"plot_closest_rides(photo_time_rides_df, '377 Greenwich St, Manhattan, NYC', 'pickup', radius=50)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nkccgK2U_CbL"
},
"source": [
"Restaurant at 13 Bank Street, West Village"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "nng4rD0Y_sHD"
},
"outputs": [],
"source": [
"# Brooklyn Abortion Clinic\n",
"\n",
"plot_closest_rides(rides_df, '4 Dekalb Ave, Brooklyn, NYC', radius=15)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5jKUGYP6U7tJ"
},
"source": [
"## 5. Part III: \"Pseudo Anonymization\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generalization"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "-WfYrIWae6Hu"
},
"outputs": [],
"source": [
"plot_grid_map(0.002)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "vKWQdEQwd02H"
},
"outputs": [],
"source": [
"plot_grid_map(0.003)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tTGEekKiaTQC"
},
"outputs": [],
"source": [
"plot_grid_map(0.01)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "LMnlxl4jezCK"
},
"outputs": [],
"source": [
"plot_grid_map(0.05)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Mojy8jBGWLMJ"
},
"outputs": [],
"source": [
"plot_grid_map(0.1)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VKXobv4wqJLn"
},
"source": [
"![](https://www.nyc.gov/assets/tlc/images/content/pages/about/taxi_zone_map_manhattan.jpg)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generalization + Noise\n",
"\n",
"https://taxi-heatmap.open-diffix.org"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernel_info": {
"name": "python3"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
},
"nteract": {
"version": "0.12.3"
},
"vscode": {
"interpreter": {
"hash": "55bbdba5d2159c30191d9b81156a2ec7ece345201aa1fcd9b85bbc484276dddb"
}
}
},
"nbformat": 4,
"nbformat_minor": 1
}