Skip to content

Instantly share code, notes, and snippets.

@mbforr
Last active January 8, 2023 10:32
  • Star 0 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
Star You must be signed in to star a gist
Save mbforr/708e4bba8555e89b8b5452070094e051 to your computer and use it in GitHub Desktop.
# first, we will create a CTE to filter our NYC Taxi Data
WITH
nyc AS (
SELECT
*,
# creating a geometry from our latitude and longitude columns
st_geogpoint(pickup_longitude,
pickup_latitude) AS geom
FROM
`bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2015`
# filter dates to June 1 to June 7, 2015
WHERE
pickup_datetime BETWEEN '2015-06-01T00:00:00'
AND '2015-06-08T00:00:00' ),
# create a second CTE that we will use later to join to the geometries
spatial_join AS (
SELECT
bgs.geo_id,
# count when tips are over 20%
COUNT(CASE
WHEN safe_divide(nyc.tip_amount, nyc.fare_amount) >= 0.2 THEN 1
ELSE
NULL
END
)
FROM
nyc
# join to block groups using ST_Contains
JOIN
`bigquery-public-data.geo_census_blockgroups.us_blockgroups_national` bgs
ON
ST_Contains(bgs.blockgroup_geom,
nyc.geom)
GROUP BY
bgs.geo_id
# use having to filter where the count is greater or equal to 50
HAVING
COUNT(nyc.vendor_id) > 49)
# join the aggregated data to the geometries (points)
SELECT
spatial_join.*,
bgs.blockgroup_geom AS geom
FROM
spatial_join
JOIN
`bigquery-public-data.geo_census_blockgroups.us_blockgroups_national` bgs
USING
(geo_id)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment