This set of instructions goes with this presentation deck.
yum install postgis24_10
https://wiki.ubuntu.com/UbuntuGIS
sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
sudo apt-get install postgis
Set up our working database for this talk:
CREATE DATABASE pgopen;
\c pgopen
CREATE EXTENSION postgis;
CREATE LANGUAGE plpythonu;
Create example table:
CREATE TABLE customers (
name TEXT,
address TEXT,
geom GEOMETRY(POINT, 4326)
);
INSERT INTO customers (name, address) VALUES
('Paul', '144 Simcoe Street, Victoria, BC, Canada'),
('Hyatt', '655 Burrard St, Vancouver, BC'),
('Parc 55', '55 Cyril Magnin St, San Francisco, CA, USA');
Create example geocoding function:
CREATE OR REPLACE FUNCTION geocode(address text)
RETURNS text AS
$$
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="plpythonu")
location = geolocator.geocode(address)
wkt = 'SRID=4326;POINT(%g %g)' % ((location.longitude, location.latitude))
return wkt
$$
LANGUAGE plpythonu VOLATILE
COST 1000;
Try it out:
SELECT ST_AsGeoJson(geocode('144 Simcoe Street, Victoria, BC'));
SELECT geocode(address), address
FROM customers
WHERE name = 'Paul';
UPDATE customers
SET geom = geocode(address);
Language abstractions over geocoding APIs:
The US Census provides a data set of all 5-digit zip code areas, this file is a point file with a point in the center of each area.
Use the CSV file URL (shortened here using bit.ly) and the PostgreSQL COPY command to load the table:
CREATE TABLE zcta (
zip TEXT,
longitude REAL,
latitude REAL,
geom GEOMETRY(Point, 4326)
);
-- Read data direct from URL using curl
COPY zcta (zip, latitude, longitude)
FROM PROGRAM 'curl -L https://bit.ly/2ghU6eZ'
WITH (
FORMAT csv,
HEADER true
);
-- Add geometry values to geometry column
UPDATE zcta
SET geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);
-- Index for zip code
CREATE INDEX zcta_zip_idx
ON zcta (zip);
-- Cluster to re-write the table after update
CLUSTER zcta USING zcta_zip_idx;
-- Spatial index for geometry
CREATE INDEX zcta_geom_idx
ON zcta USING GIST (geom);
The US Census provides a shape file of the 5-digit ZCTA:
Unzip the file and load using shp2pgsql
:
shp2pgsql -s 4326 \
-I -D \
cb_2017_us_zcta510_500k \
zcta_polys \
| psql postgresopen
Or load using ogr2ogr
from GDAL:
ogr2ogr \
-f "PostgreSQL" \
-nlt MULTIPOLYGON \
-nln zcta_polys \
PG:"dbname=postgis24" \
cb_2017_us_zcta510_500k.shp
There is also interesting demographic data at different geographic levels. We are loading at a ZCTA level here, but for a "real" analysis with detailed data a finer leval, like a census tract, would be appropriate.
https://factfinder.census.gov/
- Click "Advanced Search"
- Click "SHOW ME ALL"
- Click "Geographies" tab on left
- Select "5-Digit Zip Code Tabulation Area"
- Click "Add To Your Selections"
- Click "CLOSE X" in dialogue
- Search for "B19001" for household income table
- Select table and click "Download" button
- Save ZIP file you receive
- Unzip contents
- Run the acs2pgsql.py script to convert files to SQL for loading into PostgreSQL
Or, just download and unzip the pre-fetched data for the one file we are using:
python acs2pgsql.py ACS_16_5YR_B19001_with_ann.csv | psql pgopen
Then create a spatial income table by joining the ACS data to the zipcode polygons:
DROP TABLE IF EXISTS zcta_polys_100k;
CREATE TABLE zcta_polys_100k AS
SELECT
a.geo_id,
z.geom,
100.0 * (a.hd01_vd14 + a.hd01_vd15 + hd01_vd16 + hd01_vd17) / NULLIF(a.hd01_vd01, 0.0) AS over_100k_pct,
a.hd01_vd14 + a.hd01_vd15 + hd01_vd16 + hd01_vd17 AS households_over_100k,
NULLIF(a.hd01_vd01, 0.0) AS all_households
FROM acs_16_5yr_b19001_with_ann a
JOIN zcta_polys z ON (z.affgeoid10 = a.geo_id);
CREATE INDEX zcta_polys_100k_geom_x ON zcta_polys_100k
USING GIST (geom);
- https://qgis.org
- https://github.com/NYCPlanning/labs-postgis-preview
- https://github.com/lukasmartinelli/postgis-editor
-
PostGIS formats
-
Client side web maps
-
Server side renderers
- https://opendata.socrata.com/Business/All-Starbucks-Locations-in-the-US/txu4-fsic
- https://opendata.socrata.com/api/views/txu4-fsic/rows.tsv?accessType=DOWNLOAD&bom=true
Create a table suitable to hold the data
DROP TABLE IF EXISTS starbucks CASCADE;
CREATE TABLE starbucks (
Brand TEXT,
Store_Number integer,
Name TEXT,
Ownership_Type TEXT,
Facility_ID TEXT,
Features_Products TEXT,
Features_Service TEXT,
Features_Stations TEXT,
Food_Region TEXT,
Venue_Type TEXT,
Phone_Number TEXT,
Location TEXT,
Street_Address TEXT,
Street_Line_1 TEXT,
Street_Line_2 TEXT,
City TEXT,
State TEXT,
Zip TEXT,
Country TEXT,
Coordinates TEXT,
Latitude REAL,
Longitude REAL,
Insert_Date TIMESTAMP
);
Pipe the data directly from opendata.socrata.com to the COPY command:
-- Input data use mm/dd/yyyy format for dates
SET DATESTYLE = US;
-- Input data are tab delimited
COPY starbucks
FROM PROGRAM 'curl -L https://bit.ly/2OQA1K9'
WITH (
FORMAT csv,
DELIMITER E'\t',
HEADER true
);
-- Add empty geometry column
ALTER TABLE starbucks
ADD COLUMN geom geometry(point, 4326);
UPDATE starbucks
SET geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);
CREATE INDEX starbucks_geom_idx
ON starbucks USING GIST (geom);
- Coordinates of Pike Place Market =
POINT(-122.3421 47.6101)
Find the five nearest Starbucks locations to Pike Place Public Market:
WITH pt AS (
SELECT ST_SetSRID(ST_MakePoint(-122.3421, 47.6101), 4326) AS pt
)
SELECT name, location, ST_Distance(pt, geom)
FROM starbucks, pt
ORDER BY geom <-> pt
LIMIT 5
Comparing geometry and geography distance calculations:
WITH pts AS (
SELECT 'POINT(-122.3421 47.6101)' AS p,
'POINT(-122.3386 47.6093)' AS q
)
SELECT ST_Distance(p, q) AS d_geom,
ST_Distance(p::geography, q::geography) AS d_geog
FROM pts;
Convert the existing tables to US National Atlas Albers projection (EPSG:2163):
ALTER TABLE starbucks
ALTER COLUMN geom TYPE Geometry(Point, 2163)
USING ST_Transform(geom, 2163);
ALTER TABLE zcta
ALTER COLUMN geom TYPE Geometry(Point, 2163)
USING ST_Transform(geom, 2163);
ALTER TABLE zcta_polys
ALTER COLUMN geom TYPE Geometry(MultiPolygon, 2163)
USING ST_Transform(geom, 2163);
ALTER TABLE zcta_polys_100k
ALTER COLUMN geom TYPE Geometry(MultiPolygon, 2163)
USING ST_Transform(geom, 2163);
REINDEX TABLE starbucks;
REINDEX TABLE zcta;
REINDEX TABLE zcta_polys;
REINDEX TABLE zcta_polys_100k;
Run nearest Starbucks query using the new SRID:
WITH pt AS (
SELECT ST_Transform(ST_SetSRID(ST_MakePoint(-122.3421, 47.6101), 4326), 2163) AS pt
)
SELECT name, location, ST_Distance(pt, geom)
FROM starbucks, pt
ORDER BY geom <-> pt
LIMIT 5;
Find all the Starbucks within 1km of Pike Place Market:
WITH pt AS (
SELECT ST_Transform(ST_SetSRID(
ST_MakePoint(-122.3421, 47.6101),
4326), 2163) AS pt
)
SELECT name, location,
ST_Distance(pt, geom)
FROM starbucks, pt
WHERE ST_DWithin(geom, pt, 1000);
Find all the Starbucks within 1km of Pike Place Market using a self-join instead of a coordinate literal value:
SELECT a.name, a.location,
ST_Distance(a.geom, b.geom)
FROM starbucks a
JOIN starbucks b
ON ST_DWithin(a.geom, b.geom, 1000)
WHERE b.name = 'Pike Place';
What is the average percentage of households over $100K income in the whole USA?
SELECT 100.0 * sum(households_over_100k) / sum(all_households)
AS pct_over_100k
FROM zcta_polys_100k;
What is the average percentage of households over $100K income in zip codes that house a Starbucks?
WITH starbucks_zips AS (
SELECT DISTINCT ON (geo_id) z.*
FROM zcta_polys_100k z
JOIN starbucks s
ON ST_Intersects(s.geom, z.geom)
)
SELECT 100.0 * sum(households_over_100k) / sum(all_households)
FROM starbucks_zips
What is the average percentage of houses over $100K income in zip codes that house more than one Starbucks?
WITH starbucks_zips AS (
SELECT Count(*),
Min(households_over_100k) AS households_over_100k,
Min(all_households) AS all_households
FROM zcta_polys_100k z
JOIN starbucks s
ON ST_Intersects(s.geom, z.geom)
GROUP BY geo_id
HAVING Count(*) > 1
)
SELECT 100.0 * sum(households_over_100k) / sum(all_households)
FROM starbucks_zips
What Starbucks outlets have "suspect" zip code entries?
SELECT s.Facility_ID, s.name,
s.zip, z.zcta5ce10,
s.geom
FROM starbucks s
JOIN zcta_polys z
ON ST_Intersects(z.geom, s.geom)
WHERE z.zcta5ce10 != split_part(s.zip, '-', 1);
Create the 1-digit zip code areas!
SELECT
ST_Union(geom) AS geom,
left(zcta5ce10, 1) AS zip1
FROM zcta_polys
GROUP BY left(zcta5ce10, 1)
Add the 2-digit codes as an overlay?
SELECT
ST_Union(geom) AS geom,
left(zcta5ce10, 2) AS zip1
FROM zcta_polys
GROUP BY left(zcta5ce10, 2)