@chriswhong
chriswhong / docker-cartodb.txt
Created June 21, 2016 14:55
Steps to get docker-cartodb working with a real domain
# Notes for getting docker-cartodb to run on a DigitalOcean droplet
As of 21 June 2016, the Dockerfile at sverhoeven/docker-cartodb is not up to date, and the build will fail. It seems to fail at step 39, when it tries to create a user; I was not able to update the Dockerfile to get it working. Hopefully someone else can get it going with the latest CartoDB code.
`https://hub.docker.com/r/sverhoeven/cartodb/`
However, running `docker run -d -p 3000:3000 -p 8080:8080 -p 8181:8181 sverhoeven/cartodb` will pull a complete Docker image that is a few months old.
Running this image will get you a container that expects to run at the domain `cartodb.localhost`; per the installation instructions, you update your hosts file to point `cartodb.localhost` to the IP of your Docker host.
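For reference, that stock setup amounts to a hosts-file entry like the one below on whatever machine runs your browser (the IP is an example, not a real droplet):

```
# /etc/hosts: point cartodb.localhost at the Docker host's IP (example)
203.0.113.10    cartodb.localhost
```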
I wanted to run this with a real domain, so here are some notes on the steps involved.
- run the image using `docker run -d -p 3000:3000 -p 8080:8080 -p 8181:8181 sverhoeven/cartodb`
@chriswhong
chriswhong / readme.md
Last active September 8, 2019 12:03
Run open trip planner docker container for NYC

This set of commands is for setting up an OpenTripPlanner (OTP) instance for New York City. OTP requires GTFS data and OSM streets data to build a graph, which it uses for trip planning.

Lucky for us, someone on Docker Hub has left a nice CLI command to build the graph and run the container, but we need to get the data first.

The data are downloaded on the host machine. For me, this is a DigitalOcean droplet running Ubuntu 14.

First, get GTFS for the New York City Subway from the MTA's downloads page: `wget http://web.mta.info/developers/data/nyct/subway/google_transit.zip`

Next, get the OSM city extract for NYC (thanks, Mapzen!): `wget https://s3.amazonaws.com/metro-extracts.mapzen.com/new-york_new-york.osm.pbf`

Finally, run the following Docker command: `docker run -it -v $(pwd):/var/otp/graphs opentripplanner/opentripplanner --build /var/otp/graphs --analyst`
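Once the graph is built, the same image can serve it (depending on the OTP version you may also need a flag like `--server` or `--inMemory`, and the container's port 8080 published with `-p 8080:8080`). As a hedged sanity check, OTP 1.x answers trip-plan requests on its REST API; the coordinates below are arbitrary example points:

```bash
# Example trip-plan request against OTP's default router (OTP 1.x API;
# fromPlace/toPlace are example lat,lon pairs in NYC)
curl "http://localhost:8080/otp/routers/default/plan?fromPlace=40.7484,-73.9857&toPlace=40.6892,-74.0445&mode=TRANSIT,WALK"
```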

@chriswhong
chriswhong / idea.md
Created July 1, 2016 20:08
Idea for git-powered distributed dataset management

The Problem:

If you follow the open data scene, you'll often hear about how the "feedback loop" for making corrections, comments, or asking questions about datasets is fuzzy, disjointed, or nonexistent. If I know for a fact that something in a government dataset is wrong, how do I get that record fixed? Do I call 311? Will the operator even know what I am talking about if I say I want to make a correction to a single record in a public dataset?

There's Dat. There's storing your data as a CSV in GitHub. These approaches work, but are very much developer-centric (pull requests and diffs are hard to wrap your head around if you spend your day analyzing data in Excel or desktop GIS). The fact of the matter is that most of the people managing datasets in government organizations are not DBAs, data scientists, or programmers.

Idea:

It's basically git for data plus a simple UI for exploration, management, and editing. Users would have to use GitHub SSO to edit in the UI, and behind the scenes

@chriswhong
chriswhong / gist:ee6a44f8c0b7c3706d4161a408c8ac4f
Last active March 9, 2021 19:46
Using mapshaper.js to convert geojson in memory into shapefile download

include zip.js and mapshaper.js

<script src="js/zip.js"></script>
<script src="js/mapshaper.js"></script>

Serve deflate.js, inflate.js, and z-worker.js somewhere, and reference them in your code with `zip.workerScriptsPath = '{path}'`; make sure the path ends with a slash!


  //create a mapshaper dataset from geojson FeatureCollection
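The snippet above is cut off; as an alternative, here is a minimal sketch using mapshaper's documented `applyCommands` API (available in recent mapshaper builds; the original code used mapshaper's internal functions, so treat this as an approximation, not the gist's code). `featureCollection` stands for your in-memory GeoJSON object.

```js
// Sketch: convert an in-memory GeoJSON FeatureCollection to shapefile parts.
// Assumes a mapshaper build that exposes applyCommands().
var commands = '-i input.json -o format=shapefile output.shp';
mapshaper.applyCommands(commands, { 'input.json': featureCollection }, function (err, output) {
  if (err) throw err;
  // `output` maps filenames (output.shp, output.shx, output.dbf, output.prj)
  // to their contents; zip them with zip.js and trigger the download here.
});
```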

# Using the NYC Geosupport Linux shared library on Ubuntu 16.04

I have been trying to understand more about Geosupport, specifically Geosupport Desktop Edition for Linux, which contains a Linux .so shared library. I would like to eventually write Node.js bindings for it so that I can write geocoding scripts that don't require a ton of network traffic.

I am a C noob, and this was my first time messing with C and gcc on Linux. I was able to write and compile a simple C program that calls the Geosupport shared library with hard-coded arguments.
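For reference, the skeleton of such a program might look like this. This is a hedged reconstruction, not the gist's code: it assumes `libgeo.so` exports the two-"work area" entry point `geo(wa1, wa2)`, and the buffer sizes and field offsets are placeholders to be checked against the Geosupport documentation.

```c
/* Hedged sketch of calling Geosupport from C. Assumes libgeo.so
 * exports geo(wa1, wa2); buffer sizes and offsets are placeholders. */
#include <stdio.h>
#include <string.h>

extern void geo(char *wa1, char *wa2); /* provided by libgeo.so */

int main(void) {
    static char wa1[1200]; /* Work Area 1: request and return fields */
    static char wa2[4300]; /* Work Area 2: size depends on the function code */

    memset(wa1, ' ', sizeof wa1); /* Geosupport expects space-padded buffers */
    memset(wa2, ' ', sizeof wa2);

    /* Hard-code the request fields at their fixed WA1 offsets: the
     * two-character function code at the start, then house number,
     * street name, and borough at the offsets the docs specify. */
    memcpy(wa1, "1B", 2);

    geo(wa1, wa2);

    printf("WA1 after call: %.100s\n", wa1);
    return 0;
}
```

Compiling would be something like `gcc geotest.c -o geotest -L/path/to/geosupport/lib -lgeo`, with the library path depending on where the .so was installed.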

## What is Geosupport?

"Geosupport is a data processing system originally designed to run on IBM mainframes to support geographic processing needs common to New York City agencies." Basically, it's an NYC-specific geocoder released by the NYC department of city planning. It does many things, but at its simplest it can take human-readable address fields and return a point coordinate.

@chriswhong
chriswhong / readme.md
Created October 8, 2016 04:15
CartoDB in Docker Container https issues

We were hosting a site on GitHub Pages that needed to use cartodb.js and maps from our self-hosted CartoDB server. Because GitHub Pages serves over HTTPS, there were mixed-content errors when making API calls to CartoDB over HTTP.

Adding SSL certificates to the nginx service on the Docker host machine worked fine, but because the HTTPS traffic hitting nginx is just proxied to the CartoDB services, which still run on HTTP internally, the CartoDB server embeds HTTP URLs in some of its resources. This leads to mixed-content errors when using the CartoDB app.
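That proxy arrangement looks roughly like the sketch below (the server name, certificate paths, and upstream port are placeholders, not our actual config; adjust the `proxy_pass` target per CartoDB service):

```nginx
server {
    listen 443 ssl;
    server_name carto.example.com;                      # placeholder domain
    ssl_certificate     /etc/nginx/ssl/fullchain.pem;   # placeholder cert paths
    ssl_certificate_key /etc/nginx/ssl/privkey.pem;

    location / {
        # TLS terminates here; CartoDB behind nginx still speaks plain HTTP
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```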

Two things had to be done to fix it:

  1. Edit `cartodb/app/models/user/user_decorator.rb` line 100, replacing `base_url: public_url,` with `base_url: public_url.sub('http','https'),`

This will update the `user_data.base_url` global, which cdb.js uses to build many of the API calls.

<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>carto-mapboxgl-demo</title>
<link href='https://api.mapbox.com/mapbox-gl-js/v0.26.0/mapbox-gl.css' rel='stylesheet' />
<style>
body {
@chriswhong
chriswhong / data.json
Created December 7, 2016 19:02
Finding Data in Tableau Public Network Traffic
[
  {
    "sheetName": "FAR Part & Type of Contract",
    "layoutId": "3447568417489532966",
    "allowSubscriptions": true,
    "allowSubscribeOnDataPresent": true,
    "worldUpdate": {
      "hyc": {
        "ABX": "render-mode-client",
        "res": {
@chriswhong
chriswhong / pluto-carto.md
Last active June 15, 2018 23:22
Loading PLUTO into Carto

To load MapPLUTO into Carto, the best approach is to upload the five borough shapefiles, then UNION ALL them together.

  • Upload all five zipped borough shapefiles from Bytes of the Big Apple. Be sure to uncheck 'Allow Carto to guess column types' when uploading, or you'll get column type mismatches.
  • UNION ALL the tables together with the following query. We can't just SELECT * because we'd have duplicate cartodb_ids in the result set, and saving as a new table would fail.
SELECT the_geom,the_geom_webmercator,borough,block,lot,cd,ct2010,cb2010,schooldist,council,zipcode,firecomp,policeprct,healthcent,healtharea,sanitboro,sanitdistr,sanitsub,address,zonedist1,zonedist2,zonedist3,zonedist4,overlay1,overlay2,spdist1,spdist2,spdist3,ltdheight,splitzone,bldgclass,landuse,easements,ownertype,ownername,lotarea,bldgarea,comarea,resarea,officearea,retailarea,garagearea,strgearea,factryarea,otherarea,areasource,numbldgs,numfloors,unitsres,unitstotal,lotfront,lotdepth,bldgfront,bldgdepth,ext,pro
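The column list above is cut off; the overall shape of the query is sketched below. The borough table names are assumptions (whatever Carto named your five uploads), and each SELECT must repeat the identical column list, leaving out `cartodb_id`:

```sql
-- Sketch only: table names are hypothetical, and /* , ... */ stands in
-- for the remainder of the PLUTO column list shown above.
SELECT the_geom, the_geom_webmercator, borough, block, lot /* , ... */ FROM mn_mappluto
UNION ALL
SELECT the_geom, the_geom_webmercator, borough, block, lot /* , ... */ FROM bx_mappluto
UNION ALL
SELECT the_geom, the_geom_webmercator, borough, block, lot /* , ... */ FROM bk_mappluto
UNION ALL
SELECT the_geom, the_geom_webmercator, borough, block, lot /* , ... */ FROM qn_mappluto
UNION ALL
SELECT the_geom, the_geom_webmercator, borough, block, lot /* , ... */ FROM si_mappluto
```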
@chriswhong
chriswhong / chunk.sh
Last active March 10, 2017 18:06
Chunk a csv into many files
#!/bin/bash
FILENAME=cpdb_spending.csv
HDR=$(head -1 "$FILENAME") # Pick up CSV header line to apply to each file
split -l 200000 "$FILENAME" xyz # Split the file into chunks of 200,000 lines each
n=1
for f in xyz* # Go through all newly created chunks
do
if [ "$n" -gt 1 ]
then
echo "$HDR" > Part${n}.csv # Write out header to new file called "Part(n)"