James Gordon gordonje
@gordonje
gordonje / load.sql
Created October 1, 2024 20:10
Loading and Transforming Voter Registration Data from BLOCKS (in DuckDB)
CREATE OR REPLACE TABLE
    batch
AS
FROM
    read_csv_auto(
        'extracts/blocks/voter_registration_scan_batches.csv',
        normalize_names=TRUE
    )
;
gordonje / README.md
Created August 6, 2024 20:06
U.S. Census Bureau Batch Geocoder

U.S. Census Bureau Batch Geocoder

Demonstrates how to make a batch geocoding service request to the U.S. Census Bureau (more info in their API docs).

Before you try this...

You need a stable connection to the internet.

The curl command-line tool must be installed and available on your PATH (more info here).
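
As a rough illustration of the request described above, the sketch below builds the curl invocation from Python and runs it with `subprocess`. The endpoint URL, the `addressFile` and `benchmark` form fields, and the `Public_AR_Current` benchmark name are assumptions based on the Census Bureau's published batch examples, so check them against the API docs. The helper names (`write_address_file`, `build_curl_command`, `geocode`) are hypothetical and not taken from this gist.

```python
import csv
import subprocess

# Assumed values; confirm them in the Census Bureau geocoder API docs.
GEOCODER_URL = "https://geocoding.geo.census.gov/geocoder/locations/addressbatch"
BENCHMARK = "Public_AR_Current"


def write_address_file(rows, path):
    """Write (unique id, street, city, state, zip) rows as a headerless CSV."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def build_curl_command(address_path, output_path):
    """Return the curl argument list for one batch geocoding request."""
    return [
        "curl",
        "--form", f"addressFile=@{address_path}",
        "--form", f"benchmark={BENCHMARK}",
        GEOCODER_URL,
        "--output", output_path,
    ]


def geocode(address_path, output_path):
    """Run curl (it must be on PATH) and raise if it exits with an error."""
    subprocess.run(build_curl_command(address_path, output_path), check=True)
```

Calling `geocode("addresses.csv", "geocoded.csv")` would upload the file and save the service's CSV response. Passing the arguments as a list avoids shell quoting problems in file paths.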

gordonje / README.md
Created December 18, 2023 16:27
Get VAN Survey Question Response in JSONL format

VAN Survey Question Responses

This script will:

  • Send an authenticated request to the /surveyQuestions endpoint of VAN's API
  • Unnest responses from each item
  • Output to a local file in newline delimited JSON (aka JSONL, which is...not?...the same as NDJSON...)
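
The unnesting and output steps listed above might look roughly like the sketch below. It leaves out the authenticated request and starts from the already-parsed list of survey questions. The field names (`surveyQuestionId`, `name`, `responses`) and both function names are assumptions for illustration, not confirmed by this gist.

```python
import json


def unnest_responses(survey_questions):
    """Yield one flat record per response, tagged with its parent question."""
    for question in survey_questions:
        # Treat a missing or null "responses" value as an empty list.
        for response in question.get("responses") or []:
            yield {
                "surveyQuestionId": question.get("surveyQuestionId"),
                "surveyQuestionName": question.get("name"),
                **response,
            }


def to_jsonl(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)
```

Writing the result of `to_jsonl(unnest_responses(items))` to a file gives one JSON object per line, which most bulk loaders can read without parsing the whole file at once.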

Dependencies

gordonje / example.md
Created February 21, 2022 22:46
iAWriter Smart Table Calculation Example
| Editor Input | Preview Output |
|--------------|----------------|
| 1            | 1              |
| =(2 + 2)     | =(2 + 2)       |
| =(51 / 3)    | =(51 / 3)      |
| =(B1 + B3)   | =(B1 + B3)     |
| =(TOTAL)     | =(TOTAL)       |
import requests
from bs4 import BeautifulSoup
import json

url_stub = "https://results.mo.gov"
workbook_url = f"{url_stub}/t/COVID19/views/VaccinationsDashboard/Vaccinations"
workbook_params = {
    ':embed': 'y',
gordonje / scrape_in_parallel.py
Last active February 19, 2020 20:56
A scraping script that runs in multiple, parallel processes
import requests
from time import sleep
from multiprocessing import Pool

session = None


def set_global_session():
    global session
    if not session:
        session = requests.Session()
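
The preview above stops before the worker pool is created. One way the per-process session pattern is commonly wired up is sketched below. To stay dependency-free, it swaps `requests.Session` for the standard library's `urllib` opener. The `fetch` and `fetch_all` functions, the `delay` and `processes` defaults, and the choice to have `set_global_session` return the session are all assumptions for illustration, not the rest of this gist.

```python
from multiprocessing import Pool
from time import sleep
from urllib.request import build_opener

session = None  # each worker process holds its own copy of this global


def set_global_session():
    """Create the per-process session once; later calls reuse it."""
    global session
    if session is None:
        session = build_opener()  # stand-in for requests.Session()
    return session


def fetch(url, delay=3):
    """Pause to throttle, then download one URL with this process's session."""
    opener = set_global_session()  # no-op if the Pool initializer already ran
    sleep(delay)
    with opener.open(url, timeout=30) as response:
        return url, response.status


def fetch_all(urls, processes=4):
    """Fan URLs out across worker processes, each with its own session."""
    with Pool(processes=processes, initializer=set_global_session) as pool:
        return pool.map(fetch, urls)
```

Call `fetch_all(list_of_urls)` from under an `if __name__ == "__main__":` guard so that worker processes do not re-run it on import. Creating the session in the initializer means each process builds it once, not once per request, and no session object is shared across processes.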
gordonje / scrape.py
Last active February 19, 2020 20:48
A scraping script that runs as a single, synchronous process.
import requests
from time import sleep

session = requests.Session()


def cache_page(identifier):
    sleep(3)
    url = f'https://mycourts.in.gov/PORP/Search/Detail?ID={identifier}'
    r = session.get(url)
gordonje / warrenmayer-youtube-dl-best-merge.sh
Last active October 23, 2019 16:47
Command, options and arguments for downloading "best" video and audio file formats from warrenmayer channel on YouTube (and merging if necessary)
youtube-dl --write-info-json --all-subs --write-all-thumbnails \
    -o '~/Desktop/warrenmayer-best-merge/%(title)s-%(id)s/%(title)s-%(id)s.%(ext)s' \
    https://www.youtube.com/user/warrenmayer/
gordonje / warrenmayer-youtube-dl-best-mp4.sh
Last active October 23, 2019 16:46
Command, options and arguments for downloading "best" compatible file formats from warrenmayer channel on YouTube (but only in mp4 container)
youtube-dl --write-info-json --all-subs --write-all-thumbnails \
    -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]' \
    -o '~/Desktop/warrenmayer-best-mp4/%(title)s-%(id)s/%(title)s-%(id)s.%(ext)s' \
    https://www.youtube.com/user/warrenmayer/
gordonje / warrenmayer-youtube-dl-best-single.sh
Last active October 23, 2019 18:14
Command, options and arguments for downloading "best" quality media, served as a single file, from warrenmayer channel on YouTube
youtube-dl --write-info-json --all-subs --write-all-thumbnails -f best \
    -o '~/Desktop/warrenmayer-best-single/%(title)s-%(id)s/%(title)s-%(id)s.%(ext)s' \
    https://www.youtube.com/user/warrenmayer/