@simonw
simonw / type_tracker.py
Created Jun 15, 2021
Experiment to guess the types of CSV data
View type_tracker.py
import csv, re

def is_float(s):
    try:
        float(s)
        return True
    except ValueError:
        return False
@simonw
simonw / README.md
Created Jun 14, 2021
alnum encoding scheme
View README.md

alnum encoding scheme

The goal is to be able to take any Python string and reversibly convert it into a string that consists only of a-zA-Z0-9_ characters.

>>> alnum_encode("hello.csv")
'hello_2e_csv'
>>> alnum_encode("this é has ü accents")
'this_20__e9__20_has_20__fc__20_accents'
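
One way such a scheme could work is sketched below. It is not necessarily the README's implementation: it assumes every character outside a-zA-Z0-9 (including the underscore itself, so that decoding stays unambiguous) is replaced by its lowercase hexadecimal code point wrapped in underscores, which reproduces the two examples above. The `alnum_decode` helper is a hypothetical name added here to illustrate reversibility.

```python
import re
import string

# Characters that pass through unchanged (ASCII letters and digits only).
# str.isalnum() is deliberately avoided: it is True for "é" and "ü".
_SAFE = set(string.ascii_letters + string.digits)


def alnum_encode(s: str) -> str:
    """Replace each unsafe character with _<hex code point>_."""
    return "".join(
        ch if ch in _SAFE else "_{:x}_".format(ord(ch)) for ch in s
    )


def alnum_decode(s: str) -> str:
    """Reverse alnum_encode (hypothetical helper, assumed for this sketch)."""
    # Underscores only ever appear as escape delimiters, so a left-to-right
    # scan for _hex_ groups recovers the original characters.
    return re.sub(
        r"_([0-9a-f]+)_", lambda m: chr(int(m.group(1), 16)), s
    )
```

Because a literal underscore is itself encoded (as `_5f_` under this assumption), an encoded string never contains an underscore that is not part of an escape, which is what makes the round trip safe.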
@simonw
simonw / auto-fill-slug-from-title.js
Created May 8, 2021
Pre-fill the slug form field based on the title, but stop doing that if the slug is manually edited
View auto-fill-slug-from-title.js
function slugify(s) {
  return s
    .toLowerCase()
    .replace(/[^-\w\s]/g, "") // remove non-alphanumerics
    .trim()
    .replace(/[-\s]+/g, "-") // spaces to hyphens
    .replace(/-+$/g, ""); // trim trailing hyphens
}
function slugAutoFill() {
@simonw
simonw / jq-helicopter.md
Created Mar 12, 2021
jq to reshape a helicopter trace
View jq-helicopter.md
View gist:2a4ceeb6a70c591bfabafe93ea1df249
[
  {
    "name": "Alabama",
    "abbreviation": "AL",
    "fips": "01"
  },
  {
    "name": "Alaska",
    "abbreviation": "AK",
    "fips": "02"
@simonw
simonw / all-geocodes-v2018.csv
Created Mar 8, 2021
Converted from the "2018 State, County, Minor Civil Division, and Incorporated Place FIPS Codes" all-geocodes-v2018.xlsx file listed on https://www.census.gov/geographies/reference-files/2018/demo/popest/2018-fips.html
View all-geocodes-v2018.csv
Summary Level,State Code (FIPS),County Code (FIPS),County Subdivision Code (FIPS),Place Code (FIPS),Consolidtated City Code (FIPS),Area Name (including legal/statistical area description)
010,00,000,00000,00000,00000,United States
040,01,000,00000,00000,00000,Alabama
050,01,001,00000,00000,00000,Autauga County
050,01,003,00000,00000,00000,Baldwin County
050,01,005,00000,00000,00000,Barbour County
050,01,007,00000,00000,00000,Bibb County
050,01,009,00000,00000,00000,Blount County
050,01,011,00000,00000,00000,Bullock County
@simonw
simonw / thoughts-on-scrapers.md
Last active Feb 26, 2021
Thoughts on scrapers
View thoughts-on-scrapers.md

Thoughts on scrapers

I really like the "Git scraping" pattern - where scrapers run inside GitHub Actions and update pretty-printed JSON files on disk, which are then committed back to the repo. More on that here: https://simonwillison.net/2020/Oct/9/git-scraping/

These are really easy to write - you just need code that produces the JSON. Sometimes that's as simple as curl -o something.json https://some-api.com/some-path.json
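
When the data needs a little processing before it is saved, the "code that produces the JSON" step might look like the following minimal sketch. The function name `write_json_if_changed` is hypothetical, and sorting keys is an assumption made here so that repeated runs produce byte-identical output when the data has not changed, which keeps Git diffs small.

```python
import json
from pathlib import Path


def write_json_if_changed(data, path):
    """Write data as pretty-printed JSON; return True if the file changed."""
    path = Path(path)
    # indent=2 gives readable line-by-line diffs; sort_keys (an assumption
    # of this sketch) stops key reordering from showing up as a change.
    new_text = json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
    if path.exists() and path.read_text(encoding="utf-8") == new_text:
        return False  # nothing new to commit
    path.write_text(new_text, encoding="utf-8")
    return True
```

The boolean return value is only a convenience for logging; Git itself can still decide whether there is anything to commit.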

Then you run a script which commits and pushes the file but ONLY if that file has changed. I wrote notes on doing that here: https://til.simonwillison.net/github-actions/commit-if-file-changed

Short version of that:

@simonw
simonw / async-di Dependency injection for asyncio concurrency.ipynb
Last active Feb 20, 2021
Dependency injection for asyncio concurrency
View async-di Dependency injection for asyncio concurrency.ipynb
@simonw
simonw / toga_webview.py
Created Dec 24, 2020
Toga WebView example
View toga_webview.py
import toga
from toga.style import Pack
from toga.style.pack import COLUMN, ROW

class HelloWorld(toga.App):
    def startup(self):
        main_box = toga.Box(style=Pack(direction=COLUMN))
        name_label = toga.Label("Your name: ", style=Pack(padding=(0, 5)))
@simonw
simonw / fetch_plugins.sh
Created Dec 8, 2020
Fetch a newline-separated list of Datasette plugins tagged datasette-io and datasette-plugin using curl
View fetch_plugins.sh
curl -s "https://github-to-sqlite.dogsheep.net/github.csv?sql=$(echo '
select
full_name
from
repos
where
rowid in (
select
repos.rowid
from