@mnalis
Created February 13, 2024 15:17
OSM housenumbers ending in "a" vs "b"/"c"/"d"/"e"/"f"
#!/bin/sh
# Step 1 takes about 1h 45m on my 4-core VM and produces an output file of about 300 MB; step 2 takes about 2.5 minutes.
# The script counts housenumbers that end in a digit followed by a single letter (e.g. '42b', matched case-insensitively) and outputs the frequency of each of the letters a-f.
# for testing https://github.com/streetcomplete/StreetComplete/issues/5479#issuecomment-1937812748
time pv -ptebar planet-240205.osm.bz2 | pbzip2 -dc | ag -F 'addr:housenumber' | zstd > housenumbers.xml.zstd
time pv housenumbers.xml.zstd | zstdmt -dc | perl -nE 'if (/^\s*<tag k="addr:housenumber"\sv="(.*?)\s*".*$/) { $n=$1; $a++ if $n=~/\da$/i; $b++ if $n=~/\db$/i; $c++ if $n=~/\dc$/i; $d++ if $n=~/\dd$/i; $e++ if $n=~/\de$/i; $f++ if $n=~/\df$/i;} END { say "a=$a\nb=$b\nc=$c\nd=$d\ne=$e\nf=$f\n" }'
# result:
#a=4593803
#b=1862098
#c=756913
#d=435247
#e=251648
#f=167140