@nrenner
Last active September 4, 2019 09:35
semicolon.awk - find MAPS.ME edits that removed elements from multi-value lists

semicolon.awk

Finds MAPS.ME edits that removed elements from a multi-value list, i.e. a tag whose value holds several elements separated by semicolons (";").
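As a rough illustration of what "removed elements" means here, the sketch below compares an old and a new tag value. The values are hypothetical, and the check is a simplified set-based variant (it ignores duplicate elements), not the exact test used in the script.

```shell
# Hypothetical old and new values of one tag (e.g. level)
old='0;1;2'
new='0;1'

result=$(awk -v old="$old" -v new="$new" 'BEGIN {
    # split on ";" and swallow OPL-encoded spaces (%20%) around it
    n_old = split(old, o, /(%20%)*;(%20%)*/)
    n_new = split(new, n, /(%20%)*;(%20%)*/)
    for (i = 1; i <= n_old; i++) seen[o[i]] = 1
    # "removed" = fewer elements than before, and every remaining one was already there
    removed = (n_old > n_new)
    for (i = 1; i <= n_new; i++) if (!(n[i] in seen)) removed = 0
    print (removed ? "removed" : "other")
}')
echo "$result"
```

An edit that changes `0;1;2` to `0;3` would not be reported, because `3` is a new element rather than a leftover from the old list.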

Uses the changesets dump from planet.openstreetmap.org and a full history extract from Geofabrik, both pre-filtered and converted to the Osmium Tool OPL format.

To get edits by MAPS.ME, the changesets dump is filtered by the created_by tag and later matched by changeset ID with the full history file.
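The matching step can be sketched with the usual two-file AWK idiom. The OPL lines below are hypothetical and shortened; the sketch assumes the changeset ID is field 1 of a changeset line and field 4 of an object line, which is what the script relies on.

```shell
tmp=$(mktemp -d)

# Hypothetical changeset line (already filtered to MAPS.ME)
cat > "$tmp/changesets.opl" <<'EOF'
c100 k1 s2019-01-01T00:00:00Z e2019-01-01T00:01:00Z d0 i1 uexample x8 y50 X8 Y50 Tcreated_by=MAPS.ME%20%android%20%9.0
EOF

# Hypothetical history lines: two versions of the same node
cat > "$tmp/history.opl" <<'EOF'
n1 v1 dV c99 t2018-12-01T00:00:00Z i2 uother Tlevel=0;1 x8 y50
n1 v2 dV c100 t2019-01-01T00:00:30Z i1 uexample Tlevel=0 x8 y50
EOF

matched=$(awk '
    NR == FNR { ids[$1]; next }   # first file: remember changeset IDs
    $4 in ids { print $1, $2 }    # second file: keep versions made in those changesets
' "$tmp/changesets.opl" "$tmp/history.opl")
echo "$matched"

rm -r "$tmp"
```

Only version 2 of the node is kept, because it is the only version whose changeset appears in the filtered changesets file.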

Requires:

  • Osmium Tool >= 1.7.0 (needed for the --bbox option of changeset-filter), tested with v1.10.0 (built from source)
  • MAWK variant of AWK, as provided with Ubuntu 18.04

download

prepare

  • changesets
    osmium changeset-filter --after 2016-01-01T00:00:00Z --bbox 5.864417,47.26543,15.05078,55.14777 changesets-190826.osm.bz2 -o changesets.opl
    
    grep 'created_by=MAPS.ME' changesets.opl > changesets-mapsme.opl
    
  • full history extract (Germany)
    osmium tags-filter -R -o opening_hours.osh.opl germany-internal.osh.pbf opening_hours
    osmium tags-filter -R -o level.osh.opl germany-internal.osh.pbf level
    

run

awk -f semicolon.awk -v include=level changesets-mapsme.opl level.osh.opl | sed -e 's/%20%/ /g' -e 's/%2c%/,/g' > level.md

awk -f semicolon.awk -v include=opening_hours changesets-mapsme.opl opening_hours.osh.opl | sed -e 's/%20%/ /g' -e 's/%2c%/,/g' > opening_hours.md
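The `sed` step in the commands above undoes the OPL escaping of spaces (`%20%`) and commas (`%2c%`) so the Markdown output is readable. A minimal sketch with a hypothetical opening_hours value:

```shell
# Hypothetical OPL-encoded tag value
encoded='Mo-Fr%20%08:00-18:00;%20%Sa%20%09:00-13:00'

decoded=$(printf '%s\n' "$encoded" | sed -e 's/%20%/ /g' -e 's/%2c%/,/g')
echo "$decoded"
```

Other escaped characters are left as they are; only the two escapes handled by the original commands are decoded.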
# AWK¹ script to print edits that removed elements from multi value list (;),
# filtered by changesets
#
# Expects changesets dump and full history OSM files in the Osmium Tool OPL format as input:
#
# awk -f semicolon.awk changesets.opl full-history.osh.opl
# awk -f semicolon.awk -v include=level changesets-mapsme.opl level.osh.opl | sed -e 's/%20%/ /g' -e 's/%2c%/,/g' > level.md
#
# Filter by tag key with variable assignment "include" or "exclude" in command-line,
# e.g. -v include="level,opening_hours"
#
# ¹ MAWK variant on Ubuntu 18.04
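BEGIN {
    OFS = ", "
    # split key lists on commas (as documented above) or spaces;
    # the default separator would keep "level,opening_hours" as a single key
    split(include, tmp, /[ ,]+/);
    for (i in tmp) includeMap[tmp[i]];
    split(exclude, tmp, /[ ,]+/);
    for (i in tmp) excludeMap[tmp[i]];
}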
# read changesets file (first parameter) into id map
NR==FNR {
    getTags($12, csTags);
    changesets[$1] = csTags["created_by"];
    next
}
$4 in changesets && prevId == $1 && $3 != "dD" && prevTagsStr ~ /;/ {
    getTags(prevTagsStr, prevTags);
    # iterate over current tags
    split(substr($8, 2), tagsList, ",");
    for (tagStr in tagsList) {
        split(tagsList[tagStr], keyVal, "=");
        key = keyVal[1];
        val = keyVal[2];
        if ((!include || key in includeMap) && (!exclude || !(key in excludeMap))) {
            valNum = split(val, valArr, /(%20%)*;(%20%)*/);
            prevVal = prevTags[key];
            prevValNum = split(prevVal, prevValArr, /(%20%)*;(%20%)*/);
            # test if value list elements got removed (new value list is subset of old, order doesn't matter)
            if (prevValNum > valNum && (containsAll(prevValArr, valArr) || index(prevVal, val) > 0)) {
                typeMap["n"] = "node";
                typeMap["w"] = "way";
                typeMap["r"] = "relation";
                type = typeMap[substr($1, 1, 1)];
                id = substr($1, 2);
                obj = "[" $1 "](https://www.openstreetmap.org/" type "/" id ")";
                deepHist = "[DH](http://osmlab.github.io/osm-deep-history/#/" type "/" id ")";
                histViewer = "[HV](https://pewu.github.io/osm-history/#/" type "/" id ")";
                timestamp = substr($5, 2);
                createdBy = changesets[$4];
                sub("MAPS.ME%20%", "", createdBy);
                #print $1,$2,$4,$5,$7,key,prevVal,val;
                print "- " obj " " $2,timestamp,key "=  (" deepHist ", " histViewer "; " createdBy ") \n `" prevVal "` \n `" val "`";
            }
        }
    }
    delete prevTags;
}
# remember ID and tags of the current line for comparison with the next version
{ prevId = $1; prevTagsStr = $8 }
# build tags map (key->value) from string
function getTags(str, tags, tagsList, tagStr, keyVal) {
    split(substr(str, 2), tagsList, ",");
    for (tagStr in tagsList) {
        split(tagsList[tagStr], keyVal, "=");
        tags[keyVal[1]] = keyVal[2];
    }
}
# tests if all elements of array arr2 are in array arr1
function containsAll(arr1, arr2, i, a, b, k, len) {
    for (i in arr1) a[i] = arr1[i];
    for (i in arr2) b[i] = arr2[i];
    for (i in b) {
        for (k in a) {
            if (b[i] == a[k]) {
                delete b[i];
                delete a[k];
                break;
            }
        }
    }
    for (i in b) {
        len++;
    }
    return len == 0;
}