@aanastasiou
Created March 9, 2023 11:09
Code to locate the specific "Washington, D.C." entry that causes a constraint violation when ingesting ROR.
"""
A brief script to indicate the "location" of a possibly misspelled 'Washington'
in the current (v1.20-2023-02-28-ror-data.json) ROR dataset
:author: Athanasios Anastasiou
:date: Mar 2023
"""
import json

if __name__ == "__main__":
    data_file = "v1.20-2023-02-28-ror-data.json"
    city_id = 4140963

    # Load the data file
    with open(data_file, "r", encoding="utf-8") as fd:
        data = json.load(fd)

    # Collect (record id, geonames_city.id, geonames_city.city) for every
    # address whose geonames_city.id is 4140963
    z = [
        (x["id"], y["geonames_city"]["id"], y["geonames_city"]["city"])
        for x in data
        for y in x["addresses"]
        if y["geonames_city"].get("id") == city_id
    ]

    # The following step should return only one entry if the (id, city) pair is unique.
    # Unfortunately it returns two, which is why the db constraint in my system fails.
    print({(x[1], x[2]) for x in z})

    # Now go back and find the records that carry the (possibly misspelled) "Washington"
    f = [x for x in z if x[2] == "Washington"]
    print(f)
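
The same uniqueness check can be run over every geonames id rather than only 4140963. The following is a minimal sketch under the assumption that each record has the same `addresses[].geonames_city` layout the script above relies on; the helper name `conflicting_city_names` is hypothetical and not part of the original script.

```python
from collections import defaultdict


def conflicting_city_names(records):
    """Return {geonames id: set of city names} for ids that map to more than one name.

    Hypothetical helper; assumes each record looks like
    {"id": ..., "addresses": [{"geonames_city": {"id": ..., "city": ...}}, ...]}.
    """
    names_by_id = defaultdict(set)
    for record in records:
        for address in record.get("addresses", []):
            city = address.get("geonames_city") or {}
            # Skip addresses whose geonames_city carries no id
            if "id" in city:
                names_by_id[city["id"]].add(city.get("city"))
    # Keep only the ids that would break a "one name per id" constraint
    return {gid: names for gid, names in names_by_id.items() if len(names) > 1}
```

Calling `conflicting_city_names(data)` on the loaded dataset would list every id that is expected to trip a one-name-per-id constraint, which may help tell whether the "Washington" case is an isolated one.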