@dkam
Last active January 22, 2024 05:54
Convert ISNI_persons.jsonld.gz into a JSONL file using command line tools sed and jq.
# https://isni.org/page/linked-data/
# https://isni.oclc.org:2443/isni/public_export/ISNI_persons.jsonld.gz
wget https://isni.oclc.org:2443/isni/public_export/ISNI_persons.jsonld.gz
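# Decompress the download first (-k keeps the original .gz around)
gunzip -k ISNI_persons.jsonld.gz
# The file I downloaded was full of the 0x1E character (the ASCII record separator, shown as ^^ in caret notation).
# This strips it (the \x1E escape needs GNU sed):
sed 's/\x1E//g' ISNI_persons.jsonld > cleaned_ISNI_persons.jsonld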
# Then use jq to convert the file into the much saner JSONL format. By default, jq tries to read the whole input into
# memory, so you will need the streaming version I found at:
# https://stackoverflow.com/questions/49808581/using-jq-how-can-i-split-a-very-large-json-file-into-multiple-files-each-a-spec
jq -cn --stream 'fromstream(1|truncate_stream(inputs))' cleaned_ISNI_persons.jsonld > ISNI_persons.jsonl
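
The steps above can be tried end to end on a tiny sample before committing to the full download. This is only a sketch under assumptions: the sample file names and the two-record array are made up for illustration (the real export's structure may differ), `tr -d '\036'` is swapped in for the sed call as a portable way to drop the 0x1E byte, and jq is assumed to be installed. Piping straight from `gunzip -c` also avoids writing the intermediate files.

```shell
# Hypothetical scratch directory and sample data, for illustration only.
workdir=$(mktemp -d)

# Build a small gzipped JSON array with 0x1E (octal 036) bytes sprinkled in.
printf '[\036{"id":1},\036{"id":2}]\n' | gzip > "$workdir/sample.jsonld.gz"

# Decompress, strip the record separators, and stream each top-level
# array element out as one compact JSON document per line.
gunzip -c "$workdir/sample.jsonld.gz" \
  | tr -d '\036' \
  | jq -cn --stream 'fromstream(1|truncate_stream(inputs))' \
  > "$workdir/sample.jsonl"

cat "$workdir/sample.jsonl"
```

If the sample comes out as one object per line, the same pipeline should apply to ISNI_persons.jsonld.gz with the file names substituted.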