@denisnazarov
Last active October 30, 2016 23:22

MoMA ingestion

Download Artworks.json and Artists.json from https://github.com/MuseumofModernArt/collection.

First, pre-process Artworks.json with jq:

  • Convert Artworks.json to newline-delimited JSON records:
    • jq -c '.[]' Artworks.json > Artworks.ndjson
  • Add a MediachainWKI field to each record, based on the ObjectID field. The original ObjectID field is unchanged.
    • jq -c '.MediachainWKI = (.ObjectID | tostring | "moma:artwork:" + .)'
  • Add an ArtistMediachainWKIs field that maps ConstituentID entries to mediachain WKIs:
    • jq -c '.ArtistMediachainWKIs = (.ConstituentID | map(tostring | "moma:artists:" + .))'
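
To see what the first and third steps do to the data, here is a minimal sketch run against a made-up two-record array instead of the real Artworks.json. The sample records and the shell variable names (sample, ndjson, with_artists) are illustrative assumptions; only the jq filters come from the steps above.

```shell
# Hypothetical two-record sample standing in for Artworks.json.
sample='[{"ObjectID":2,"ConstituentID":[6210]},{"ObjectID":3,"ConstituentID":[7470,8000]}]'

# Step 1: emit one compact JSON record per line.
ndjson=$(printf '%s\n' "$sample" | jq -c '.[]')
printf '%s\n' "$ndjson"

# Step 3: read the NDJSON stream and add the list of artist WKIs to each record.
with_artists=$(printf '%s\n' "$ndjson" \
  | jq -c '.ArtistMediachainWKIs = (.ConstituentID | map(tostring | "moma:artists:" + .))')
printf '%s\n' "$with_artists"
```

Because jq reads a stream of JSON values, the second filter works directly on the newline-delimited output of the first, and the original ConstituentID array is left in place next to the new field.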

Those three steps can be combined into a single command that prepares the whole file for ingestion:

  • jq -c '.[] | .MediachainWKI = (.ObjectID | tostring | ("moma:artwork:" + .)) | .ArtistMediachainWKIs = (.ConstituentID | map(tostring | "moma:artists:" + .))' Artworks.json > Artworks-Mediachain.ndjson
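
Before moving on, it may be worth checking that the combined command produced one output line per input record and that no record is missing the added fields. The sketch below does this on a small made-up file in a temporary directory; the sample records, titles, and variable names are assumptions for illustration. Against the real data, the same two checks would be pointed at Artworks.json and Artworks-Mediachain.ndjson.

```shell
# Hypothetical sample standing in for Artworks.json; the titles are placeholders.
workdir=$(mktemp -d)
cat > "$workdir/Artworks.json" <<'EOF'
[
  {"ObjectID": 2, "Title": "Placeholder A", "ConstituentID": [6210]},
  {"ObjectID": 3, "Title": "Placeholder B", "ConstituentID": [7470, 8000]}
]
EOF

# The combined command, run against the sample file.
jq -c '.[] | .MediachainWKI = (.ObjectID | tostring | ("moma:artwork:" + .)) | .ArtistMediachainWKIs = (.ConstituentID | map(tostring | "moma:artists:" + .))' \
  "$workdir/Artworks.json" > "$workdir/Artworks-Mediachain.ndjson"

# Records in the source array vs. lines in the output: these should match.
expected=$(jq 'length' "$workdir/Artworks.json")
actual=$(wc -l < "$workdir/Artworks-Mediachain.ndjson" | tr -d ' ')
echo "expected=$expected actual=$actual"

# Count output records that lack either added field.
missing=$(jq -s 'map(select((.MediachainWKI | not) or (.ArtistMediachainWKIs | not))) | length' \
  "$workdir/Artworks-Mediachain.ndjson")
echo "records missing a field: $missing"
```

Note that jq's map raises an error on a null input, so a record whose ConstituentID is null (if any exist in the real data) would stop the command rather than silently produce a bad line.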

Next, pass the Artworks-Mediachain.ndjson file to the mcclient command:

  • mcclient publish --idSelector MediachainWKI images.moma Artworks-Mediachain.ndjson
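
Since --idSelector points mcclient at the MediachainWKI field, it seems reasonable to check that those values are unique before publishing. How mcclient treats repeated IDs is not stated here, so this is only a precaution, sketched on a made-up NDJSON file with one deliberate duplicate; the file name and records are hypothetical.

```shell
# Hypothetical NDJSON sample with one deliberately duplicated ID.
workdir=$(mktemp -d)
cat > "$workdir/sample.ndjson" <<'EOF'
{"ObjectID":2,"MediachainWKI":"moma:artwork:2"}
{"ObjectID":3,"MediachainWKI":"moma:artwork:3"}
{"ObjectID":3,"MediachainWKI":"moma:artwork:3"}
EOF

# Print each MediachainWKI value that occurs more than once.
dupes=$(jq -r '.MediachainWKI' "$workdir/sample.ndjson" | sort | uniq -d)
if [ -n "$dupes" ]; then
  echo "duplicate IDs found:"
  printf '%s\n' "$dupes"
else
  echo "no duplicate IDs"
fi
```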

Do a similar jq preprocessing step for Artists.json:

  • jq -c '.[] | .MediachainWKI = (.ConstituentID | tostring | "moma:artists:" + .)' ./Artists.json > Artists-Mediachain.ndjson

Then publish Artists-Mediachain.ndjson:

  • mcclient publish --idSelector MediachainWKI images.moma.artists Artists-Mediachain.ndjson
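
Artwork records refer to artists through ConstituentID, so a quick cross-check can confirm that every artist referenced by an artwork has a record in the artists file. The sketch below compares the raw ConstituentID values, assuming (as the jq filters above imply) that the field is an array in artwork records and a single number in artist records. The sample files and the unresolved ID 9999 are made up for illustration.

```shell
# Hypothetical samples: artwork 3 references an artist (9999) with no record.
workdir=$(mktemp -d)
cat > "$workdir/artworks.ndjson" <<'EOF'
{"ObjectID":2,"ConstituentID":[6210]}
{"ObjectID":3,"ConstituentID":[7470,9999]}
EOF
cat > "$workdir/artists.ndjson" <<'EOF'
{"ConstituentID":6210}
{"ConstituentID":7470}
EOF

# Collect the IDs referenced by artworks and the IDs present in the artists file.
jq -r '.ConstituentID[]' "$workdir/artworks.ndjson" | LC_ALL=C sort -u > "$workdir/referenced.txt"
jq -r '.ConstituentID' "$workdir/artists.ndjson" | LC_ALL=C sort -u > "$workdir/known.txt"

# comm -23 keeps lines that appear only in the first (referenced) file.
unresolved=$(LC_ALL=C comm -23 "$workdir/referenced.txt" "$workdir/known.txt")
echo "unresolved artist IDs: ${unresolved:-none}"
```

Both lists are sorted under the same locale because comm expects its inputs in matching sort order.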