denisnazarov/moma-ingestion.md Secret

# MoMA ingestion

Download Artworks.json and Artists.json from https://github.com/MuseumofModernArt/collection.
First, pre-process Artworks.json with jq:

Convert Artworks.json to newline-delimited JSON records:

jq -c '.[]' Artworks.json > Artworks.ndjson
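For readers without jq at hand, the same conversion can be sketched in Python. This is only an illustration of what `.[]` with `-c` does (one compact JSON record per line); the function name `array_to_ndjson` is made up for this example, and it assumes the input file is a single top-level JSON array that fits in memory.

```python
import json

def array_to_ndjson(src_path, dst_path):
    """Write each element of a top-level JSON array as one compact line."""
    with open(src_path, encoding="utf-8") as src:
        records = json.load(src)  # reads the whole array into memory
    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            # Compact separators and raw UTF-8 mirror jq's -c output.
            dst.write(json.dumps(record, separators=(",", ":"), ensure_ascii=False))
            dst.write("\n")
    return len(records)

# Example call (file names as used in this document):
# array_to_ndjson("Artworks.json", "Artworks.ndjson")
```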


Add a MediachainWKI field to each record, based on the ObjectID field. The original ObjectID field is unchanged.

jq -c '.MediachainWKI = (.ObjectID | tostring | "moma:artwork:" + .)'


Add an ArtistMediachainWKIs field that maps each ConstituentID entry to the Mediachain WKI of the corresponding artist:

jq -c '.ArtistMediachainWKIs = (.ConstituentID | map(tostring | "moma:artist:" + .))'
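To make the effect of the two field-adding filters concrete, here is a minimal Python sketch applied to one record. The sample record is hypothetical and cut down to the fields the filters read, and `add_wkis` is an illustrative name, not part of any tool used here. The prefixes are the `moma:artwork:` one from the combined command and the `moma:artist:` one from the Artists.json step further down.

```python
import json

ARTWORK_PREFIX = "moma:artwork:"  # prefix used by the combined command
ARTIST_PREFIX = "moma:artist:"    # prefix used in the Artists.json step

def add_wkis(record):
    """Return a copy of an artwork record with the two WKI fields added."""
    out = dict(record)  # existing fields, including ObjectID, are kept as they were
    out["MediachainWKI"] = ARTWORK_PREFIX + str(record["ObjectID"])
    out["ArtistMediachainWKIs"] = [
        ARTIST_PREFIX + str(cid) for cid in record["ConstituentID"]
    ]
    return out

# Hypothetical record for illustration only.
sample = {"ObjectID": 2, "Title": "Example", "ConstituentID": [6210, 7470]}
print(json.dumps(add_wkis(sample), separators=(",", ":")))
```

The printed record keeps ObjectID and ConstituentID and gains `"MediachainWKI":"moma:artwork:2"` and `"ArtistMediachainWKIs":["moma:artist:6210","moma:artist:7470"]`.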


Those three steps can be combined into a single command that gets the whole file ready for ingestion:

jq -c '.[] | .MediachainWKI = (.ObjectID | tostring | ("moma:artwork:" + .)) | .ArtistMediachainWKIs = (.ConstituentID | map(tostring | "moma:artist:" + .))' Artworks.json > Artworks-Mediachain.ndjson

Next, pass the Artworks-Mediachain.ndjson file to the mcclient command:

mcclient publish --idSelector MediachainWKI images.moma Artworks-Mediachain.ndjson

Do a similar jq preprocessing step for Artists.json:

jq -c '.[] | .MediachainWKI = (.ConstituentID | tostring | "moma:artist:" + .)' ./Artists.json > Artists-Mediachain.ndjson

Then publish Artists-Mediachain.ndjson:

mcclient publish --idSelector MediachainWKI images.moma.artists Artists-Mediachain.ndjson
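Since the ArtistMediachainWKIs values in the artwork records are meant to point at MediachainWKI values in the artist records, a cross-check along the following lines could be run on the two .ndjson files before the publish commands. It is a sketch rather than part of the ingestion procedure: the function names are made up for this example, and it assumes both files carry the fields added by the jq steps above.

```python
import json

def load_wkis(ndjson_path, field="MediachainWKI"):
    """Collect the value of `field` from every non-empty line of an ndjson file."""
    with open(ndjson_path, encoding="utf-8") as f:
        return {json.loads(line)[field] for line in f if line.strip()}

def missing_artist_refs(artworks_path, artists_path):
    """Return artist WKIs referenced by artworks but absent from the artists file."""
    known = load_wkis(artists_path)
    missing = set()
    with open(artworks_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            # Treat a missing or null field as "no artist references".
            refs = json.loads(line).get("ArtistMediachainWKIs") or []
            missing.update(ref for ref in refs if ref not in known)
    return missing

# Example call (file names as produced above):
# missing_artist_refs("Artworks-Mediachain.ndjson", "Artists-Mediachain.ndjson")
```

An empty result means every artist reference resolves. A large result usually points to a prefix mismatch between the two jq filters.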