@mikesname
Last active March 24, 2017 14:42
# First, log into the EHRI staging server.
# It helps to open several shells.
# In one of them, tail the following file, which shows what went wrong
# when something inevitably goes wrong:
tail -f /opt/webapps/neo4j-version/data/log/console.log
# Next, in another shell, copy the file(s) to be ingested to the server
# and place them in /opt/webapps/data/import-data/de/de-002409
# (de-002409 is ITS's EHRI ID.)
# Errors: the importer fuzzy-parses certain date patterns. Invalid
# dates such as 31st April currently throw a runtime exception, so
# fix all of these first ;)
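# A rough pre-flight check for such dates might look like the sketch below.
# It assumes bash and that dates appear in the XML as YYYY-MM-DD; the actual
# ITS date patterns may differ, so adjust the regex as needed. is_valid_date
# and find_invalid_dates are hypothetical helpers, not part of the importer.
is_valid_date() {
    # $1 = a date string; succeeds only for a real calendar date in YYYY-MM-DD form
    local y m d max
    [[ $1 =~ ^([0-9]{4})-([0-9]{2})-([0-9]{2})$ ]] || return 1
    # Force base 10 so that values like "08" are not read as octal
    y=$((10#${BASH_REMATCH[1]}))
    m=$((10#${BASH_REMATCH[2]}))
    d=$((10#${BASH_REMATCH[3]}))
    case $m in
        1|3|5|7|8|10|12) max=31 ;;
        4|6|9|11) max=30 ;;
        2) if (( y % 4 == 0 && (y % 100 != 0 || y % 400 == 0) )); then max=29; else max=28; fi ;;
        *) return 1 ;;
    esac
    (( d >= 1 && d <= max ))
}
find_invalid_dates() {
    # $1 = XML file; prints one line per distinct impossible date found in it
    grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}' "$1" | sort -u | while read -r d; do
        is_valid_date "$d" || echo "$1: invalid date $d"
    done
}
# Usage: find_invalid_dates KHSK.xml_GER.xml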
# Import properties handle certain mappings between tags (with particular
# attributes) and EHRI fields. The ITS data has a particular mapping
# indicating that a <unitid> with type="refcode" is the main ID, and
# the rest are alternates. This file is:
# /opt/webapps/data/import-data/de/de-002409/its-pertinence.properties
# The actual import is done via the /ehri/import/ead endpoint on the
# Neo4j extension. It is documented here:
# http://ehri.github.io/docs/api/ehri-rest/ehri-extension/wsdocs/resource_ImportResource.html
# Let's export the properties file path as an environment variable:
export PROPERTIES=/opt/webapps/data/import-data/de/de-002409/its-pertinence.properties
# Also, let's write a log file:
echo "Importing ITS data with properties: $PROPERTIES" > LOG.txt
export LOG="$(pwd)/LOG.txt"
# So to import a single XML file, the command would be:
curl -XPOST \
-H "X-User:mike" \
-H "Content-type: text/xml" \
--data-binary @KHSK.xml_GER.xml \
"http://localhost:7474/ehri/import/ead?scope=de-002409&log=$LOG&properties=$PROPERTIES"
# If this happens to run out of Java Heap space, you can stop/start Neo4j like so:
sudo service neo4j-service restart
# (You can give Neo4j more memory by uncommenting and setting
# wrapper.java.maxmemory=4000 in $NEO4J_HOME/conf/neo4j-wrapper.conf.)
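# That edit could be scripted roughly as follows. This is only a sketch:
# set_neo4j_heap is a hypothetical helper, and it assumes the setting sits on
# its own line, possibly commented out with a leading "#". A backup of the
# original file is kept alongside it with a .bak suffix, and Neo4j still has
# to be restarted for the change to take effect.
set_neo4j_heap() {
    # $1 = path to neo4j-wrapper.conf, $2 = max heap size in MB
    sed -i.bak -E "s/^#?[[:space:]]*wrapper\.java\.maxmemory=.*/wrapper.java.maxmemory=$2/" "$1"
}
# Usage (may need sudo): set_neo4j_heap "$NEO4J_HOME/conf/neo4j-wrapper.conf" 4000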
# If all goes well you should get something like this:
# {"created":48430,"unchanged":0,"message":"Import ITS 0.4 data using its-pertinence.properties.\n","updated":0,"errors":{}}
# In theory, that ingest should be idempotent, so you can run the same command
# again without changing anything:
# {"created":0,"unchanged":48430,"message":"Import ITS 0.4 data using its-pertinence.properties.\n","updated":0,"errors":{}}
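# When several files need importing, the same request can be wrapped in a
# loop that also reports the counts from each response. This is only a sketch:
# import_count, is_noop_import and import_all are hypothetical helpers, and the
# counts are pulled out with a crude text match rather than a real JSON parser,
# so they come back empty if the response is not shaped like the ones above.
# It assumes $LOG and $PROPERTIES are set as earlier in these notes.
import_count() {
    # $1 = field name (created, updated or unchanged), $2 = response text
    printf '%s' "$2" | grep -oE "\"$1\":[0-9]+" | cut -d: -f2
}
is_noop_import() {
    # $1 = response text; succeeds when nothing was created and nothing updated
    [ "$(import_count created "$1")" = "0" ] && [ "$(import_count updated "$1")" = "0" ]
}
import_all() {
    local url f response
    url="http://localhost:7474/ehri/import/ead?scope=de-002409&log=$LOG&properties=$PROPERTIES"
    for f in "$@"; do
        # Stop at the first request that curl itself cannot complete
        response=$(curl -s -XPOST -H "X-User:mike" -H "Content-type: text/xml" \
            --data-binary "@$f" "$url") || return 1
        echo "$f: created=$(import_count created "$response") updated=$(import_count updated "$response") unchanged=$(import_count unchanged "$response")"
    done
}
# Usage: import_all *.xml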
# The final step is to re-index the ITS repository, making the items searchable. This
# can be done from the Portal Admin UI, or via the following command:
java -jar /opt/webapps/docview/bin/indexer.jar \
--clear-key-value holderId=de-002409 \
--index -H "X-User=admin" \
--stats \
--solr http://localhost:8983/solr/portal \
--rest http://localhost:7474/ehri \
"Repository|de-002409"