Use this bash script and the commands below to pull down all records from Invenio and save them as discrete files within uniquely named directories.
Create a file named invenio_recs.sh with these contents:
#!/bin/bash
# Create the directory for the records (the first argument to the script)
mkdir -p "$1"
# Ask for the single most recent record in order to find the highest record ID
curl_resp1=$(curl -k -X GET -H "Content-Type: application/json" -H "Accept: application/json" "https://invenio-test.rc.it.nyu.edu/api/records/?sort=mostrecent&size=1")
# The ID is expected in the eighth comma-separated field of the response
id_rec=$(echo $curl_resp1 | awk -F, '{ print $8 }' | awk -F: '{ print $2 }')
# Keep only the digits of the ID
id_rec_num=${id_rec//[^0-9]/}
i=1
while [ "$i" -le "$id_rec_num" ]
do
  # Check whether the record exists
  status_code=$(curl -k --write-out '%{http_code}' --silent --output /dev/null "https://invenio-test.rc.it.nyu.edu/api/records/$i")
  if [[ $status_code == 429 ]]; then
    # Rate limited: wait a minute, then try the same record again
    sleep 60
    continue
  fi
  if [[ $status_code == 200 ]]; then
    # If the record exists and is not deleted (i.e. it has metadata), save it as JSON
    curl_resp=$(curl -k -X GET -H "Content-Type: application/json" -H "Accept: application/json" "https://invenio-test.rc.it.nyu.edu/api/records/$i?prettyprint=1")
    if ! [[ "$curl_resp" = *"metadata\": {}"* ]]; then
      if ! [[ "$curl_resp" = *"message"* ]]; then
        echo "$curl_resp" > "$1/record_$i.json"
      else
        # The response carries a message instead of a record, so print it
        echo "$curl_resp"
      fi
    fi
  fi
  let i=i+1
  echo $i
done
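The script waits 60 seconds whenever the server answers 429. The app limits each IP address to 1000 requests per hour and 30 per minute, so it may be gentler to space requests out and to cap the number of retries. The following is a minimal sketch under those assumptions, not a tested replacement for the script. The names min_delay, get_status and status_with_retry, and the WAIT_SECONDS and MAX_TRIES settings, are hypothetical.

```shell
#!/bin/bash
# Sketch only: pace requests and retry on HTTP 429.
# min_delay, get_status and status_with_retry are hypothetical helper names.

# Print the smallest whole number of seconds between requests that satisfies
# both an hourly and a per-minute request limit.
min_delay() {
  local per_hour=$1 per_minute=$2
  local by_hour=$(( (3600 + per_hour - 1) / per_hour ))       # ceiling of 3600 / per_hour
  local by_minute=$(( (60 + per_minute - 1) / per_minute ))   # ceiling of 60 / per_minute
  echo $(( by_hour > by_minute ? by_hour : by_minute ))
}

# Print only the HTTP status code for a URL (same curl options as the script).
get_status() {
  curl -k --silent --output /dev/null --write-out '%{http_code}' "$1"
}

# Print the status code for a URL, waiting and retrying while the answer is 429.
# WAIT_SECONDS (default 60) and MAX_TRIES (default 5) are assumed settings.
status_with_retry() {
  local url=$1 tries=0 code
  while (( tries < ${MAX_TRIES:-5} )); do
    code=$(get_status "$url")
    if [[ $code != 429 ]]; then
      echo "$code"
      return 0
    fi
    tries=$(( tries + 1 ))
    sleep "${WAIT_SECONDS:-60}"
  done
  # Still rate limited after the last try: report it and signal failure
  echo "$code"
  return 1
}
```

With the limits above, `min_delay 1000 30` prints 4, so a `sleep 4` after each request would stay within both limits. The script makes up to two requests per record (the status check and the download), so that comes to roughly 8 seconds per record.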
Next, log on to the VPN, navigate to the directory where you saved the script, and make the script executable with this command:
chmod 775 invenio_recs.sh
Then run the script:
bash ./invenio_recs.sh /Users/staff/Desktop/inveniopull
Here the path is the directory where you want the records to go.
Note that the app throttles downloads: from a single IP address you are limited to 1000 requests per hour and to only 30 in any one minute. When you hit a limit, you'll need to wait the appropriate amount of time before continuing.
After downloading, you should rename each file according to the Invenio ID and then pretty print the records before committing them. For now, this is a two-step process:
Navigate to the directory where all of the records are and concatenate all of the individual JSON files into a single file with jq:
find . -name '*.json' -exec cat '{}' + | jq -s '.' > /Users/staff/Desktop/newjsonsinglefile.json
Start a Ruby session with irb and paste in the following:
require 'json'
require 'fileutils'
# Stop irb from echoing each return value, so large records are not printed
irb_context.echo = false
# The single file that contains all of the records. Make sure to include the full path
allrecords_file = File.read('/Users/staff/Desktop/newjsonsinglefile.json')
# JSON.parse turns the file contents into a Ruby array with one hash per record
parsed_file = JSON.parse(allrecords_file)
# Base directory for the renamed, pretty-printed records
full_folder = "/Users/staff/Desktop/inveniopull/revised"
parsed_file.each do |record|
  # Name each record's directory after its Invenio ID
  folder_name = record['id']
  FileUtils.mkdir_p("#{full_folder}/#{folder_name}")
  File.open("#{full_folder}/#{folder_name}/invenio.json", "w") do |f|
    f.write(JSON.pretty_generate(record))
  end
end
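Once the Ruby step has finished, a quick sanity check is to compare the number of downloaded record_*.json files with the number of invenio.json files that were written out. The following is a minimal sketch, assuming the directory layout used in this document. The names count_files and check_counts are hypothetical.

```shell
#!/bin/bash
# Sketch only: compare how many records were downloaded with how many were
# written out again. count_files and check_counts are hypothetical names.

# Print the number of regular files under a directory whose names match a pattern.
count_files() {
  local dir=$1 pattern=$2
  find "$dir" -type f -name "$pattern" | wc -l | tr -d ' '
}

# Print both counts and succeed only when they are equal.
check_counts() {
  local pull_dir=$1 revised_dir=$2
  local downloaded rewritten
  downloaded=$(count_files "$pull_dir" 'record_*.json')
  rewritten=$(count_files "$revised_dir" 'invenio.json')
  echo "downloaded=$downloaded rewritten=$rewritten"
  [[ $downloaded -eq $rewritten ]]
}
```

For the example paths in this document, the call would be `check_counts /Users/staff/Desktop/inveniopull /Users/staff/Desktop/inveniopull/revised`. A mismatch suggests that some records were missed or that extra .json files were picked up during concatenation.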