@carwash
Last active August 5, 2020 09:04
Minimal OAI-PMH harvester, for testing in bash. Uses curl, tee, and xmlstarlet for resumption token support. Usage: `$ oai-pmh-test.sh example.com`
#!/usr/bin/env bash
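# Save the pretty-printed response to a numbered file and print its
# resumptionToken (empty once the list is complete).
fetch_token() {
  local file=$1 count=$2
  tee >(xmlstarlet fo - > "$file-$(printf "%04d" "$count").xml") | xmlstarlet sel -N oai="http://www.openarchives.org/OAI/2.0/" --template --match "/oai:OAI-PMH/oai:ListRecords/oai:resumptionToken" --value-of "."
}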
server=$1
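# Derive the output file prefix from the server URL: drop the scheme, replace punctuation with hyphens.
file=$(echo "$server" | sed -E 's/^https?:\/\///; s/[\.:\/]/-/g')
count=1
token=$(curl -s -g "$server?verb=ListRecords" -H 'Accept: application/xml' | fetch_token "$file" "$count")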
#token=$(curl -s -g "$server?verb=ListRecords" | fetch_token $file $count)
# Keep requesting pages until the server returns an empty resumptionToken.
while [ -n "$token" ] ; do
  ((count++))
  token=$(curl -s -g "$server?verb=ListRecords&resumptionToken=$token" -H 'Accept: application/xml' | fetch_token "$file" "$count")
  # token=$(curl -s -g "$server?verb=ListRecords&resumptionToken=$token" | fetch_token "$file" "$count")
done
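
To see where the harvested pages end up, the naming scheme can be tried in isolation. This is only a sketch: `output_name` is a hypothetical helper that is not part of the script above, and it assumes the same rule (strip the URL scheme, replace dots, colons, and slashes with hyphens, then append a four-digit page counter).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): prints the file name
# the harvester would write for a given endpoint and page number.
output_name() {
  local server=$1 count=$2 prefix
  # Drop the URL scheme, then turn dots, colons, and slashes into hyphens.
  prefix=$(printf '%s\n' "$server" | sed -E 's#^https?://##; s#[.:/]#-#g')
  # Pages are numbered with four zero-padded digits.
  printf '%s-%04d.xml\n' "$prefix" "$count"
}

output_name "https://example.com/oai" 1
output_name "example.com" 12
```

The two calls print `example-com-oai-0001.xml` and `example-com-0012.xml`, so each endpoint gets its own sequence of files in the current directory.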