
@CJHArch
Created February 9, 2015 00:47
OAI reporting - proof of concept
#!/bin/bash
# This script pulls the OAI feed, names the download by date, extracts the requested data, and writes a report.
set -euo pipefail
# compute the date stamp once so every step uses the same value, even if the run crosses midnight
today=$(date +%Y%m%d)
feed="LBI_periodicals${today}.xml"
# -p keeps a second run on the same day from failing because the directory already exists
mkdir -p "$today"
cd "$today"
echo "Files found in $today"
# run pyoaiharvest to get the feed
python /home/kevin/test/pyoaiharvest.py -l http://digital.cjh.org/OAI-PUB -o "$feed" -m marc21 -s LBI_periodicals
# sed to remove marc namespace prefix, consider using the sed 'or' to clean up other stuff in one shot
# see http://sed.sourceforge.net/sed1line.txt for examples
# sed "s/marc://g"
# sed to remove everything between '<record ' and the next '>'
# [^>]* stops at the first '>', so other tags on the same line are not swallowed; g handles every record on a line
sed -i -e 's/<record [^>]*>/<record>/g' "$feed"
# any other text fixes necessary
# xmlstarlet to run stylesheet on file, outputting required data in one or more files
xmlstarlet tr /home/kevin/test/identifiers.xsl "$feed" > /home/kevin/test/finalreport.xml
# mail command to send file to required recipients
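
The mail step at the end of the script is still only a placeholder comment. A minimal sketch of how it might look, assuming a `mail` (mailx) command is installed and configured on the host. The `send_report` helper, the `MAILER` variable, the subject line, and the recipient address are all hypothetical and not part of the original script:

```shell
#!/bin/bash
# Hypothetical helper: mail a report file as the message body.
# MAILER defaults to the common `mail` (mailx) command; override it to test without sending.
MAILER=${MAILER:-mail}

send_report() {
    local report=$1 recipient=$2
    # refuse to send a missing or empty report
    if [ ! -s "$report" ]; then
        echo "report $report is missing or empty, not sending" >&2
        return 1
    fi
    # -s sets the subject; the report itself is read from stdin as the body
    "$MAILER" -s "OAI report $(date +%Y%m%d)" "$recipient" < "$report"
}

# Example call (the address is a placeholder):
# send_report /home/kevin/test/finalreport.xml reports@example.org
```

The report goes in the message body rather than as an attachment because the attachment flag differs between mailx implementations.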

CJHArch commented Feb 9, 2015

This script pulls down the OAI feed, removes the MARC namespace declarations (this still needs to be tested against the full feed, which has some wacky namespaces in it), runs the result through a stylesheet using xmlstarlet, and writes out a report. This works! The next job is writing a stylesheet to pull out all the requested data.
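
The namespace cleanup could be folded into a single sed pass, using alternation to catch several attribute names in one expression. This is only a sketch, assuming the feed uses the `marc:` prefix and double-quoted `xmlns`, `xmlns:*`, and `xsi:schemaLocation` attributes. The `strip_marc_ns` function name is made up for illustration, and it has not been run against the full feed:

```shell
#!/bin/bash
# Hypothetical filter: reads XML on stdin, writes the cleaned XML to stdout.
strip_marc_ns() {
    # -E enables extended regex, so | works as 'or' without backslashes.
    # First expression: drop xmlns="...", xmlns:prefix="..." and xsi:schemaLocation="..." attributes.
    # Second expression: drop the marc: prefix from opening and closing tags.
    sed -E \
        -e 's/ (xmlns(:[A-Za-z0-9_]+)?|xsi:schemaLocation)="[^"]*"//g' \
        -e 's#(</?)marc:#\1#g'
}

# Example call (file names are placeholders):
# strip_marc_ns < feed.xml > feed_clean.xml
```

Attributes written with single quotes, or split across lines, would slip past these patterns, so the output is worth spot-checking before it goes to the stylesheet.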
