Created
February 9, 2015 00:47
OAI reporting - proof of concept
#!/bin/bash
# This script will pull the OAI feed, assign it a name based on the time and date, pull out requested data, and
# compute the date stamp once so every step uses the same value, even if the run crosses midnight
stamp=$(date +%Y%m%d)
feed="LBI_periodicals${stamp}.xml"
# -p avoids an error if today's directory already exists; stop if we cannot enter it
mkdir -p "$stamp"
cd "$stamp" || exit 1
echo "Files found in $stamp"
# run pyoaiharvest to get feed
python /home/kevin/test/pyoaiharvest.py -l http://digital.cjh.org/OAI-PUB -o "$feed" -m marc21 -s LBI_periodicals
# sed to remove marc namespace prefix, consider using the sed 'or' to clean up other stuff in one shot
# see http://sed.sourceforge.net/sed1line.txt for examples
# sed "s/marc://g"
# sed to remove everything between '<record ' and the next '>'
# ([^>]* stops at the first '>'; a greedy .* would run to the last '>' on the line)
sed -i -e 's/<record [^>]*>/<record>/g' "$feed"
# any other text fixes necessary
# xmlstarlet to run stylesheet on file, outputting required data in one or more files
xmlstarlet tr /home/kevin/test/identifiers.xsl "$feed" > /home/kevin/test/finalreport.xml
# mail command to send file to required recipients
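The attribute-stripping step is easy to check on a small sample before running it against the full feed. The sketch below is a test harness under stated assumptions, not part of the original script: `strip_record_attrs` is a hypothetical helper name, the sample record is made up, and the pattern uses `[^>]*` so the match stops at the first `>` instead of running to the last one on the line.

```shell
#!/bin/bash
# Hypothetical helper: drop every attribute (including xmlns declarations)
# from opening <record ...> tags. [^>]* stops at the first '>', so any
# other tags on the same line are left alone.
strip_record_attrs() {
    sed -e 's/<record [^>]*>/<record>/g'
}

# Made-up sample line; the real feed will carry more namespaces than this.
sample='<record xmlns="http://www.loc.gov/MARC21/slim"><leader>00000nas</leader></record>'

printf '%s\n' "$sample" | strip_record_attrs
# prints: <record><leader>00000nas</leader></record>
```

A bare `<record>` tag with no attributes has no space after the element name, so it does not match the pattern and passes through unchanged.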
This script pulls down the OAI feed, removes the MARC namespace declarations (this still needs to be tested against the full feed, which has wacky namespaces there), runs the result through a stylesheet using xmlstarlet, and writes out a report. This works! The next job is writing a stylesheet to get all the requested data out.
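Before writing a full stylesheet, the kind of data it would pull can be previewed from the shell with `xmlstarlet sel`. This is only a sketch: `list_control_numbers` is a hypothetical name, and choosing MARC control field 001 as the identifier is an assumption, since the requested fields are not listed here. It also assumes the namespace declarations have already been stripped from the file, so plain element names match.

```shell
#!/bin/bash
# Hypothetical helper: print the 001 control field of every <record> that
# has control fields, one value per line, from a namespace-free MARCXML file.
# Assumption: 001 is the identifier of interest; swap in the fields actually requested.
list_control_numbers() {
    xmlstarlet sel -t -m '//record[controlfield]' -v 'controlfield[@tag="001"]' -n "$1"
}

# Usage (file name follows the script's naming pattern):
#   list_control_numbers "LBI_periodicals$(date +%Y%m%d).xml"
```

The `[controlfield]` predicate is there to skip any wrapper `<record>` elements that only contain a header and metadata, so each MARC record is listed once.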