


Ed Warga (EdWarga), Austin, TX
EdWarga / GetOAI-PMH.xqy
Last active June 6, 2018 15:28
Code examples supporting a proposal for the 2018 Open Repositories conference. See the files below for example scripts written for repository management activities. These are simple, ad hoc examples created by a library professional who is not a developer. The XQuery scripts were written for BaseX and may require that software to run.
(:code snippet originally created for: Anderson, C., Stringer-Hye, & Warga, E. (2017). graphs-without-ontologies: Data and Code for 2015 VIVO Conference Presentation. XQuery, Heard Library. Retrieved from https://github.com/HeardLibrary/graphs-without-ontologies
copied from: https://github.com/HeardLibrary/graphs-without-ontologies/blob/master/XQuery/get-OAI-data.xq
This script harvests metadata from an OAI-PMH data provider. Update the base URL, setSpec, and metadata format (metadataPrefix) to target the desired repository, collection, and metadata format.
I have been using this script to harvest the DSpace METS records for items in test collections so that I can run quality control checks on them. See the qualityCheckQueries.xqy file for queries that check certain aspects of the item metadata stored in the PREMIS section of the DSpace METS documents.
:)
xquery version "3.1";
EdWarga / getDivs.py
Created May 25, 2018 20:55
python bits!
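from bs4 import BeautifulSoup  # imported for parsing the page's HTML
import requests

# Fetch the Corpus Christi, TX weather page from Weather Underground
url = "https://www.wunderground.com/weather/us/tx/corpus-christi"
r = requests.get(url, timeout=30)
r.raise_for_status()  # stop early on HTTP errors (4xx/5xx)
data = r.text  # raw HTML of the page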
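The script imports BeautifulSoup but stops after fetching the page. Collecting the page's div elements, as the file name getDivs.py suggests, is a one-liner with BeautifulSoup: `BeautifulSoup(data, "html.parser").find_all("div")`. The sketch below does the same job with the standard library's html.parser instead, as a swapped-in alternative that needs no extra packages. DivCollector, get_divs and the optional class filter are hypothetical names added for illustration.

```python
from html.parser import HTMLParser


class DivCollector(HTMLParser):
    """Record the id and class attributes of every <div> in an HTML page."""

    def __init__(self):
        super().__init__()
        self.divs = []

    def handle_starttag(self, tag, attrs):
        # tag names arrive lower-cased; attrs is a list of (name, value) pairs
        if tag == "div":
            attr_map = dict(attrs)
            # value is None for a bare attribute, so fall back to ""
            classes = (attr_map.get("class") or "").split()
            self.divs.append({"id": attr_map.get("id"), "classes": classes})


def get_divs(html, css_class=None):
    """Return div summaries, optionally only those carrying css_class."""
    parser = DivCollector()
    parser.feed(html)
    parser.close()
    if css_class is None:
        return parser.divs
    return [d for d in parser.divs if css_class in d["classes"]]
```

With the script above, `get_divs(data)` lists every div on the fetched page. Note that content a site fills in with JavaScript after load will not appear in `r.text`, whichever parser you use.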
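# Read parent CSV (Get-Content returns the file's lines, not its name)
$InputFilename = Get-Content 'C:\working\splitCSV\testSplit.csv'
$OutputFilenamePattern = 'output_done_'  # prefix for each output file
$LineLimit = 2000                        # maximum lines per output file
# Initialize counters
$line = 0
$i = 0
$file = 0
$start = 0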
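The PowerShell fragment above appears to set up a CSV splitter: it reads testSplit.csv and prepares to write pieces named with the output_done_ prefix, each at most $LineLimit (2000) lines long. The rest of that script is not shown, so the Python sketch below is a comparable approach, not a port. It assumes the first row is a header that should be repeated at the top of every piece, which the original may not do, and split_csv_text and write_chunks are hypothetical names.

```python
import csv
import io


def split_csv_text(text, line_limit=2000):
    """Split CSV text into chunks of at most line_limit data rows.

    Assumption: the first row is a header, repeated at the top of every
    chunk so each output file is a valid CSV on its own.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    chunks = []
    for start in range(0, len(body), line_limit):
        chunks.append([header] + body[start:start + line_limit])
    return chunks


def write_chunks(chunks, pattern="output_done_"):
    """Write each chunk to <pattern><n>.csv, numbering from 1."""
    names = []
    for n, chunk in enumerate(chunks, start=1):
        name = f"{pattern}{n}.csv"
        with open(name, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(chunk)
        names.append(name)
    return names
```

Parsing with csv.reader instead of counting raw lines keeps quoted fields that contain commas or embedded newlines intact, so a record is never cut in half at a chunk boundary. A file containing only a header produces no chunks.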