@CJHArch
Created January 12, 2015 21:24
XQuery for extracting collective access data from HTML scraped from Buffalo library XTF-hosted finding aids. The data was curl'ed into one large XML file and imported into BaseX with namespaces stripped.
xquery version "3.0";
<results>
{
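(: Each <html> element under <records> is one scraped finding aid. :)
for $findingaid in /records/html
(: Dublin Core metadata; // matches <meta> whether or not it sits under <head>. :)
let $title := $findingaid//meta[@name="dc.title"]/@content
let $creator := $findingaid//meta[@name="dc.author"]/@content
let $subject := $findingaid//meta[@name="dc.subject"]/@content
(: Title-page heading; the dates are taken from the text after its first comma. :)
let $dates := $findingaid//h2[@class="tp_titleproper"]
let $abstract := $findingaid/head/meta[@name="description"]/@content
let $unitid := $findingaid//h3[@class="tp_collnum"]
(: Keep only the part of the first level-1 link that precedes the first "&". :)
let $link := $findingaid//div[@class="level-1"][1]/a/@href/substring-before(., "&amp;")
(: Extent is the direct text of the div that contains the node.1.3.1.5 anchor. :)
let $extent := $findingaid//div[span/a[@name="node.1.3.1.5"]]/text()
return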
<record>
<title>{data($title)}</title>
<creator>{data($creator)}</creator>
<dates>{data(substring-after($dates, ","))}</dates>
<unitid>JPBUF_{data($unitid)}</unitid>
<publisher>Jewish Buffalo Archives Project, at University Archives, University at Buffalo</publisher>
<abstract>{data($abstract)}</abstract>
<link>{data($link)}</link>
<extent>{$extent}</extent>
<subjects>{data($subject)}</subjects>
</record>
}
</results>
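
The string handling in the query above can be tried without the full data set. The following is a minimal sketch that runs the same `substring-after` and `substring-before` steps against one inline record. The sample record, its title, the collection number `MS-000`, and the `href` value are all hypothetical and only approximate the shape of the scraped pages; the real XTF markup may differ.

```xquery
xquery version "3.0";
(: Hypothetical sample input: one <html> record shaped roughly like a scraped page. :)
let $records :=
  <records>
    <html>
      <head>
        <meta name="dc.title" content="Example Family Papers, 1900-1950"/>
      </head>
      <body>
        <h2 class="tp_titleproper">Example Family Papers, 1900-1950</h2>
        <h3 class="tp_collnum">MS-000</h3>
        <div class="level-1"><a href="view?docId=example.xml&amp;chunk.id=1">Overview</a></div>
      </body>
    </html>
  </records>
for $findingaid in $records/html
let $dates := $findingaid//h2[@class = "tp_titleproper"]
(: "&amp;" in an XQuery string literal is a single "&" character. :)
let $link := $findingaid//div[@class = "level-1"][1]/a/@href/substring-before(., "&amp;")
return
  <record>
    <dates>{substring-after($dates, ",")}</dates>
    <unitid>JPBUF_{data($findingaid//h3[@class = "tp_collnum"])}</unitid>
    <link>{$link}</link>
  </record>
```

With this sample, `<dates>` keeps the space that follows the comma in the heading, so wrapping the call in `normalize-space()` may be worth considering if leading whitespace is unwanted in the export. `<link>` contains only the part of the URL before the first `&`.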