joewiz / get-groups-result.xml
Created Mar 18, 2012
Wrangling plain text with XQuery
View get-groups-result.xml
<line level="0">The President left at 8:48 am</line>
<line level="1">-Administration recommendations on Capitol Hill</line>
<line level="1">-Improvements</line>
View collection.xconf.xml
<collection xmlns="">
<!-- Old full text index configuration. Deprecated. -->
<fulltext default="none" attributes="false"/>
<!-- New full text index based on Lucene -->
<text qname="SPEECH">
<ignore qname="SPEAKER"/>
joewiz / tokenize-sentences.xq
Last active Apr 9, 2021
Split (or "tokenize") a string into "sentences", with XQuery. See
View tokenize-sentences.xq
xquery version "1.0";
(: A naive approach to sentence tokenization inspired by
: Works well with edited text like newspapers. Parameters like punctuation can/should be edited;
: see the section below called "criteria".
: For a more sophisticated approach, see Tibor Kiss and Jan Strunk, "Unsupervised Multilingual
: Sentence Boundary Detection", Computational Linguistics, Volume 32, Issue 4, December 2006,
: pp. 485-525. Also, see these discussions of sentence tokenization:
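The naive approach described in this comment can be sketched as follows. This is only an illustration, not the gist's own code: the function name `local:split-sentences-sketch`, the punctuation set, and the private-use marker character are all assumptions chosen for the example. XQuery 1.0 regular expressions have no lookbehind, so the sketch first marks each boundary and then tokenizes on the marker.

```xquery
xquery version "1.0";

(: Hypothetical sketch: treat ".", "?" or "!" followed by whitespace and an
   uppercase letter as a sentence boundary. Abbreviations such as "Mr. Smith"
   will be split incorrectly; that is the known cost of the naive rule. :)
declare function local:split-sentences-sketch($text as xs:string) as xs:string* {
    (: a private-use character that is assumed not to occur in the input :)
    let $marker := "&#xE000;"
    let $marked :=
        replace(
            normalize-space($text),
            "([.?!])\s+(\p{Lu})",
            concat("$1", $marker, "$2")
        )
    return tokenize($marked, $marker)
};

local:split-sentences-sketch("He left at 8. Did she stay? Yes.")
(: returns ("He left at 8.", "Did she stay?", "Yes.") :)
```

The punctuation class and the uppercase test are the "criteria" one would tune for a given corpus.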
joewiz / trim-phrase-to-length.xq
Last active Mar 3, 2016
Trim phrases of arbitrary length to a maximum length, without cutting off words or ending on unwanted words, with XQuery
View trim-phrase-to-length.xq
xquery version "3.0";
declare function local:trim-phrase-to-length($phrase, $length) {
(: if the phrase is already short enough, we're done :)
if (string-length($phrase) le $length) then
$phrase
else
(: the phrase is too long, so... :)
(: we will split the phrase into words and look for the longest possible arrangement within our length limit,
that doesn't end with boring words :)
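One way to read the strategy in these comments is sketched below. It is an assumption-laden illustration rather than the gist's implementation: `local:trim-sketch` and the `$stop-words` list are hypothetical names, and the "boring words" are a small made-up sample.

```xquery
xquery version "3.0";

(: Hypothetical list of words a trimmed phrase should not end on :)
declare variable $stop-words := ("a", "an", "and", "of", "or", "the", "to");

(: Hypothetical sketch: keep the longest prefix of whole words that fits
   within $length characters and does not end on a stop word. :)
declare function local:trim-sketch($phrase as xs:string, $length as xs:integer) as xs:string {
    if (string-length($phrase) le $length) then
        $phrase
    else
        let $words := tokenize(normalize-space($phrase), " ")
        let $candidates :=
            for $n in 1 to count($words)
            let $candidate := string-join(subsequence($words, 1, $n), " ")
            where string-length($candidate) le $length
                and not(lower-case($words[$n]) = $stop-words)
            return $candidate
        (: the last candidate is the longest; fall back to "" if nothing fits :)
        return ($candidates[last()], "")[1]
};

local:trim-sketch("The quick brown fox jumps over the lazy dog", 22)
(: returns "The quick brown fox" :)
```

Building every prefix is quadratic in the number of words, which seems acceptable for short phrases such as titles.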
joewiz / oauth.xq
Last active Nov 1, 2018
Access OAuth 1.0-based services like the Twitter v1.1 API, with XQuery. (See comments below for explanation.)
View oauth.xq
xquery version "3.0";
module namespace oauth="";
(:~ A library module for signing and submitting OAuth requests such as the kind needed for the Twitter v1.1 API.
The EXPath Crypto library supplies the HMAC-SHA1 algorithm. The EXPath HTTP Client library makes the HTTP requests.
The OAuth standard requires a "nonce" parameter - a random string. Since there is no implementation-independent
nonce function in XQuery, we must rely on implementation-specific functions. For eXist-db we use util:uuid().
joewiz / highlight-matches.xq
Last active Mar 22, 2017
Highlight regex pattern matches in XML while preserving node structure, with XQuery
View highlight-matches.xq
xquery version "3.0";
declare namespace fn="";
(: Search within $nodes for matches to a regular expression $pattern and apply a $highlight function :)
declare function local:highlight-matches($nodes as node()*, $pattern as xs:string, $highlight as function(xs:string) as item()* ) {
for $node in $nodes
return
typeswitch ( $node )
case element() return
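The recursive pattern that this preview begins can be sketched as below. It is a minimal illustration under assumptions, not the gist's full code: `local:highlight-sketch` is a hypothetical name, and the sketch relies on the standard XQuery 3.0 function `analyze-string`, which returns `match` and `non-match` elements for a string and a pattern.

```xquery
xquery version "3.0";

(: Hypothetical sketch: rebuild elements as they are, and split each text node
   into matching and non-matching parts, passing matches to $highlight. :)
declare function local:highlight-sketch(
    $nodes as node()*,
    $pattern as xs:string,
    $highlight as function(xs:string) as item()*
) as item()* {
    for $node in $nodes
    return
        typeswitch ($node)
            case element() return
                element { node-name($node) } {
                    $node/@*,
                    local:highlight-sketch($node/node(), $pattern, $highlight)
                }
            case text() return
                for $part in analyze-string($node, $pattern)/*
                return
                    if (local-name($part) eq "match") then
                        $highlight(string($part))
                    else
                        text { string($part) }
            default return $node
};

local:highlight-sketch(
    <p>The <em>quick</em> fox</p>,
    "qu\w+|fox",
    function($s) { <mark>{$s}</mark> }
)
```

With this input, the result should be a `p` element in which "quick" and "fox" are each wrapped in a `mark` element while the surrounding `em` element is preserved. Because matching is done per text node, a match that spans an element boundary would not be found by this sketch.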
joewiz / http-download.xq
Last active Sep 30, 2017
Download a file via HTTP and save to an eXist-db collection; uses EXPath modules where possible
View http-download.xq
xquery version "3.1";
import module namespace hc="";
import module namespace util="";
import module namespace xmldb="";
(: Downloads a file from a remote HTTP server at $file-url and saves it to an eXist-db $collection.
: we try hard to recognize XML files and save them with the correct mimetype so that eXist-db can
: efficiently index and query the files; if it doesn't appear to be XML, though, we just trust
: the response headers :)
joewiz / html5-serialization-prolog.xq
Created Jul 9, 2013
XQuery 3.0 HTML5 Serialization Prolog
View html5-serialization-prolog.xq
xquery version "3.0";
declare namespace output="";
declare option output:method "html5";
declare option output:media-type "text/html";
joewiz / principal-officers-since-carter.xq
Last active Dec 20, 2015
Principal Officers who were serving on, or began service after, January 20, 1977, and who are still alive; using XQuery
View principal-officers-since-carter.xq
xquery version "3.0";
(: Display a list of Principal Officers who are still alive who began serving on/after, or were serving as of, January 20, 1977. :)
let $all-people := collection('/db/cms/apps/principals-chiefs/data/')/person
let $principals := $all-people[.//role/@class='principal']
let $cutoff-date := '1977-01-20'
let $since-cutoff := $principals//role[@class='principal'][event[@type=('appointed', 'appointterminated')]/@when ge $cutoff-date]/ancestor::person
let $still-living := $since-cutoff[death/@type = 'unknown' or death = '' or empty(death/node())]
joewiz / fix-name-capitalization.xq
Last active Dec 20, 2015
Fix problems with mis-capitalized names, with XQuery
View fix-name-capitalization.xq
xquery version "3.0";
declare namespace fn="";
(: Fix problems with mis-capitalized names. For example:
Before: MACARTHUR, Douglas II
After: MacArthur, Douglas II
:)
declare function local:fix-name-capitalization($name as xs:string) {
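The Before/After example in the comment suggests a word-by-word rule. The following is a rough sketch under assumptions, not the gist's logic: `local:capitalize-sketch` is a hypothetical name, the "Mac" rule and the Roman-numeral exception are guesses made only to reproduce the single example given, and real names will need more cases.

```xquery
xquery version "3.0";

(: Hypothetical sketch: lower-case all-caps words after their first letter,
   restore the inner capital after a "MAC" prefix, and leave Roman numerals
   and already mixed-case words untouched. :)
declare function local:capitalize-sketch($name as xs:string) as xs:string {
    string-join(
        for $word in tokenize($name, " ")
        return
            if (matches($word, "^MAC\p{Lu}{2,}")) then
                concat(
                    "Mac",
                    substring($word, 4, 1),
                    lower-case(substring($word, 5))
                )
            else if (matches($word, "^\p{Lu}{2,}[,.]?$") and not(matches($word, "^[IVX]+$"))) then
                concat(substring($word, 1, 1), lower-case(substring($word, 2)))
            else
                $word,
        " "
    )
};

local:capitalize-sketch("MACARTHUR, Douglas II")
(: returns "MacArthur, Douglas II" :)
```

The "Mac" rule overmatches names such as "MACHADO", so a production version would likely need an exception list.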