
Joe Wicentowski (joewiz)

joewiz / get-groups-result.xml
Created Mar 18, 2012
Wrangling plain text with XQuery
<group>
<line level="0">The President left at 8:48 am</line>
<group>
<group>
<line level="1">-Administration recommendations on Capitol Hill</line>
</group>
<group>
<line level="1">-Improvements</line>
</group>
<group>
collection.xconf.xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
<index>
<!-- Old full text index configuration. Deprecated. -->
<fulltext default="none" attributes="false"/>
<!-- New full text index based on Lucene -->
<lucene>
<text qname="SPEECH">
<ignore qname="SPEAKER"/>
</text>
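
With an index configuration like the one above in place (eXist-db reads it from the matching collection path under /db/system/config), full-text searches on SPEECH go through Lucene. The sketch below uses eXist-db's ft:query and ft:score functions; the collection path and the search term are hypothetical placeholders. Because SPEAKER is ignored, words that appear only inside a SPEAKER child should not produce a hit.

```xquery
xquery version "3.1";

import module namespace ft = "http://exist-db.org/xquery/lucene";

(: Hypothetical collection path: point it at the collection the index is configured for.
 : ft:query uses the Lucene index defined on SPEECH; text inside SPEAKER is ignored. :)
for $speech in collection('/db/apps/demo/data')//SPEECH[ft:query(., 'love')]
(: best-scoring matches first :)
order by ft:score($speech) descending
return $speech
```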
joewiz / tokenize-sentences.xq
Last active Dec 19, 2015
Split (or "tokenize") a string into "sentences", with XQuery. See http://joewiz.org/2013/06/29/one-paragraph-many-sentences/.
xquery version "1.0";
(: A naive approach to sentence tokenization inspired by http://stackoverflow.com/a/2103653/659732
:
: Works well with edited text like newspapers. Parameters like punctuation can/should be edited;
: see the section below called "criteria".
:
: For a more sophisticated approach, see Tibor Kiss and Jan Strunk, "Unsupervised Multilingual
: Sentence Boundary Detection", Computational Linguistics, Volume 32, Issue 4, December 2006,
: pp. 485-525. Also, see these discussions of sentence tokenization:
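
The naive approach described above can be sketched in a few lines. This is not the gist's code (its "criteria" section is not shown here). It assumes a sentence boundary is a period, exclamation point, or question mark, optionally followed by a closing quote or parenthesis, then whitespace, then an uppercase ASCII letter. The function name local:naive-sentences and the sample text are hypothetical.

```xquery
xquery version "3.1";

(: Naive sentence splitter (a sketch): a boundary is assumed to be ".", "!" or "?",
 : optionally followed by a closing quote or parenthesis, then whitespace, then an
 : uppercase ASCII letter. Abbreviations such as "Mr. Smith" will be split wrongly. :)
declare function local:naive-sentences($text as xs:string) as xs:string* {
  (: collapse all whitespace runs, including newlines, to single spaces :)
  let $normalized := normalize-space($text)
  (: replace the whitespace at each boundary with a newline character :)
  let $marked := replace($normalized, '([.!?]["'')]?)\s+([A-Z])', '$1&#10;$2')
  (: newlines now occur only at boundaries, so split on them :)
  return tokenize($marked, '\n')
};

local:naive-sentences(
  "The President left at 8:48 am. He returned at noon! Was it raining? No."
)
```

The example call should return four strings. Abbreviations such as "Mr. Smith" defeat this heuristic, which is where the more sophisticated approach by Kiss and Strunk cited above comes in.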
joewiz / trim-phrase-to-length.xq
Last active Mar 3, 2016
Trim phrases of arbitrary length to a maximum length, without cutting off words or ending on unwanted words, with XQuery
xquery version "3.0";
declare function local:trim-phrase-to-length($phrase, $length) {
(: if the phrase is already short enough, we're done :)
if (string-length($phrase) le $length) then
$phrase
(: the phrase is too long, so... :)
else
(: we will split the phrase into words and look for the longest possible arrangement within our length limit,
that doesn't end with boring words :)
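
The strategy in the comment above (split into words, keep the longest arrangement that fits, avoid ending on boring words) can be sketched as follows. The stop list and the function names local:trim-to-length and local:drop-boring-tail are hypothetical; the gist's own list of boring words is not shown here.

```xquery
xquery version "3.1";

(: Hypothetical stop list; the gist's own list of "boring words" is not shown :)
declare variable $boring-words := ('a', 'an', 'and', 'for', 'in', 'of', 'on', 'the', 'to', 'with');

(: Drop trailing stop-list words, then rejoin the remaining words with spaces :)
declare function local:drop-boring-tail($words as xs:string*) as xs:string {
  if (exists($words) and lower-case($words[last()]) = $boring-words)
  then local:drop-boring-tail(subsequence($words, 1, count($words) - 1))
  else string-join($words, ' ')
};

declare function local:trim-to-length($phrase as xs:string, $length as xs:integer) as xs:string {
  if (string-length($phrase) le $length) then
    $phrase
  else
    let $words := tokenize(normalize-space($phrase), ' ')
    (: the largest number of leading words whose space-joined length fits :)
    let $fit :=
      (for $n in reverse(1 to count($words))
       where string-length(string-join(subsequence($words, 1, $n), ' ')) le $length
       return $n)[1]
    return
      if (empty($fit)) then ''
      else local:drop-boring-tail(subsequence($words, 1, $fit))
};

local:trim-to-length("The Secretary of State met with the Prime Minister of Japan", 33)
```

With a limit of 33, the longest fitting prefix is "The Secretary of State met with" (31 characters); "with" is on the stop list, so the call returns "The Secretary of State met".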
joewiz / oauth.xq
Last active Nov 1, 2018
Access OAuth 1.0-based services like the Twitter v1.1 API, with XQuery. (See comments below for explanation.)
xquery version "3.0";
module namespace oauth="http://history.state.gov/ns/xquery/oauth";
(:~ A library module for signing and submitting OAuth requests such as the kind needed for the Twitter v1.1 API.
The EXPath Crypto library supplies the HMAC-SHA1 algorithm. The EXPath HTTP Client library makes the HTTP requests.
The OAuth standard requires a "nonce" parameter - a random string. Since there is no implementation-independent
nonce function in XQuery, we must rely on implementation-specific functions. For eXist-db we use util:uuid().
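
One piece of the signing process can be sketched in portable XQuery: OAuth 1.0 signs a "signature base string" built from the HTTP method, the base URL, and the sorted, percent-encoded request parameters (RFC 5849, section 3.4.1). The function local:signature-base-string is hypothetical and not part of the module above. It stops before the HMAC-SHA1 step, which the module delegates to the EXPath Crypto library, and the URL in the example is a placeholder.

```xquery
xquery version "3.1";

declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: OAuth 1.0 signature base string (RFC 5849, section 3.4.1), as a sketch.
 : Simplifying assumptions: $base-url carries no query string, and each parameter
 : name occurs once (a map cannot hold repeated names). fn:encode-for-uri leaves
 : exactly the RFC 3986 unreserved characters unescaped and emits uppercase hex. :)
declare function local:signature-base-string(
  $method as xs:string,
  $base-url as xs:string,
  $params as map(xs:string, xs:string)
) as xs:string {
  let $pairs :=
    for $name in map:keys($params)
    let $encoded-name := encode-for-uri($name)
    let $encoded-value := encode-for-uri($params($name))
    (: sort by encoded name, then encoded value, in codepoint (byte) order :)
    order by $encoded-name collation "http://www.w3.org/2005/xpath-functions/collation/codepoint",
             $encoded-value collation "http://www.w3.org/2005/xpath-functions/collation/codepoint"
    return $encoded-name || '=' || $encoded-value
  return string-join(
    (upper-case($method), encode-for-uri($base-url), encode-for-uri(string-join($pairs, '&amp;'))),
    '&amp;'
  )
};

local:signature-base-string(
  'get',
  'https://api.example.com/1.1/statuses/user_timeline.json',
  map { 'screen_name': 'joewiz', 'count': '2' }
)
```

The example call should return GET&https%3A%2F%2Fapi.example.com%2F1.1%2Fstatuses%2Fuser_timeline.json&count%3D2%26screen_name%3Djoewiz. A real request would also put the oauth_* parameters (consumer key, nonce, timestamp, and so on) into $params before signing.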
joewiz / highlight-matches.xq
Last active Mar 22, 2017
Highlight regex pattern matches in XML while preserving node structure, with XQuery
xquery version "3.0";
declare namespace fn="http://www.w3.org/2005/xpath-functions";
(: Search within $nodes for matches to a regular expression $pattern and apply a $highlight function :)
declare function local:highlight-matches($nodes as node()*, $pattern as xs:string, $highlight as function(xs:string) as item()* ) {
for $node in $nodes
return
typeswitch ( $node )
case element() return
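
The recursive typeswitch pattern above can be completed along these lines. This is a simplified sketch rather than the gist's code: the function name local:highlight is hypothetical, elements are rebuilt with their attributes, only text nodes are searched (using fn:analyze-string), and any other node kind is returned unchanged.

```xquery
xquery version "3.1";

(: Regex highlighting over a node tree (a sketch). Elements are copied with their
 : attributes and processed recursively; text nodes are split with fn:analyze-string,
 : and each match is passed to $highlight. fn:analyze-string raises FORX0003 if
 : $pattern can match the empty string. :)
declare function local:highlight(
  $nodes as node()*,
  $pattern as xs:string,
  $highlight as function(xs:string) as item()*
) as item()* {
  for $node in $nodes
  return
    typeswitch ($node)
      case element() return
        element { node-name($node) } {
          $node/@*,
          local:highlight($node/node(), $pattern, $highlight)
        }
      case text() return
        for $part in analyze-string($node, $pattern)/*
        return
          if ($part instance of element(fn:match))
          then $highlight(string($part))
          else text { string($part) }
      default return $node
};

let $doc := <p>The President left at <time>8:48 am</time> for the Hill.</p>
return local:highlight($doc, 'President|Hill', function($s) { <mark>{ $s }</mark> })
```

The example should produce <p>The <mark>President</mark> left at <time>8:48 am</time> for the <mark>Hill</mark>.</p>. Because matching happens one text node at a time, a match that would span an element boundary is not found.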
joewiz / http-download.xq
Last active Sep 30, 2017
Download a file via HTTP and save to an eXist-db collection; uses EXPath modules where possible
xquery version "3.1";
import module namespace hc="http://expath.org/ns/http-client";
import module namespace util="http://exist-db.org/xquery/util";
import module namespace xmldb="http://exist-db.org/xquery/xmldb";
(: downloads a file from a remote HTTP server at $file-url and saves it to an eXist-db $collection.
: we try hard to recognize XML files and save them with the correct mimetype so that eXist-db can
: efficiently index and query the files; if it doesn't appear to be XML, though, we just trust
: the response headers :)
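
The basic download-and-store step can be sketched with the EXPath HTTP Client's hc:send-request and eXist-db's xmldb:store. This hypothetical local:download function is much simpler than the gist: it does none of the mimetype detection described in the comment above, and the URL and collection in the example call are placeholders.

```xquery
xquery version "3.1";

import module namespace hc = "http://expath.org/ns/http-client";
import module namespace xmldb = "http://exist-db.org/xquery/xmldb";

(: Much-simplified sketch: fetch $file-url and store the body in $collection under
 : the last path segment of the URL. No mimetype sniffing is done; how a non-XML
 : body is stored is left to xmldb:store's defaults. :)
declare function local:download($file-url as xs:string, $collection as xs:string) as xs:string? {
  (: hc:send-request returns the hc:response element followed by the body, if any :)
  let $response := hc:send-request(<hc:request method="get" href="{ $file-url }"/>)
  let $head := $response[1]
  let $body := $response[2]
  let $filename := tokenize($file-url, '/')[last()]
  return
    if ($head/@status = '200' and exists($body))
    then xmldb:store($collection, $filename, $body) (: returns the stored path :)
    else ()
};

(: placeholder URL and collection :)
local:download('https://example.com/data/sample.xml', '/db/apps/demo/data')
```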
joewiz / html5-serialization-prolog.xq
Created Jul 9, 2013
XQuery 3.0 HTML5 Serialization Prolog
xquery version "3.0";
declare namespace output="http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "html5";
declare option output:media-type "text/html";
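
The "html5" method value appears to be an eXist-db extension rather than one of the methods defined by the W3C serialization specification. A more portable prolog can be sketched with the standard "html" method plus the Serialization 3.0 html-version parameter; the page content below is only a placeholder.

```xquery
xquery version "3.1";

declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

(: Standard serialization parameters: the "html" method with html-version 5.0 :)
declare option output:method "html";
declare option output:html-version "5.0";
declare option output:media-type "text/html";

(: placeholder page :)
<html>
  <head><title>Hello</title></head>
  <body><p>Hello, world</p></body>
</html>
```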
joewiz / principal-officers-since-carter.xq
Last active Dec 20, 2015
Principal Officers who were serving on, or began service after, January 20, 1977, and who are still alive; using XQuery
xquery version "3.0";
(: Display a list of Principal Officers who are still alive who began serving on/after, or were serving as of, January 20, 1977. :)
let $all-people := collection('/db/cms/apps/principals-chiefs/data/')/person
let $principals := $all-people[.//role/@class='principal']
let $cutoff-date := '1977-01-20'
let $since-cutoff := $principals//role[@class='principal'][event[@type=('appointed', 'appointterminated')]/@when ge $cutoff-date]/ancestor::person
let $still-living := $since-cutoff[death/@type = 'unknown' or death = '' or empty(death/node())]
return
joewiz / fix-name-capitalization.xq
Last active Dec 20, 2015
Fix problems with mis-capitalized names, with XQuery
xquery version "3.0";
declare namespace fn="http://www.w3.org/2005/xpath-functions";
(: Fix problems with mis-capitalized names. For example:
Before: MACARTHUR, Douglas II
After: MacArthur, Douglas II
:)
declare function local:fix-name-capitalization($name as xs:string) {
(:
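
One way to handle the MACARTHUR example is sketched below: lower-case each word, capitalize its first letter, then recapitalize after a known surname prefix. The prefix list, the roman-numeral check, and the function names local:fix-caps and local:capitalize-word are hypothetical, not the gist's code.

```xquery
xquery version "3.1";

(: Hypothetical surname prefixes after which the next letter is capitalized :)
declare variable $prefixes := ("Mac", "Mc", "O'");

(: Title-case one word, then recapitalize the remainder after a known prefix :)
declare function local:capitalize-word($word as xs:string) as xs:string {
  let $lower := lower-case($word)
  let $title := upper-case(substring($lower, 1, 1)) || substring($lower, 2)
  let $prefix := $prefixes[starts-with($title, .) and string-length($title) gt string-length(.)][1]
  return
    if (exists($prefix))
    then $prefix || local:capitalize-word(substring($title, string-length($prefix) + 1))
    else $title
};

declare function local:fix-caps($name as xs:string) as xs:string {
  string-join(
    (
      for $word in tokenize(normalize-space($name), ' ')
      return
        (: keep roman numerals such as "II" or "IV" in upper case :)
        if (matches($word, '^[IVX]+$')) then $word
        else local:capitalize-word($word)
    ),
    ' '
  )
};

local:fix-caps("MACARTHUR, Douglas II")
```

The example call should return "MacArthur, Douglas II". The prefix rule is only a heuristic: it misfires on names such as "Mack" (which would become "MacK"), so a real implementation would need a list of exceptions.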