Highlight regex pattern matches in XML while preserving node structure, with XQuery
xquery version "3.0";

declare namespace fn="http://www.w3.org/2005/xpath-functions";

(: Search within $nodes for matches to a regular expression $pattern and apply a $highlight function :)
declare function local:highlight-matches(
    $nodes as node()*,
    $pattern as xs:string,
    $highlight as function(xs:string) as item()*
) as item()* {
    for $node in $nodes
    return
        typeswitch ( $node )
            (: rebuild each element, keeping its attributes and recursing into its children;
               node-name() keeps the element's namespace even if its prefix is not declared in this query :)
            case element() return
                element { node-name($node) } { $node/@*, local:highlight-matches($node/node(), $pattern, $highlight) }
            (: split each text node into matching and non-matching segments :)
            case text() return
                let $normalized := replace($node, '\s+', ' ')
                for $segment in analyze-string($normalized, $pattern)/node()
                return
                    if ($segment instance of element(fn:match)) then
                        $highlight($segment/string())
                    else
                        (: emit a text node so adjacent strings are not joined with an extra space :)
                        text { $segment/string() }
            case document-node() return
                document { local:highlight-matches($node/node(), $pattern, $highlight) }
            (: comments and processing instructions pass through unchanged :)
            default return
                $node
};
let $node :=
<article>
<h1>Introduction</h1>
<p>Higher-order functions are probably the most notable addition to the XQuery language in
version 3.0 of the <a href="http://www.w3.org/TR/xquery-30/">specification</a>. While it may
take some time to understand their full impact, higher-order functions certainly open a wide
range of new possibilities, and are a key feature in all functional languages.</p>
<p>As of April 2012, eXist-db completely supports higher-order functions, including features
like inline functions, closures and partial function application. This article will quickly
walk through each feature before we put them all together in a practical example.</p>
<section>
<h1>Function References</h1>
<p>A higher-order function is a function which takes another function as parameter or
returns a function. So the first thing you'll need in order to pass a function around is
a way to obtain a reference to a function.</p>
<p>In older versions of eXist-db we had an extension function for this, called
util:function, which expected a name as first argument, and the <em>arity</em> of the
function as second. The <em>arity</em> corresponds to the
<sub>n<b>u</b>m<em>b</em>er</sub> of parameters the target function takes. Name and
arity are required to uniquely identify a function within a module.</p>
<p>XQuery 3.0 now provides a <a href="http://www.w3.org/TR/xquery-30/#id-named-function-ref"
>literal syntax</a> for referencing a function statically. It also consists of the
name and the arity of the function to look up, separated by a hash sign:</p>
<div class="code" data-language="xquery">let $f := my:func#2</div>
</section>
</article>
let $pattern := '[Ff]un[a-z]+'
let $highlight := function($string as xs:string) { <span class="highlight">{$string}</span> }
return
local:highlight-matches($node, $pattern, $highlight)
<!--
<summary>
<pattern>[Ff]un[a-z]+</pattern>
<match n="1">func</match>
<match n="14">function</match>
<match n="1">Function</match>
<match n="1">functional</match>
<match n="4">functions</match>
</summary>
-->
<article>
<h1>Introduction</h1>
<p>Higher-order <span class="highlight">functions</span> are probably the most notable addition
to the XQuery language in version 3.0 of the <a href="http://www.w3.org/TR/xquery-30/"
>specification</a>. While it may take some time to understand their full impact,
higher-order <span class="highlight">functions</span> certainly open a wide range of new
possibilities, and are a key feature in all <span class="highlight">functional</span>
languages.</p>
<p>As of April 2012, eXist-db completely supports higher-order <span class="highlight"
>functions,</span> including features like inline <span class="highlight"
>functions,</span> closures and partial <span class="highlight">function</span>
application. This article will quickly walk through each feature before we put them all
together in a practical example.</p>
<section>
<h1>
<span class="highlight">Function</span> References</h1>
<p>A higher-order <span class="highlight">function</span> is a <span class="highlight"
>function</span> which takes another <span class="highlight">function</span> as
parameter or returns a <span class="highlight">function.</span> So the first thing
you'll need in order to pass a <span class="highlight">function</span> around is a way
to obtain a reference to a <span class="highlight">function.</span>
</p>
<p>In older versions of eXist-db we had an extension <span class="highlight">function</span>
for this, called util:<span class="highlight">function,</span> which expected a name as
first argument, and the <em>arity</em> of the <span class="highlight">function</span> as
second. The <em>arity</em> corresponds to the <sub>n<b>u</b>m<em>b</em>er</sub> of
parameters the target <span class="highlight">function</span> takes. Name and arity are
required to uniquely identify a <span class="highlight">function</span> within a
module.</p>
<p>XQuery 3.0 now provides a <a href="http://www.w3.org/TR/xquery-30/#id-named-function-ref"
>literal syntax</a> for referencing a <span class="highlight">function</span>
statically. It also consists of the name and the arity of the <span class="highlight"
>function</span> to look up, separated by a hash sign:</p>
<div class="code" data-language="xquery">let $f := my:<span class="highlight">func#2</span>
</div>
</section>
</article>
joewiz commented Jul 5, 2013

The XQuery functions contains() and matches() are great at finding strings and patterns, but XQuery lacks a built-in function for highlighting the matches. The XQuery 3.0 function analyze-string(), which splits a string into matching and non-matching segments, comes close, but it operates only on strings, not on XML nodes. This function harnesses analyze-string() to highlight pattern matches in in-memory XML nodes. Note that it searches each text node separately, so a match cannot span an element boundary.
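
To make the two points above concrete, here is a minimal sketch of what analyze-string() returns for a plain string, and of why searching text nodes one at a time misses a match that crosses an element boundary. The fragment bound to `$p` is made up for illustration and is not part of the gist.

```xquery
xquery version "3.0";

(: $p is a made-up fragment: the word "function" is split across an <em> boundary. :)
let $p := <p>pass a fun<em>ction</em> around</p>
return (
    (: 1. On a plain string, analyze-string() wraps fn:match and fn:non-match
          segments in an fn:analyze-string-result element. :)
    analyze-string("pass a function around", "function"),

    (: 2. The string value of $p joins its text nodes, so this is true. :)
    matches(string($p), "function"),

    (: 3. No single text node contains the whole word, so this counts 0. :)
    count($p//text()[matches(., "function")])
)
```

The first item should contain three children in order: a non-match for "pass a ", a match for "function", and a non-match for " around". The second and third items show the limitation: the pattern matches the paragraph's string value, but none of its three text nodes ("pass a fun", "ction", " around") matches on its own.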

joewiz commented Jul 6, 2013

The first version of highlight-matches() had a limitation: it could only wrap matches in a single, fixed element. I realized the function would be more powerful if it passed each match to a function instead. Taking advantage of eXist-db's support for XQuery 3.0's higher-order functions, I revised highlight-matches() to take a function as its third parameter, and revised the example to define an inline function.
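
As a rough sketch of what the function parameter allows, the query below passes two different highlighters to a compact copy of highlight-matches(): an inline function that wraps each match in a `<mark>` element, and a named function reference. The copy omits the whitespace normalization and the document-node() case to stay short; `local:shout` and the sample paragraph are hypothetical and exist only for this example.

```xquery
xquery version "3.0";

declare namespace fn = "http://www.w3.org/2005/xpath-functions";

(: Compact copy of the gist's function so this sketch runs on its own. :)
declare function local:highlight-matches(
    $nodes as node()*,
    $pattern as xs:string,
    $highlight as function(xs:string) as item()*
) as item()* {
    for $node in $nodes
    return
        typeswitch ($node)
            case element() return
                element { node-name($node) } {
                    $node/@*,
                    local:highlight-matches($node/node(), $pattern, $highlight)
                }
            case text() return
                for $segment in analyze-string($node, $pattern)/node()
                return
                    if ($segment instance of element(fn:match)) then
                        $highlight(string($segment))
                    else
                        text { string($segment) }
            default return
                $node
};

(: Hypothetical named highlighter, passed below as local:shout#1. :)
declare function local:shout($s as xs:string) as item()* {
    text { upper-case($s) }
};

let $p := <p>A function reference looks like <code>my:func#2</code>.</p>
let $pattern := "[Ff]un[a-z]+"
return (
    (: inline function: wrap each match in a <mark> element :)
    local:highlight-matches($p, $pattern, function($s as xs:string) { <mark>{ $s }</mark> }),
    (: named function reference: upper-case each match instead :)
    local:highlight-matches($p, $pattern, local:shout#1)
)
```

The first call should wrap "function" and "func" in `<mark>` elements, and the second should return the same paragraph with those two matches upper-cased. Because the caller decides what a highlight is, the traversal code never needs to change.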

joewiz commented Jul 6, 2013

For some background and commentary, see my post on Tumblr, XQuery's Missing Third Function.

joewiz commented Mar 22, 2017

My old Tumblr blog posts are now on my personal blog, http://joewiz.org/2013/07/06/xquerys-missing-third-function/.
