Highlight regex pattern matches in XML while preserving node structure, with XQuery
xquery version "3.0";
declare namespace fn="http://www.w3.org/2005/xpath-functions";
(: Search within $nodes for matches to a regular expression $pattern and apply a $highlight function :)
declare function local:highlight-matches($nodes as node()*, $pattern as xs:string, $highlight as function(xs:string) as item()* ) as item()* {
    for $node in $nodes
    return
        typeswitch ( $node )
            (: Rebuild each element, keeping its attributes and recursing into its children.
               node-name() preserves the element's namespace, which name() alone does not. :)
            case element() return
                element { node-name($node) } { $node/@*, local:highlight-matches($node/node(), $pattern, $highlight) }
            (: Split each text node into matching and non-matching segments :)
            case text() return
                let $normalized := replace($node, '\s+', ' ')
                for $segment in analyze-string($normalized, $pattern)/node()
                return
                    if ($segment instance of element(fn:match)) then
                        $highlight($segment/string())
                    else
                        $segment/string()
            case document-node() return
                document { local:highlight-matches($node/node(), $pattern, $highlight) }
            (: Comments, processing instructions, etc. are passed through unchanged :)
            default return
                $node
};
let $node :=
<article>
<h1>Introduction</h1>
<p>Higher-order functions are probably the most notable addition to the XQuery language in
version 3.0 of the <a href="http://www.w3.org/TR/xquery-30/">specification</a>. While it may
take some time to understand their full impact, higher-order functions certainly open a wide
range of new possibilities, and are a key feature in all functional languages.</p>
<p>As of April 2012, eXist-db completely supports higher-order functions, including features
like inline functions, closures and partial function application. This article will quickly
walk through each feature before we put them all together in a practical example.</p>
<section>
<h1>Function References</h1>
<p>A higher-order function is a function which takes another function as parameter or
returns a function. So the first thing you'll need in order to pass a function around is
a way to obtain a reference to a function.</p>
<p>In older versions of eXist-db we had an extension function for this, called
util:function, which expected a name as first argument, and the <em>arity</em> of the
function as second. The <em>arity</em> corresponds to the
<sub>n<b>u</b>m<em>b</em>er</sub> of parameters the target function takes. Name and
arity are required to uniquely identify a function within a module.</p>
<p>XQuery 3.0 now provides a <a href="http://www.w3.org/TR/xquery-30/#id-named-function-ref"
>literal syntax</a> for referencing a function statically. It also consists of the
name and the arity of the function to look up, separated by a hash sign:</p>
<div class="code" data-language="xquery">let $f := my:func#2</div>
</section>
</article>
let $pattern := '[Ff]un[a-z]+'
let $highlight := function($string as xs:string) { <span class="highlight">{$string}</span> }
return
local:highlight-matches($node, $pattern, $highlight)
<!--
<summary>
<pattern>[Ff]un[a-z]+</pattern>
<match n="1">func</match>
<match n="14">function</match>
<match n="1">Function</match>
<match n="1">functional</match>
<match n="4">functions</match>
</summary>
-->
<article>
<h1>Introduction</h1>
<p>Higher-order <span class="highlight">functions</span> are probably the most notable addition
to the XQuery language in version 3.0 of the <a href="http://www.w3.org/TR/xquery-30/"
>specification</a>. While it may take some time to understand their full impact,
higher-order <span class="highlight">functions</span> certainly open a wide range of new
possibilities, and are a key feature in all <span class="highlight">functional</span>
languages.</p>
<p>As of April 2012, eXist-db completely supports higher-order <span class="highlight"
>functions</span>, including features like inline <span class="highlight"
>functions</span>, closures and partial <span class="highlight">function</span>
application. This article will quickly walk through each feature before we put them all
together in a practical example.</p>
<section>
<h1>
<span class="highlight">Function</span> References</h1>
<p>A higher-order <span class="highlight">function</span> is a <span class="highlight"
>function</span> which takes another <span class="highlight">function</span> as
parameter or returns a <span class="highlight">function</span>. So the first thing
you'll need in order to pass a <span class="highlight">function</span> around is a way
to obtain a reference to a <span class="highlight">function</span>.
</p>
<p>In older versions of eXist-db we had an extension <span class="highlight">function</span>
for this, called util:<span class="highlight">function</span>, which expected a name as
first argument, and the <em>arity</em> of the <span class="highlight">function</span> as
second. The <em>arity</em> corresponds to the <sub>n<b>u</b>m<em>b</em>er</sub> of
parameters the target <span class="highlight">function</span> takes. Name and arity are
required to uniquely identify a <span class="highlight">function</span> within a
module.</p>
<p>XQuery 3.0 now provides a <a href="http://www.w3.org/TR/xquery-30/#id-named-function-ref"
>literal syntax</a> for referencing a <span class="highlight">function</span>
statically. It also consists of the name and the arity of the <span class="highlight"
>function</span> to look up, separated by a hash sign:</p>
<div class="code" data-language="xquery">let $f := my:<span class="highlight">func</span>#2
</div>
</section>
</article>
joewiz commented Jul 5, 2013

The XQuery functions contains() and matches() are great at finding strings and patterns, but XQuery lacks a built-in function for highlighting the results of matches. The XQuery 3.0 function analyze-string(), which splits a string into matching and non-matching segments, comes close, but it only operates on strings, not on XML nodes. This function harnesses analyze-string() to highlight pattern matches in XML nodes in memory. Note that it searches each text node separately, so a match cannot span element boundaries: in the sample, the word "number" is split across sub, b, and em elements and would not be found by a pattern such as 'number'.
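
To make the segment splitting concrete, here is a minimal sketch of analyze-string() on a plain string. The input sentence is a made-up example, not taken from the gist; the pattern is the one used in the sample query. Matches are wrapped in square brackets just to make them visible.

```xquery
xquery version "3.0";
declare namespace fn="http://www.w3.org/2005/xpath-functions";

(: Hypothetical input string, chosen only for illustration :)
let $text := "A function takes another function."
let $pattern := "[Ff]un[a-z]+"
return
    string-join(
        (: analyze-string() returns fn:match and fn:non-match elements in document order :)
        for $segment in analyze-string($text, $pattern)/*
        return
            if ($segment instance of element(fn:match)) then
                concat("[", $segment, "]")
            else
                string($segment),
        ""
    )
```

This should evaluate to the string "A [function] takes another [function].", with the trailing period left in a non-matching segment because the pattern only accepts lowercase letters after "un".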

joewiz commented Jul 6, 2013

There was a limitation in the first version of the highlight-matches() function: it could only wrap each match in a single, fixed element. I realized the function would be more powerful if the caller could supply a function that decides how each match is highlighted. Taking advantage of eXist-db's support for XQuery 3.0's higher-order functions, I revised highlight-matches() to take a function as its third parameter, and I revised the example to define an inline function.
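
As a rough sketch of what the function parameter allows, the query below passes two different highlighters to the same routine: a named function reference and an inline function. To keep the example self-contained it uses local:mark(), a hypothetical string-only stand-in for highlight-matches(); local:strong() and the sample sentence are likewise assumptions made for illustration, not part of the gist.

```xquery
xquery version "3.0";
declare namespace fn="http://www.w3.org/2005/xpath-functions";

(: Hypothetical string-only helper, not part of the gist:
   applies $highlight to every match and keeps the rest as text :)
declare function local:mark($text as xs:string, $pattern as xs:string,
        $highlight as function(xs:string) as item()*) as item()* {
    for $segment in analyze-string($text, $pattern)/*
    return
        if ($segment instance of element(fn:match)) then
            $highlight(string($segment))
        else
            text { $segment }
};

(: Hypothetical named highlighter, passed by reference as local:strong#1 :)
declare function local:strong($match as xs:string) as element(strong) {
    <strong>{ $match }</strong>
};

let $text := "Inline functions and closures"
let $pattern := "[Ff]un[a-z]+"
return
    <p>{
        (: named function reference :)
        local:mark($text, $pattern, local:strong#1),
        <br/>,
        (: inline function that returns a string instead of an element :)
        local:mark($text, $pattern, function($m as xs:string) { upper-case($m) })
    }</p>
```

The first call should wrap "functions" in a strong element, while the second replaces it with "FUNCTIONS". The same two highlighters could be passed as the third argument of highlight-matches() without changing the function itself.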

joewiz commented Jul 6, 2013

For some background and commentary, see my post on Tumblr, XQuery's Missing Third Function.

joewiz commented Mar 22, 2017

My old Tumblr blog posts are now on my personal blog, http://joewiz.org/2013/07/06/xquerys-missing-third-function/.