Last active
June 10, 2016 20:10
XQuery Update data corruption problem http://markmail.org/message/3fzcixmxeh76z6l3
xquery version "3.0";

(:
Goal: Take a TEI document containing <ref> elements that need to be fixed, and fix them with XQuery Update.
Specifically, we find the page number reference in the text node immediately following each <ref> element
and move it inside the <ref> element. (I've simplified my data and the query to illustrate.)

Problem: The XQuery Update statement corrupts the 02-sample.xml file. The resulting file has 0 bytes. When I
comment out the XQuery Update statement and uncomment the $test variable in the return expression, I get the
expected results, so I think the logic is sound. Also, when I comment out line 25 (the $node/@* line in
local:reconstruct), the corruption doesn't occur. But I need that line, which copies the attributes. I'm stumped.

Test environment: Saxon-EE XQuery 9.6.0.7 with oXygen 17.1; with XQuery 3.0 and XQuery Update enabled.
:)
declare namespace tei="http://www.tei-c.org/ns/1.0";

declare function local:reconstruct($nodes as node()*) {
    for $node in $nodes
    return
        typeswitch ($node)
            case element() return
                element
                    { node-name($node) }
                    {
                        $node/@*,
                        local:reconstruct($node/node())
                    }
            default return $node
};
let $doc := doc('02-sample.xml')
let $refs := $doc//tei:ref
    [matches(following-sibling::node()[1][. instance of text()], '^, pp?\.\s+\d+')]
for $ref in $refs
let $following-text := $ref/following-sibling::text()[1]
let $analyze := analyze-string($following-text, '^(, pp?\.\s+)(\d+)(.*)$')
let $new-ref :=
    (
        element
            { QName('http://www.tei-c.org/ns/1.0', 'ref') }
            {
                local:reconstruct($ref/node()),
                string-join($analyze/fn:match/fn:group[@nr = (1, 2)])
            }
    )
let $new-following-text := string-join($analyze/fn:match/fn:group[@nr ge 3])
let $test :=
    <result>
        <original>{$ref, $following-text}</original>
        <analysis>{$analyze}</analysis>
        <new>{$new-ref, $new-following-text}</new>
    </result>
return
    (:
    $test
    :)
    (
        replace node $ref with $new-ref
        ,
        replace node $following-text with $new-following-text
    )
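One way to narrow the problem down without risking the file on disk is to run the edits inside an XQuery Update copy/modify/return expression, which applies updates to an in-memory copy and returns it. The following is only a sketch, not a confirmed fix: it assumes the processor supports copy/modify/return, and it swaps the rebuilt <ref> for an insert into the existing <ref> plus a "replace value of node" on the following text, so local:reconstruct is not exercised at all. The names $copy, $page-ref and $rest are made up for illustration.

```xquery
xquery version "3.0";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Dry run: every update targets $copy, an in-memory copy of the document,
   so 02-sample.xml itself is not written. :)
copy $copy := doc('02-sample.xml')
modify (
    for $ref in $copy//tei:ref
        [matches(following-sibling::node()[1][. instance of text()], '^, pp?\.\s+\d+')]
    let $following-text := $ref/following-sibling::text()[1]
    let $analyze := analyze-string($following-text, '^(, pp?\.\s+)(\d+)(.*)$')
    (: Groups 1 and 2 hold ", p. " and the page number; group 3 holds the remainder :)
    let $page-ref := string-join($analyze/fn:match/fn:group[@nr = (1, 2)])
    let $rest := string-join($analyze/fn:match/fn:group[@nr = 3])
    return (
        (: Assumption: appending to the existing <ref> stands in for rebuilding it :)
        insert node text { $page-ref } as last into $ref,
        replace value of node $following-text with $rest
    )
)
return $copy
```

If the returned copy looks right here but the original query still empties the file, that would point toward how the updates are written back rather than toward the selection and regex logic.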
<?xml version="1.0" encoding="UTF-8"?>
<note xmlns="http://www.tei-c.org/ns/1.0">For text of NSC 164/1, see <ref>
<hi rend="italic">Foreign Relations,</hi> 1952–1954, vol. VII, Part 2</ref>, p. 1914.</note>
<result>
<original>
<ref xmlns="http://www.tei-c.org/ns/1.0">
<hi rend="italic">Foreign Relations,</hi> 1952–1954, vol. VII, Part 2</ref>, p. 1914.</original>
<analysis>
<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions">
<fn:match>
<fn:group nr="1">, p. </fn:group>
<fn:group nr="2">1914</fn:group>
<fn:group nr="3">.</fn:group>
</fn:match>
</fn:analyze-string-result>
</analysis>
<new>
<ref xmlns="http://www.tei-c.org/ns/1.0">
<hi rend="italic">Foreign Relations,</hi> 1952–1954, vol. VII, Part 2, p. 1914</ref>.</new>
</result>