@josteinaj
Last active February 24, 2017 10:26
An example of how I debugged XProc

Debugging XProc

Debugging XProc is mostly a process of elimination, sometimes with hints from the error messages. What follows is an example of debugging XProc in oXygen; the process is similar in other editors. It is meant as a story showing how XProc debugging can be done, rather than a reference for looking up solutions to your particular problem.

This is the XProc script we are working on:

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step version="1.0"
        xmlns:p="http://www.w3.org/ns/xproc"
        xmlns:pef="http://www.daisy.org/ns/2008/pef"
        xmlns:px="http://www.daisy.org/ns/pipeline/xproc"
        xmlns:l="http://xproc.org/library"
        exclude-inline-prefixes="#all"
        type="pef:validate"
        name="main">
    
    <p:input port="source" primary="true"/>
    
    <p:output port="result" primary="true"/>
    
    <p:option name="assert-valid" required="false" select="'false'"/>
    
    <p:import href="http://www.daisy.org/pipeline/modules/validation-utils/library.xpl"/>
    
    <p:variable name="document-type" select="'PEF'"/>
    <p:variable name="base-uri" select="base-uri()"/>
    <p:variable name="document-name" select="tokenize($base-uri, '/')[last()]"/>

    <l:relax-ng-report name="validate-against-relaxng">
      <p:input port="schema">
        <p:document href="schema/pef-2008-1.rng"/>
      </p:input>
      <p:input port="source">
        <p:pipe step="main" port="source"/>
      </p:input>
    </l:relax-ng-report>

    <px:combine-validation-reports name="combined-error-report">
      <p:with-option name="document-type" select="$document-type"/>
      <p:with-option name="document-name" select="$document-name"/>
      <p:input port="source">
        <p:pipe port="report" step="validate-against-relaxng"/>
      </p:input>
    </px:combine-validation-reports>

    <px:validation-report-to-html name="html-report">
      <p:input port="source">
        <p:pipe port="result" step="combined-error-report"/>
      </p:input>
    </px:validation-report-to-html>
    
</p:declare-step>

The script has one input port and no required options. We create a scenario in oXygen that connects the following input document to the source port:

<?xml version="1.0" encoding="UTF-8"?>
<pef xmlns="http://www.daisy.org/ns/2008/pef" version="2008-1">
  <head xmlns:dc="http://purl.org/dc/elements/1.1/">
    <meta>
      <dc:date>2015-12-20</dc:date>
      <dc:format>application/x-pef+xml</dc:format>
      <dc:identifier>X</dc:identifier>
    </meta>
  </head>
  <body>
    <volume rows="10" cols="10" rowgap="0" duplex="true">
      <section>
        <page>
          <row>⠁⠁⠁⠁</row>
          <row/>
          <row/>
          <row/>
          <row/>
          <row/>
          <row/>
          <row/>
          <row/>
          <row/>
        </page>
      </section>
    </volume>
  </body>
</pef>

Our XProc script has one p:import, and it references a file that is not on our file system:

<p:import href="http://www.daisy.org/pipeline/modules/validation-utils/library.xpl"/>

We find (or make) a local copy of the project containing that file on our computer, and add that project's catalog.xml file to oXygen through Preferences -> XML -> XML Catalog.

Note that if the referenced project is part of a repository with other projects, there will often be a "main catalog" in that repository that imports all the sub-projects' catalogs, so that you don't have to add each sub-project catalog to oXygen one by one.
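If you have not set up a catalog before, here is a minimal sketch of an OASIS XML Catalog entry that would resolve the import URL to a local copy. The relative path xml/xproc/validation-utils-library.xpl is an assumption for illustration: it only holds if the catalog file sits next to an xml/xproc/ directory, and the project's real catalog.xml may be laid out differently.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical catalog: maps the URL used in p:import to a local file.
     The uri value is an assumed path, resolved relative to this catalog file. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <uri name="http://www.daisy.org/pipeline/modules/validation-utils/library.xpl"
         uri="xml/xproc/validation-utils-library.xpl"/>
</catalog>
```

A repository-level "main catalog" typically chains to sub-project catalogs with nextCatalog entries (each with a catalog attribute pointing at a sub-project's catalog.xml), which is why adding that single catalog to oXygen is usually enough.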

With the catalogs set up, we run the scenario and get this error in oXygen:

Scenario: validate.xpl
XProc file: /home/jostein/daisy-pipeline/pipeline-modules/pipeline-mod-braille/pipeline-braille-utils/pef-utils/pef-utils/src/main/resources/xml/validate.xpl
Engine name: Calabash XProc
Severity: error
Description: err:XC0053 : XC0053 It is a dynamic error if the assert-valid option is true and the input document is not valid.

So we're getting an XC0053 error. We look this up in the XProc specification and find that it is raised by the steps p:validate-with-relax-ng and p:validate-with-xml-schema. Reading the descriptions of those steps in the specification doesn't really bring us any further, though.
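XC0053 only tells us that some validation failed, not why. One way to get more detail, once you have a suspect validation step, is to wrap it in p:try/p:catch so the error document is passed on instead of aborting the run. This is a debugging sketch, not part of the original scripts: the step names and the schema href are hypothetical placeholders, the fragment assumes the enclosing pipeline already declares the p prefix, and how detailed the resulting c:errors document is depends on the XProc processor.

```xml
<!-- Debugging sketch: capture a failing validation instead of aborting.
     "debug-validation", "catch-validation" and schema.rng are hypothetical. -->
<p:try name="debug-validation">
    <p:group>
        <p:output port="result"/>
        <!-- reads from the default readable port, i.e. the step before p:try -->
        <p:validate-with-relax-ng assert-valid="true">
            <p:input port="schema">
                <p:document href="schema.rng"/>
            </p:input>
        </p:validate-with-relax-ng>
    </p:group>
    <p:catch name="catch-validation">
        <p:output port="result"/>
        <!-- pass on the c:errors document describing what went wrong -->
        <p:identity>
            <p:input port="source">
                <p:pipe step="catch-validation" port="error"/>
            </p:input>
        </p:identity>
    </p:catch>
</p:try>
```

Since both branches declare the same result port, the pipeline keeps running and its output is either the validated document or the error report; that is handy for inspection but should not be left in production code.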

Since there's no line reference in the error message, we first need to locate where the error is thrown. We start commenting out the XProc script bit by bit, re-running it each time to see if the error disappears. Luckily, this script is strictly linear: each step reads from the step right before it, and the pipeline's primary output connects to whichever step comes last. So when we comment out steps from the end, we don't need to modify any port connections.
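In a non-linear pipeline, commenting out a step that later steps reference with p:pipe would cause a static error. One workaround, sketched here against validate.xpl, is to swap the suspect step for a p:identity that keeps the same name. This is a hypothetical debugging edit, not something from the original walkthrough, and the downstream step may then fail for its own reasons, since it receives an uncombined report.

```xml
<!-- Temporary stand-in for px:combine-validation-reports while debugging.
     Keeping the name "combined-error-report" means the later
     <p:pipe port="result" step="combined-error-report"/> still resolves,
     because p:identity's output port is also called "result". -->
<p:identity name="combined-error-report">
    <p:input port="source">
        <p:pipe port="report" step="validate-against-relaxng"/>
    </p:input>
</p:identity>
```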

First we comment out px:validation-report-to-html:

    <!--<px:validation-report-to-html name="html-report">
      <p:input port="source">
        <p:pipe port="result" step="combined-error-report"/>
      </p:input>
    </px:validation-report-to-html>-->

We re-run the script, but we still get the same error, so the problem is not in px:validation-report-to-html. Next, we comment out px:combine-validation-reports:

    <!--<px:combine-validation-reports name="combined-error-report">
      <p:with-option name="document-type" select="$document-type"/>
      <p:with-option name="document-name" select="$document-name"/>
      <p:input port="source">
        <p:pipe port="report" step="validate-against-relaxng"/>
      </p:input>
    </px:combine-validation-reports>-->

We re-run the script, and voilà: no error! So there's something wrong with px:combine-validation-reports. But what? We need to dig deeper...

We look through our p:import statements. In this case there's only one, so there aren't many places to look:

<p:import href="http://www.daisy.org/pipeline/modules/validation-utils/library.xpl"/>

In oXygen, you can place your cursor on the URL and press Ctrl+Enter to open that file. The URL is resolved to a file on our local file system because we have set up our catalogs properly.

The file turns out to be named validation-utils-library.xpl, and is an XProc library that imports other XProc scripts. It references a file called combine-validation-reports.xpl, so we open that one:

<p:library version="1.0" (...)>
    
    (...)
    
    <p:import href="combine-validation-reports.xpl">
        <p:documentation>Utility step that combines many validation reports into one XML document.</p:documentation>
    </p:import>
    
</p:library>

In combine-validation-reports.xpl we find the declaration of px:combine-validation-reports, which is the step we're having trouble with:

<p:declare-step version="1.0" name="combine-validation-reports" type="px:combine-validation-reports"
    xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step"
    xmlns:px="http://www.daisy.org/ns/pipeline/xproc"
    xmlns:pxi="http://www.daisy.org/ns/pipeline/xproc/internal"
    xmlns:tmp="http://www.daisy.org/ns/pipeline/tmp" xmlns:d="http://www.daisy.org/ns/pipeline/data"
    xmlns:l="http://xproc.org/library" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:dtb="http://www.daisy.org/z3986/2005/dtbook/"
    xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
    exclude-inline-prefixes="#all">

    <p:documentation xmlns="http://www.w3.org/1999/xhtml">
        <h1 px:role="name">Combine validation reports</h1>
        <p px:role="desc">Wrap one or more validation reports and optional document data. This
            prepares it for the validation-report-to-html step.</p>
    </p:documentation>

    <p:input port="source" primary="true" sequence="true">
        <p:documentation xmlns="http://www.w3.org/1999/xhtml">
            <h1 px:role="name">source</h1>
            <p px:role="desc">A validation report</p>
        </p:documentation>
    </p:input>
    <p:option name="document-name" required="false">
        <p:documentation xmlns="http://www.w3.org/1999/xhtml">
            <h1 px:role="name">document-name</h1>
            <p px:role="desc">The name of the document that was validated. Used for display
                purposes.</p>
        </p:documentation>
    </p:option>
    <p:option name="document-type" required="false">
        <p:documentation xmlns="http://www.w3.org/1999/xhtml">
            <h1 px:role="name">document-type</h1>
            <p px:role="desc">The type of the document. Used for display purposes.</p>
        </p:documentation>
    </p:option>
    <p:option name="document-path" required="false" select="''">
        <p:documentation xmlns="http://www.w3.org/1999/xhtml">
            <h1 px:role="name">document-path</h1>
            <p px:role="desc">The full path to the document, if available.</p>
        </p:documentation>
    </p:option>
    <p:option name="report-path" required="false" select="''">
        <p:documentation xmlns="http://www.w3.org/1999/xhtml">
            <h1 px:role="name">report-path</h1>
            <p px:role="desc">The path to the validation report XML, if available.</p>
        </p:documentation>
    </p:option>
    <p:option name="internal-info" required="false" select="''">
        <p:documentation xmlns="http://www.w3.org/1999/xhtml">
            <h1 px:role="name">internal-info</h1>
            <p px:role="desc">A string to stash in the document-info/@internal attribute.</p>
        </p:documentation>
    </p:option>

    <p:output port="result" primary="true"/>
    <p:import href="http://xmlcalabash.com/extension/steps/library-1.0.xpl"/>

    <!-- iterate through the documents on the source port -->
    <p:for-each>
        <p:variable name="root-element-name" select="*/name()"/>
        <p:wrap match="/" wrapper="report" wrapper-prefix="d"
            wrapper-namespace="http://www.daisy.org/ns/pipeline/data"/>

        <p:choose>
            <p:when test="$root-element-name = 'c:errors'">
                <p:add-attribute match="d:report">
                    <p:with-option name="attribute-name" select="'type'"/>
                    <p:with-option name="attribute-value" select="'relaxng'"/>
                </p:add-attribute>
            </p:when>
            <p:when test="$root-element-name = 'd:errors'">
                <p:add-attribute match="d:report">
                    <p:with-option name="attribute-name" select="'type'"/>
                    <p:with-option name="attribute-value" select="'filecheck'"/>
                </p:add-attribute>
            </p:when>
            <p:when test="$root-element-name = 'svrl:schematron-output'">
                <p:add-attribute match="d:report">
                    <p:with-option name="attribute-name" select="'type'"/>
                    <p:with-option name="attribute-value" select="'schematron'"/>
                </p:add-attribute>
            </p:when>
            <p:otherwise>
                <p:add-attribute match="d:report">
                    <p:with-option name="attribute-name" select="'type'"/>
                    <p:with-option name="attribute-value" select="'unknown'"/>
                </p:add-attribute>
            </p:otherwise>
        </p:choose>
    </p:for-each>

    <p:wrap-sequence name="combine-reports" wrapper="reports"
        wrapper-namespace="http://www.daisy.org/ns/pipeline/data" wrapper-prefix="d"/>

    <p:insert position="last-child">
        <p:input port="insertion">
            <p:pipe port="result" step="combine-reports"/>
        </p:input>
        <p:input port="source">
            <p:inline>
                <d:document-validation-report>
                    <d:document-info/>
                </d:document-validation-report>
            </p:inline>
        </p:input>
    </p:insert>

    <p:group name="add-document-metadata">
        <p:output port="result"/>
        <p:choose>
            <p:when test="string-length($document-path) > 0">
                <p:insert match="d:document-validation-report/d:document-info"
                    position="first-child">
                    <p:input port="insertion">
                        <p:inline>
                            <d:document-path>@@</d:document-path>
                        </p:inline>
                    </p:input>
                </p:insert>
                <p:string-replace match="//d:document-path/text()">
                    <p:with-option name="replace"
                        select="concat('&quot;', $document-path, '&quot;')"/>
                </p:string-replace>
            </p:when>
            <p:otherwise>
                <p:identity/>
            </p:otherwise>
        </p:choose>

        <p:choose>
            <p:when test="string-length($document-type) > 0">
                <p:insert match="d:document-validation-report/d:document-info"
                    position="first-child">
                    <p:input port="insertion">
                        <p:inline>
                            <d:document-type>@@</d:document-type>
                        </p:inline>
                    </p:input>
                </p:insert>
                <p:string-replace match="//d:document-type/text()">
                    <p:with-option name="replace"
                        select="concat('&quot;', $document-type, '&quot;')"/>
                </p:string-replace>
            </p:when>
            <p:otherwise>
                <p:identity/>
            </p:otherwise>
        </p:choose>

        <p:choose>
            <p:when test="string-length($document-name) > 0">
                <p:insert match="d:document-validation-report/d:document-info"
                    position="first-child">
                    <p:input port="insertion">
                        <p:inline>
                            <d:document-name>@@</d:document-name>
                        </p:inline>
                    </p:input>
                </p:insert>
                <p:string-replace match="//d:document-name/text()">
                    <p:with-option name="replace"
                        select="concat('&quot;', $document-name, '&quot;')"/>
                </p:string-replace>
            </p:when>
            <p:otherwise>
                <p:identity/>
            </p:otherwise>
        </p:choose>

        <p:choose>
            <p:when test="string-length($report-path) > 0">
                <p:insert match="d:document-validation-report/d:document-info" position="last-child">
                    <p:input port="insertion">
                        <p:inline>
                            <d:report-path>@@</d:report-path>
                        </p:inline>
                    </p:input>
                </p:insert>
                <p:string-replace match="//d:report-path/text()">
                    <p:with-option name="replace" select="concat('&quot;', $report-path, '&quot;')"
                    />
                </p:string-replace>
            </p:when>
            <p:otherwise>
                <p:identity/>
            </p:otherwise>
        </p:choose>

        <p:choose>
            <p:when test="string-length($internal-info) > 0">
                <p:add-attribute match="d:document-validation-report/d:document-info">
                    <p:with-option name="attribute-name" select="'internal'"/>
                    <p:with-option name="attribute-value" select="$internal-info"/>
                </p:add-attribute>
            </p:when>
            <p:otherwise>
                <p:identity/>
            </p:otherwise>
        </p:choose>
    </p:group>

    <p:choose>
        <p:when test="//c:errors">

            <!-- replace RelaxNG's c:error elements with our own d:error elements. This reduces the number of types of error descriptions we have to deal with. -->
            <p:group name="replace-cerror-with-derror">
                <!-- convert c:errors to d:errors -->
                <p:xslt name="cerror-to-derror-xsl">
                    <p:input port="stylesheet">
                        <p:document href="../xslt/cerrors-to-derrors.xsl"/>
                    </p:input>
                    <p:input port="parameters">
                        <p:empty/>
                    </p:input>
                    <p:input port="source" select="//c:errors"/>
                </p:xslt>

                <!-- replace c:errors with the results of the conversion -->
                <p:replace match="//c:errors">
                    <p:input port="replacement">
                        <p:pipe port="result" step="cerror-to-derror-xsl"/>
                    </p:input>
                    <p:input port="source">
                        <p:pipe port="result" step="add-document-metadata"/>
                    </p:input>
                </p:replace>
            </p:group>
        </p:when>
        <p:otherwise>
            <p:identity/>
        </p:otherwise>
    </p:choose>


    <p:group name="add-error-count">
        <p:variable name="error-count"
            select="count(//d:error) + count(//svrl:failed-assert) + count(//svrl:successful-report)"/>
        <p:insert match="d:document-validation-report/d:document-info" position="last-child">
            <p:input port="insertion">
                <p:inline>
                    <d:error-count>@@</d:error-count>
                </p:inline>
            </p:input>
        </p:insert>
        <p:string-replace match="//d:error-count/text()">
            <p:with-option name="replace" select="concat('&quot;', $error-count, '&quot;')"/>
        </p:string-replace>
    </p:group>

    <p:validate-with-relax-ng assert-valid="true">
        <p:input port="schema">
            <p:document href="../schema/document-validation-report.rng"/>
        </p:input>
    </p:validate-with-relax-ng>

</p:declare-step>

So, where do we start? We could comment out parts of it to locate the error, but one step already looks suspicious: the p:validate-with-relax-ng at the end. We know that the error comes from an invocation of either p:validate-with-relax-ng or p:validate-with-xml-schema, and this step is easy to comment out without breaking the XProc script, so we try that:

<p:declare-step version="1.0" name="combine-validation-reports" type="px:combine-validation-reports" (...)>
    
    (...)
    
    <!--<p:validate-with-relax-ng assert-valid="true">
        <p:input port="schema">
            <p:document href="../schema/document-validation-report.rng"/>
        </p:input>
    </p:validate-with-relax-ng>-->
    
</p:declare-step>

Now we re-run our script, making sure that px:combine-validation-reports is no longer commented out in validate.xpl, and see what happens. It succeeds!

Right, so why does commenting out this last validation step help? Is that step needed? Is it a bug with px:combine-validation-reports or are we just using it wrong?

To shed some light on this, let's look at the input to the p:validate-with-relax-ng step we commented out. We re-enable that step and add a p:identity with a p:log right before it:

<p:declare-step version="1.0" name="combine-validation-reports" type="px:combine-validation-reports" (...)>
    
    (...)
    
    <p:identity>
        <p:log port="result" href="file:/tmp/out.xml"/>
    </p:identity>
    <p:validate-with-relax-ng assert-valid="true">
        <p:input port="schema">
            <p:document href="../schema/document-validation-report.rng"/>
        </p:input>
    </p:validate-with-relax-ng>
    
</p:declare-step>
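As an aside, p:store is an alternative to p:log here: it writes the document itself to disk, without Calabash's logging wrapper elements. The sketch below is a hypothetical variant of the end of combine-validation-reports.xpl ("before-validation" is an illustrative name); the rest of this walkthrough keeps the p:log version.

```xml
<!-- Debugging sketch: save the document that is about to be validated. -->
<p:identity name="before-validation"/>
<p:store href="file:/tmp/out.xml"/>
<!-- p:store has no primary output, so the source must be piped explicitly -->
<p:validate-with-relax-ng assert-valid="true">
    <p:input port="source">
        <p:pipe step="before-validation" port="result"/>
    </p:input>
    <p:input port="schema">
        <p:document href="../schema/document-validation-report.rng"/>
    </p:input>
</p:validate-with-relax-ng>
```

The explicit p:pipe is the important part: without it, the validation step would have no default readable port to read from after p:store.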

Then we re-run the script, and open file:/tmp/out.xml, which looks like this:

<px:document-sequence xmlns:px='http://xmlcalabash.com/ns/document-sequence'
                      port='result'
                      xpl-file='file:/home/jostein/daisy-pipeline/pipeline-modules/pipeline-modules-common/validation-utils/src/main/resources/xml/xproc/combine-validation-reports.xpl'
                      xpl-line='251'
                      dateTime='2016-01-21T13:11:55+01:00'>
<px:document>
                <d:document-validation-report xmlns:d="http://www.daisy.org/ns/pipeline/data">
                    <d:document-info>
                            <d:document-name>pef_valid.pef</d:document-name>
                        
                            <d:document-type>PEF</d:document-type>
                        
                    <d:error-count>0</d:error-count>
                </d:document-info>
                <d:reports/></d:document-validation-report>
            </px:document>
</px:document-sequence>

There are some wrapper elements from Calabash's logging here, so we unwrap px:document-sequence and px:document and end up with this document:

<d:document-validation-report xmlns:d="http://www.daisy.org/ns/pipeline/data">
    <d:document-info>
            <d:document-name>pef_valid.pef</d:document-name>
        
            <d:document-type>PEF</d:document-type>
        
    <d:error-count>0</d:error-count>
</d:document-info>
<d:reports/></d:document-validation-report>

So let's try to validate this document against the RNG referenced by p:validate-with-relax-ng. We need the full path to document-validation-report.rng. One way to get it (there are many) is to select the href, press Ctrl+Enter to open the file, run the XPath expression base-uri() in oXygen, and copy the result. However you get it, you'll end up with something like "file:/home/jostein/daisy-pipeline/pipeline-modules/pipeline-modules-common/validation-utils/src/main/resources/xml/schema/document-validation-report.rng" (remember to put the file: protocol at the beginning).

Now add a reference to this file as an xml-model processing instruction at the top of the output we received earlier:

<?xml-model href="file:/home/jostein/daisy-pipeline/pipeline-modules/pipeline-modules-common/validation-utils/src/main/resources/xml/schema/document-validation-report.rng"?>
<d:document-validation-report xmlns:d="http://www.daisy.org/ns/pipeline/data">
    <d:document-info>
            <d:document-name>pef_valid.pef</d:document-name>
        
            <d:document-type>PEF</d:document-type>
        
    <d:error-count>0</d:error-count>
</d:document-info>
<d:reports/></d:document-validation-report>

You can alternatively create a validation scenario with the RNG file, but I personally find it easiest to just add the xml-model instruction.
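If an editor does not infer the schema language from the .rng extension, the xml-model processing instruction can state it explicitly with the type and schematypens pseudo-attributes defined in the W3C note on associating schemas with XML documents. A sketch of the same document with those attributes added (only the processing instruction and indentation differ):

```xml
<?xml-model href="file:/home/jostein/daisy-pipeline/pipeline-modules/pipeline-modules-common/validation-utils/src/main/resources/xml/schema/document-validation-report.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<d:document-validation-report xmlns:d="http://www.daisy.org/ns/pipeline/data">
    <d:document-info>
        <d:document-name>pef_valid.pef</d:document-name>
        <d:document-type>PEF</d:document-type>
        <d:error-count>0</d:error-count>
    </d:document-info>
    <d:reports/>
</d:document-validation-report>
```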

Running validation in oXygen on this document gives this error:

System ID: /tmp/out.xml
Main validation file: /tmp/out.xml
Schema: /home/jostein/daisy-pipeline/pipeline-modules/pipeline-modules-common/validation-utils/src/main/resources/xml/schema/document-validation-report.rng
Engine name: Jing
Severity: error
Description: element "d:document-info" incomplete; missing required element "d:document-path"
Start location: 3:6
End location: 3:21

OK! So there actually is a problem with this document: a required d:document-path element is missing inside d:document-info. Where should that element have been inserted? We go back to combine-validation-reports.xpl and search for "document-path".

We find an option at the top:

<p:option name="document-path" required="false" select="''">
    <p:documentation xmlns="http://www.w3.org/1999/xhtml">
        <h1 px:role="name">document-path</h1>
        <p px:role="desc">The full path to the document, if available.</p>
    </p:documentation>
</p:option>

...and we find some logic in the middle of the document:

<p:choose>
    <p:when test="string-length($document-path) > 0">
        <p:insert match="d:document-validation-report/d:document-info"
            position="first-child">
            <p:input port="insertion">
                <p:inline>
                    <d:document-path>@@</d:document-path>
                </p:inline>
            </p:input>
        </p:insert>
        <p:string-replace match="//d:document-path/text()">
            <p:with-option name="replace"
                select="concat('&quot;', $document-path, '&quot;')"/>
        </p:string-replace>
    </p:when>
    <p:otherwise>
        <p:identity/>
    </p:otherwise>
</p:choose>

Reading this logic, the d:document-path element is only inserted if $document-path, which holds the value of the option, is non-empty.

The default value of $document-path in combine-validation-reports.xpl is the empty string, and looking back at our original script, we do not set the document-path to anything else.

Well, that seems to be the problem, then. Let's try setting the document-path option in our validate.xpl script:

<px:combine-validation-reports name="combined-error-report">
  <p:with-option name="document-type" select="$document-type"/>
  <p:with-option name="document-name" select="$document-name"/>
  <p:with-option name="document-path" select="'test'"/>
  <p:input port="source">
    <p:pipe port="report" step="validate-against-relaxng"/>
  </p:input>
</px:combine-validation-reports>

We re-run the script, and... yes! It works! We solved our problem :).
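The value 'test' is only a placeholder that confirms the diagnosis. Since validate.xpl already computes $base-uri from the input document, passing that seems a more meaningful value for document-path. This is a suggestion rather than a verified fix: we haven't checked whether the HTML report expects a URI or a plain file path.

```xml
<px:combine-validation-reports name="combined-error-report">
  <p:with-option name="document-type" select="$document-type"/>
  <p:with-option name="document-name" select="$document-name"/>
  <!-- $base-uri is declared near the top of validate.xpl as base-uri() of the input -->
  <p:with-option name="document-path" select="$base-uri"/>
  <p:input port="source">
    <p:pipe port="report" step="validate-against-relaxng"/>
  </p:input>
</px:combine-validation-reports>
```

Note also that the document-path option is documented as optional ("if available") while the report schema requires d:document-path, so the mismatch is arguably worth reporting upstream as well.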

I hope this was at the very least slightly interesting, and I wish you good luck on your future XProc debugging journeys!

Jostein Austvik Jacobsen - 2016-01-21
