@pdaengeli
Created March 12, 2017 12:15
Papyri-WL: fuzzy bibliographic entry matching, taking Roman and Arabic numerals and differing formatting into account
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:p="http://github.com/pdaengeli"
    exclude-result-prefixes="xs"
    version="2.0">
    <!-- Fuzzy bibliographic entry matching, taking Roman and Arabic numerals and differing formatting into account -->

    <!-- Bibliography entries to match against -->
    <xsl:variable name="literature" select="doc('literature.xml')//*:bibl"/>

    <!-- Identity template: copy everything not handled below -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="ref/@target">
        <!-- Reference text with a trailing Roman numeral converted to Arabic and wrapped in ¦ markers,
             e.g. "P.Oxy. LXXV" becomes "P.Oxy. ¦75¦" -->
        <xsl:variable name="compare">
            <!--<xsl:analyze-string select="parent::ref/text()" regex="(\s)M{{1,4}}(CM|CD|D?C{{0,3}})(XC|XL|L?X{{0,3}})(IX|IV|V?I{{0,3}})|M{{0,4}}(CM|C?D|D?C{{1,3}})(XC|XL|L?X{{0,3}})(IX|IV|V?I{{0,3}})|M{{0,4}}(CM|CD|D?C{{0,3}})(XC|X?L|L?X{{1,3}})(IX|IV|V?I{{0,3}})|M{{0,4}}(CM|CD|D?C{{0,3}})(XC|XL|L?X{{0,3}})(IX|I?V|V?I{{1,3}})">-->
            <!-- Only a numeral at the very end of the text is matched (note the $ anchor) -->
            <xsl:analyze-string select="parent::ref/text()" regex="(\s)(M{{0,3}})(C[DM]|D?C{{0,3}})(X[LC]|L?X{{0,3}})(I[VX]|V?I{{0,3}})$">
                <xsl:matching-substring>
                    <xsl:text> ¦</xsl:text>
                    <xsl:call-template name="convertRoman">
                        <xsl:with-param name="romanNr" select="normalize-space(.)" as="xs:string"/>
                    </xsl:call-template>
                    <xsl:text>¦</xsl:text>
                </xsl:matching-substring>
                <xsl:non-matching-substring>
                    <xsl:value-of select="."/>
                </xsl:non-matching-substring>
            </xsl:analyze-string>
        </xsl:variable>
        <xsl:choose>
            <!-- case: value present -->
            <xsl:when test="starts-with(.,'#b')">
                <xsl:copy-of select="."/>
            </xsl:when>
            <!-- case: precise hit (the value written is the bare xml:id, without a leading '#') -->
            <xsl:when test="parent::ref/text()=$literature/*:title[@type='short']">
                <xsl:attribute name="target">
                    <xsl:value-of select="$literature[*:title[@type='short']=current()/parent::ref/text()]/@xml:id"/>
                </xsl:attribute>
            </xsl:when>
            <!-- case: soft hit (Roman numerals) -->
            <xsl:when test="replace($compare,'¦','')=$literature/*:title[@type='short']">
                <xsl:attribute name="target">
                    <xsl:value-of select="$literature[*:title[@type='short']=replace($compare,'¦','')]/@xml:id"/>
                </xsl:attribute>
            </xsl:when>
            <!-- case: soft hit (Roman numerals, padding one-digit Arabic numbers with a leading 0).
                 Note: the test checks only the digit count, so an unmatched title yields an empty target. -->
            <xsl:when test="string-length(substring-before(substring-after($compare,'¦'),'¦'))=1">
                <xsl:attribute name="target">
                    <xsl:value-of select="$literature[*:title[@type='short']=replace(concat(substring-before($compare,'¦'),'0',substring-after($compare,'¦')),'¦','')]/@xml:id"/>
                </xsl:attribute>
            </xsl:when>
            <!-- uncaught:
            SPP III 2.5
            A,. Pap. 26
            SPP III 2.2
            SPP III 2.1 -->
            <!-- no match: the target attribute is dropped -->
            <xsl:otherwise/>
        </xsl:choose>
    </xsl:template>

    <!-- Resolve Roman numerals by brute force: format 1 to 200 as Roman and compare.
         Numerals above 200 resolve to nothing. -->
    <xsl:template name="convertRoman">
        <xsl:param name="romanNr" />
        <xsl:for-each select="1 to 200">
            <xsl:if test="p:toRoman(.) = $romanNr">
                <xsl:value-of select="." />
            </xsl:if>
        </xsl:for-each>
    </xsl:template>

    <!-- Format an integer as an uppercase Roman numeral -->
    <xsl:function name="p:toRoman" as="xs:string">
        <xsl:param name="value" as="xs:integer"/>
        <xsl:number value="$value" format="I"/>
    </xsl:function>
</xsl:stylesheet>
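
The `convertRoman` template finds the Arabic value by brute force: it formats every integer from 1 to 200 as a Roman numeral and compares, so numerals above CC resolve to nothing. As a possible alternative, the sketch below parses the numeral directly. `p:fromRoman` and the `main` entry template are hypothetical names that do not appear in the gist, and the function assumes well-formed uppercase input, which the regex in the stylesheet already enforces.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:p="http://github.com/pdaengeli"
    version="2.0">

    <xsl:output method="text"/>

    <!-- Hypothetical helper (not part of the gist): parse a Roman numeral directly -->
    <xsl:function name="p:fromRoman" as="xs:integer">
        <xsl:param name="roman" as="xs:string"/>
        <xsl:variable name="letters" as="xs:integer*" select="string-to-codepoints('MDCLXVI')"/>
        <!-- Value of each letter, in input order; unknown characters contribute nothing -->
        <xsl:variable name="values" as="xs:integer*"
            select="for $c in string-to-codepoints($roman)
                    return (1000, 500, 100, 50, 10, 5, 1)[index-of($letters, $c)]"/>
        <!-- Subtractive rule: a letter smaller than its right neighbour counts as negative -->
        <xsl:sequence select="xs:integer(sum(
            for $i in 1 to count($values)
            return if ($i lt count($values) and $values[$i] lt $values[$i + 1])
                   then -$values[$i]
                   else $values[$i]))"/>
    </xsl:function>

    <!-- Entry point for a quick check -->
    <xsl:template name="main">
        <xsl:value-of select="for $r in ('III', 'IX', 'LXXV', 'CCI') return p:fromRoman($r)" separator=" "/>
    </xsl:template>
</xsl:stylesheet>
```

Calling the `main` template with an XSLT 2.0 processor (with Saxon, for example, via `-it:main`) prints `3 9 75 201`. Inside the original stylesheet, the `xsl:call-template` for `convertRoman` could then be swapped for `<xsl:value-of select="p:fromRoman(normalize-space(.))"/>`.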
@pdaengeli
Author

Input example:

[…]
<respStmt>
    <resp>Überlassung elektronischer Daten (für <ref target="#b21439">P.Oxy. LXXV</ref>)</resp>
    <persName xml:id="benaissa_a">Amin Benaissa</persName>
</respStmt>
[…]

Counterpart in literature.xml:

[…]
<bibl xml:id="b21439" type="monograph">
    <title type="short">P.Oxy. 75</title>
    <title type="long">H. Maehler u.a., The Oxyrhynchus Papyri. Volume LXXV (Nos. 5020-5071). (Egypt Exploration Society. Graeco-Roman Memoirs. No. 96), London, Egypt Exploration Society, 2010</title>
    <editor ref="editors.xml#hagedorn_d">D. Hagedorn</editor>
    <date type="creation">vor 2016</date>
</bibl>
[…]
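
To trace this example through the stylesheet, the self-contained sketch below builds the intermediate `$compare` string for the literal `P.Oxy. LXXV` and then strips the `¦` markers, which yields the short title stored in literature.xml. The `main` template and the inline predicate standing in for `convertRoman` are simplifications for illustration, not the gist's exact code.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:p="http://github.com/pdaengeli"
    version="2.0">

    <xsl:output method="text"/>

    <!-- Same helper as in the gist: format an integer as a Roman numeral -->
    <xsl:function name="p:toRoman" as="xs:string">
        <xsl:param name="value" as="xs:integer"/>
        <xsl:number value="$value" format="I"/>
    </xsl:function>

    <xsl:template name="main">
        <xsl:variable name="compare">
            <xsl:analyze-string select="'P.Oxy. LXXV'" regex="(\s)(M{{0,3}})(C[DM]|D?C{{0,3}})(X[LC]|L?X{{0,3}})(I[VX]|V?I{{0,3}})$">
                <xsl:matching-substring>
                    <xsl:variable name="roman" select="normalize-space(.)"/>
                    <xsl:text> ¦</xsl:text>
                    <!-- Brute-force lookup, condensed from convertRoman -->
                    <xsl:value-of select="(1 to 200)[p:toRoman(.) = $roman]"/>
                    <xsl:text>¦</xsl:text>
                </xsl:matching-substring>
                <xsl:non-matching-substring>
                    <xsl:value-of select="."/>
                </xsl:non-matching-substring>
            </xsl:analyze-string>
        </xsl:variable>
        <!-- Line 1: intermediate string; line 2: string compared with title[@type='short'] -->
        <xsl:value-of select="$compare"/>
        <xsl:text>&#10;</xsl:text>
        <xsl:value-of select="replace($compare, '¦', '')"/>
    </xsl:template>
</xsl:stylesheet>
```

Run from the `main` template, this prints `P.Oxy. ¦75¦` followed by `P.Oxy. 75`. The second line equals the `<title type="short">` of `b21439`, so the "soft hit (Roman numerals)" branch is the one that resolves such a reference when its `target` is not already filled in.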
