An XSLT 2.0 stylesheet that tests text nodes for characters from a given set of Unicode blocks (in this case, the Cyrillic blocks) and emits a message naming the file that contains them. Useful for cleaning up OCR output and for correcting homographs, such as a Cyrillic letter that looks identical to a Latin one.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="no"/>
    <!-- 16croala-testforunicodeblocks: test text//text() nodes for characters from certain Unicode blocks -->
    <xsl:template match="//*:text//text()">
        <!-- Emit one message for each text node that contains a character from a Cyrillic block -->
        <xsl:if test="matches(., '[\p{IsCyrillic}\p{IsCyrillicSupplement}\p{IsCyrillicExtended-A}\p{IsCyrillicExtended-B}]')">
            <xsl:message>Characters from Cyrillic Unicode blocks in <xsl:value-of select="base-uri(.)"/></xsl:message>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
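Because the template matches individual text nodes, a file with many affected nodes produces many identical messages. The following is a minimal sketch of a variant that reports once per document and includes a count. It is not part of the original gist: the variable name `hits` and the message wording are made up for illustration, and the test is narrowed to the basic Cyrillic block (`\p{IsCyrillic}`) on the assumption that this is enough for a first pass.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="text"/>
    <!-- Hypothetical variant: one message per document instead of one per text node -->
    <xsl:template match="/">
        <!-- Collect every text node under a "text" element (any namespace) that contains a basic Cyrillic character -->
        <xsl:variable name="hits" select="//*:text//text()[matches(., '\p{IsCyrillic}')]"/>
        <xsl:if test="exists($hits)">
            <xsl:message>
                <xsl:value-of select="count($hits)"/> text node(s) with Cyrillic characters in <xsl:value-of select="base-uri(.)"/>
            </xsl:message>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
```

This sketch writes nothing to the result document; the only output is the message, so it can be run over a batch of files with any XSLT 2.0 processor and the messages collected from the processor's log.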

