@joewiz
Last active October 22, 2022 14:42
Strip diacritics, with XQuery
xquery version "3.1";

(: Decompose to NFD so each accent becomes a separate combining character,
   then delete the characters in the Combining Diacritical Marks block
   (U+0300 to U+036F). :)
declare function local:strip-diacritics($string as xs:string) as xs:string {
    $string
    => normalize-unicode("NFD")
    => replace("\p{IsCombiningDiacriticalMarks}", "")
};

(: Report the source string, its NFD form, the stripped result, and the
   codepoints of each, to show what the normalization and removal did. :)
declare function local:inspect-diacritics($string as xs:string) as element() {
    let $normalized := normalize-unicode($string, "NFD")
    let $stripped := local:strip-diacritics($string)
    return
        <result>
            <source>{$string}</source>
            <!-- normalize-unicode() defaults to NFC when no form is given -->
            <source-is-nfc-normalized>{$string = normalize-unicode($string)}</source-is-nfc-normalized>
            <nfd-normalized>{$normalized}</nfd-normalized>
            <stripped-of-combining-diacritical-marks>{$stripped}</stripped-of-combining-diacritical-marks>
            <src-codepoints>{string-to-codepoints($string)}</src-codepoints>
            <nfd-codepoints>{string-to-codepoints($normalized)}</nfd-codepoints>
            <fin-codepoints>{string-to-codepoints($stripped)}</fin-codepoints>
        </result>
};

let $source := 'çéüå'
return
    local:inspect-diacritics($source)
<result>
    <source>çéüå</source>
    <source-is-nfc-normalized>true</source-is-nfc-normalized>
    <nfd-normalized>çéüå</nfd-normalized>
    <stripped-of-combining-diacritical-marks>ceua</stripped-of-combining-diacritical-marks>
    <src-codepoints>231 233 252 229</src-codepoints>
    <nfd-codepoints>99 807 101 769 117 776 97 778</nfd-codepoints>
    <fin-codepoints>99 101 117 97</fin-codepoints>
</result>
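
The `IsCombiningDiacriticalMarks` block escape matches only U+0300 to U+036F, so combining marks that live in other Unicode blocks would pass through untouched. As a rough sketch only, and not part of the gist itself, a variant could match on the general category `\p{M}` (all marks) instead. The function name `local:strip-all-marks` is hypothetical, and the final recomposition to NFC is an added assumption about what a caller would want back.

```xquery
xquery version "3.1";

(: Hypothetical variant, not from the original gist: removes every character
   in the Unicode "Mark" category rather than only the U+0300 to U+036F block. :)
declare function local:strip-all-marks($string as xs:string) as xs:string {
    $string
    => normalize-unicode("NFD")    (: split base letters from their marks :)
    => replace("\p{M}", "")        (: drop all marks: categories Mn, Mc and Me :)
    => normalize-unicode("NFC")    (: assumption: recompose whatever remains :)
};

(: U+20D7 COMBINING RIGHT ARROW ABOVE sits outside U+0300 to U+036F, so the
   block-based pattern would leave it in place while \p{M} removes it. :)
(
    local:strip-all-marks("çéüå"),
    local:strip-all-marks("x&#x20D7;")
)
```

The trade-off is that `\p{M}` is much broader: it also removes vowel signs and points in scripts such as Devanagari or Hebrew, where those marks carry meaning, so the narrower block-based pattern may well be the safer choice for Latin-script text.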

joewiz commented Feb 15, 2016

Glad you found it useful, @Conal-Tuohy!

ssire commented Dec 18, 2019

A masterpiece, thanks for sharing!

joewiz commented Dec 20, 2019

@ssire So glad you found it useful!

@talithamotter

Thanks for sharing!

joewiz commented Nov 13, 2021

@talithamotter My pleasure!
