@JamoCA
Created April 26, 2016 18:05
ColdFusion CFC to transliterate text from one format to another. (Requires ICU4J)
<cfcomponent displayname="transliterator" hint="carries out transliteration based on ICU4J Transliterator class" output="no">
<!---
author: paul hastings <paul@sustainableGIS.com>
date: 10-feb-2004
revisions:
notes: this cfc contains methods to transliterate text from one format to another. IT DOES NOT
TRANSLATE TEXT. for example, a Russian to Latin transliterator changes Russian text written
in Cyrillic characters to phonetically equivalent Latin characters. IT DOES NOT TRANSLATE
TEXT. particularly useful transliterations are Any-Hex and Hex-Any, which transliterate
between escaped ASCII unicode (\u3044) and unicode script, similar to what happens when using
java resourceBundles. it requires that the ICU4J lib be installed. you can download ICU4J lib
from:
http://oss.software.ibm.com/icu4j/
extract the .JAR file and place it in cfusionMX/wwwroot/WEB-INF/lib
methods in this CFC:
- getAvailableIDs returns an array of available transliterator IDs. these are named in the form
of From-To, such as Latin-Hebrew, which will transliterate from Latin script to Hebrew script.
- transliterate returns transliterated text string based on input transliterator ID and text.
required arguments are thisID, valid transliterator ID, and thisText, string to be transliterated.
the transliteration is "batch": the entire input string is transliterated at once. it returns an error
message if the transliterator ID is invalid or the transliteration can't be accomplished. this is often
the case with the Any-"whatever" IDs, as not all transliterations can be accomplished directly,
Thai-Hebrew for example. in this case a possible workaround is to "daisy-chain" the transliteration:
Thai-Latin, then Latin-Hebrew.
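usage sketch (not part of the original cfc): it assumes this file is saved as transliterator.cfc
in the same directory as the calling template and that the ICU4J jar is already on the classpath.
the variable names t, ids, thaiText, latinText and hebrewText are illustrative only, and thaiText
is assumed to already hold a string in Thai script.
<cfscript>
// create the component and list the available From-To IDs
t=createObject("component","transliterator");
ids=t.getAvailableIDs();
// daisy-chain Thai to Hebrew via Latin, as described in the notes above
latinText=t.transliterate("Thai-Latin",thaiText);
// transliterate returns a string starting with "ERROR:" on failure, so check before chaining
if (left(latinText,6) NEQ "ERROR:") {
hebrewText=t.transliterate("Latin-Hebrew",latinText);
}
</cfscript>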
--->
<cfscript>
transliterator=createObject("java","com.ibm.icu.text.Transliterator");
</cfscript>
<cffunction name="getAvailableIDs" returntype="array" hint="returns array of available transliterator IDs"
output="No">
<cfscript>
var theseIDs=arrayNew(1);
var IDs=transliterator.getAvailableIDs();
while (IDs.hasMoreElements()) {
arrayAppend(theseIDs,IDs.nextElement().toString());
}
return theseIDs;
</cfscript>
</cffunction>
<cffunction name="transliterate" returntype="string" hint="transliterates input string using the given transliterator ID"
output="No">
<cfargument name="thisID" required="Yes" type="string">
<cfargument name="thisText" required="Yes" type="string">
<cfscript>
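var transLit="";
var transliteratedText ="";
try {
// getInstance throws on an unknown ID, so it is called inside the try block
transLit=transliterator.getInstance(arguments.thisID);
transliteratedText=transLit.transliterate(arguments.thisText);
}
catch (any ErrMsg) {
// ErrMsg is an exception object, so report its message; not all IDs, usually Any-whatever, are actually supported
transliteratedText="ERROR: #ErrMsg.message#";
}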
return transliteratedText;
</cfscript>
</cffunction>
</cfcomponent>