Created
April 26, 2016 18:05
ColdFusion CFC to transliterate text from one format to another. (Requires ICU4J)
<cfcomponent displayname="transliterator" hint="carries out transliteration based on ICU4J Transliterator class" output="no">
<!---
author: paul hastings <paul@sustainableGIS.com>
date: 10-feb-2004
revisions:
notes: this cfc contains methods to transliterate text from one script to another. IT DOES NOT
TRANSLATE TEXT. for example, a Russian to Latin transliterator changes Russian text written
in Cyrillic characters to phonetically equivalent Latin characters. IT DOES NOT TRANSLATE
TEXT. particularly useful transliterations are Any-Hex and Hex-Any, which convert unicode
script to and from escaped unicode (\u3044), similar to what happens when using
java resourceBundles. it requires that the ICU4J lib be installed. you can download ICU4J lib
from:
http://oss.software.ibm.com/icu4j/
extract the .JAR file and place it in cfusionMX/wwwroot/WEB-INF/lib
methods in this CFC:
- getAvailableIDs returns an array of available transliterator IDs. these are named in the form
of From-To, such as Latin-Hebrew, which will transliterate from Latin script to Hebrew script.
- transliterate returns transliterated text string based on input transliterator ID and text.
required arguments are thisID, a valid transliterator ID, and thisText, the string to be transliterated.
the transliteration is "batch": the whole input string is transliterated in one pass. it will return
an error message if the transliterator ID is invalid or the transliteration can't be accomplished.
this is often the case with the Any-"whatever" IDs, and not every pair of scripts can be
transliterated directly, Thai-Hebrew for example. in this case a possible workaround is to
"daisy-chain" the transliteration, Thai-Latin then Latin-Hebrew.
--->
<cfscript>
	// shared handle to the ICU4J Transliterator class; its static methods are called below
	transliterator=createObject("java","com.ibm.icu.text.Transliterator");
</cfscript>
<cffunction name="getAvailableIDs" returntype="array" hint="returns array of available transliterator IDs"
	output="No">
<cfscript>
	var theseIDs=arrayNew(1);
	var IDs=transliterator.getAvailableIDs(); // java Enumeration of ID strings
	while (IDs.hasMoreElements()) {
		arrayAppend(theseIDs,IDs.nextElement().toString());
	}
	return theseIDs;
</cfscript>
</cffunction>
<cffunction name="transliterate" returntype="string" hint="transliterates input string using the given transliterator ID"
	output="No">
<cfargument name="thisID" required="Yes" type="string">
<cfargument name="thisText" required="Yes" type="string">
<cfscript>
	var transLit="";
	var transliteratedText="";
	try {
		// getInstance is inside the try so that an invalid ID is caught as well
		transLit=transliterator.getInstance(arguments.thisID);
		transliteratedText=transLit.transliterate(arguments.thisText);
	}
	catch (any ErrMsg) {
		// ErrMsg is an exception object, so report its message rather than the object itself
		transliteratedText="ERROR: #ErrMsg.message#"; //not all IDs, usually Any-whatever, are actually supported
	}
	return transliteratedText;
</cfscript>
</cffunction>
</cfcomponent>
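A minimal calling sketch for the two methods described in the notes above, including the "daisy-chain" workaround. It is an illustration, not part of the original gist: it assumes the component is saved as transliterator.cfc in the same directory as the calling page, that the ICU4J JAR is already on the server classpath, and that the installed ICU4J build provides the Thai-Latin and Latin-Hebrew IDs. The variable names (translit, ids, escaped, restored, thaiText, latinText, hebrewText) are hypothetical.

```cfml
<cfprocessingdirective pageencoding="utf-8">
<cfscript>
	// assumption: transliterator.cfc sits next to this page
	translit=createObject("component","transliterator");

	// list the IDs this ICU4J build actually supports before relying on one
	ids=translit.getAvailableIDs();
	writeOutput("available IDs: " & arrayLen(ids) & "<br>");

	// round trip: unicode script to escaped form and back (chr(12356) is U+3044)
	escaped=translit.transliterate("Any-Hex",chr(12356));
	restored=translit.transliterate("Hex-Any",escaped);
	writeOutput(htmlEditFormat(escaped) & " / " & htmlEditFormat(restored) & "<br>");

	// daisy-chain: no direct Thai-Hebrew ID, so go through Latin
	thaiText="สวัสดี"; // sample input, hypothetical
	latinText=translit.transliterate("Thai-Latin",thaiText);
	hebrewText=translit.transliterate("Latin-Hebrew",latinText);
	writeOutput(htmlEditFormat(hebrewText));
</cfscript>
```

Because transliterate reports failures as a string beginning with "ERROR:", a caller chaining two steps may want to check the intermediate result for that prefix before passing it on.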