Created
March 25, 2014 12:02
PatternMatcher.cfc - A ColdFusion Component Wrapper For The Java Regular Expression Engine
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<!--- Create an input text in which we will find patterns. ---> | |
<cfsavecontent variable="input"> | |
Sarah 212-555-1234 01/12/1980 | |
Kim 212-555-9399 07/14/1975 | |
Jenna 917-555-8712 05/05/1977 | |
Tricia 646-555-9990 12/20/1974 | |
</cfsavecontent> | |
<!--- | |
Now, create a pattern to parse the above intput. We will be | |
looking for names, phone numbers, and birthdays. I'm going to | |
use a VERBOSE regular expression to explain the capture. | |
---> | |
<cfsavecontent variable="pattern">(?x) | |
## Group 1. | |
## Capture the name. | |
(\w+) | |
\s+ | |
## Group 2. | |
## Capture the phone number. | |
(\d+ - \d+ - \d+) | |
\s+ | |
## Group 3. | |
## Capture the date of birth (DOB). | |
(\d+ / \d+ / \d+) | |
</cfsavecontent> | |
<!--- | |
Now, create a matcher to parse the input string and make | |
use of it. | |
---> | |
<cfset matcher = createObject( "component", "PatternMatcher" ).init( | |
pattern, | |
input | |
) /> | |
<!--- Create an array to keep track of the records. ---> | |
<cfset records = [] /> | |
<!--- Keep looping over the input to find the matches. ---> | |
<cfloop condition="matcher.find()"> | |
<!--- | |
Create a record for this set of matches. When each pattern | |
is matched, we are given access to each captured group through | |
the group() method. | |
NOTE: Group Zero (0) is always the entire pattern match, even | |
if there is no capturing group around the entire pattern. | |
---> | |
<cfset record = { | |
name = matcher.group( 1 ), | |
phoneNumber = matcher.group( 2 ), | |
dateOfBirth = matcher.group( 3 ) | |
} /> | |
<!--- Add the record to the record set. ---> | |
<cfset arrayAppend( | |
records, | |
record | |
) /> | |
</cfloop> | |
<!--- Output the matches. ---> | |
<cfdump | |
var="#records#" | |
label="Parsed Record Data" | |
/> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<!--- | |
Now, let's imagine that we want to go through the input and XXX | |
out people's date of births. First we'll want to reset our | |
matcher. | |
---> | |
<cfset matcher.reset() /> | |
<!--- Now, let's loop over the matches again. ---> | |
<cfloop condition="matcher.find()"> | |
<!--- | |
When replacing, we can use the captured group back references | |
for the values that we DO want to keep. Remember, the first | |
group was the name, the second the phone number, and the | |
third was the date of birth. | |
---> | |
<cfset matcher.replaceWith( | |
"$1 $2 MM/DD/YYYY" | |
) /> | |
</cfloop> | |
<!--- Output the result of the replacement. ---> | |
<cfoutput> | |
<pre> | |
#matcher.result()# | |
</pre> | |
</cfoutput> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<!--- Output all the matches of the given pattern. ---> | |
<cfdump | |
var="#matcher.match()#" | |
label="Match()" | |
/> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<!--- | |
Output all the matches of the given pattern, broken down | |
by captured group. | |
---> | |
<cfdump | |
var="#matcher.matchGroups()#" | |
label="MatchGroups()" | |
/> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<cfcomponent | |
output="false" | |
hint="I provide easier, implicitly type-cast access to the underlying Java Pattern and Matcher functionality."> | |
<cffunction | |
name="init" | |
access="public" | |
returntype="any" | |
output="false" | |
hint="I return an intialized component."> | |
<!--- Define arguments. ---> | |
<cfargument | |
name="pattern" | |
type="string" | |
required="true" | |
hint="I am the Java-compatible regular expression to be used to create this pattern matcher." | |
/> | |
<cfargument | |
name="input" | |
type="string" | |
required="true" | |
hint="I am the input text over which we will be matching the above regular expression pattern." | |
/> | |
<!--- Define the local scope. ---> | |
<cfset var local = {} /> | |
<!--- Store the original values. ---> | |
<cfset variables.pattern = arguments.pattern /> | |
<cfset variables.input = arguments.input /> | |
<!--- | |
Compile the regular expression pattern and get the | |
matcher for the given input sequence. | |
---> | |
<cfset variables.matcher = | |
createObject( "java", "java.util.regex.Pattern" ) | |
.compile( javaCast( "string", variables.pattern ) ) | |
.matcher( javaCast( "string", variables.input ) ) | |
/> | |
<!--- Create a buffer to store the replacement result. ---> | |
<cfset variables.buffer = createObject( "java", "java.lang.StringBuffer" ).init() /> | |
<!--- Return this object reference for method chaining. ---> | |
<cfreturn this /> | |
</cffunction> | |
<cffunction | |
name="find_" | |
access="public" | |
returntype="boolean" | |
output="false" | |
hint="I attempt to find the next pattern match located within the input string."> | |
<!--- Pass this request onto the matcher. ---> | |
<cfreturn variables.matcher.find() /> | |
</cffunction> | |
<cffunction | |
name="group" | |
access="public" | |
returntype="any" | |
output="false" | |
hint="I return the value captured by the given group. NOTE: Zero (0) will return the entire pattern match."> | |
<!--- Define arguments. ---> | |
<cfargument | |
name="index" | |
type="numeric" | |
required="false" | |
default="0" | |
/> | |
<cfargument | |
name="default" | |
type="string" | |
required="false" | |
hint="I am the optional default to use if the given group (index) was not captured. Non-captured group references will return VOID. A default can be used to return non-void values." | |
/> | |
<!--- Define the local scope. ---> | |
<cfset var local = {} /> | |
<!--- Get the given group value. ---> | |
<cfset local.capturedValue = variables.matcher.group( | |
javaCast( "int", arguments.index ) | |
) /> | |
<!--- | |
Check to see if the given group was able to capture a | |
value (of if it did not, in which case, it will return | |
NULL, destroying the variable). | |
---> | |
<cfif structKeyExists( local, "capturedValue" )> | |
<!--- Return the captured value. ---> | |
<cfreturn local.capturedValue /> | |
<cfelseif structKeyExists( arguments, "default" )> | |
<!--- | |
No group was captured, but a default value was | |
provided. Return the default value. | |
---> | |
<cfreturn arguments.default /> | |
<cfelse> | |
<!--- | |
No value was captured and no default was provided; | |
simply return VOID to the calling context. | |
---> | |
<cfreturn /> | |
</cfif> | |
</cffunction> | |
<cffunction | |
name="groupCount" | |
access="public" | |
returntype="numeric" | |
output="false" | |
hint="I return the number of capturing groups within the regular exression pattern."> | |
<!--- Pass this request onto the matcher. ---> | |
<cfreturn variables.matcher.groupCount() /> | |
</cffunction> | |
<cffunction | |
name="hasGroup" | |
access="public" | |
returntype="any" | |
output="false" | |
hint="I determine whether or not the given group was captured in the previous match."> | |
<!--- Define arguments. ---> | |
<cfargument | |
name="index" | |
type="numeric" | |
required="true" | |
/> | |
<!--- Define the local scope. ---> | |
<cfset var local = {} /> | |
<!--- Get the given group value. ---> | |
<cfset local.capturedValue = variables.matcher.group( | |
javaCast( "int", arguments.index ) | |
) /> | |
<!--- | |
Return whether or not the given captured group exists. | |
If it was captured, the value will exists; if it was not | |
captured, the given group value will be NULL (and hence | |
not exist). | |
---> | |
<cfreturn structKeyExists( local, "capturedValue" ) /> | |
</cffunction> | |
<cffunction | |
name="match" | |
access="public" | |
returntype="array" | |
output="false" | |
hint="I return the collection of all pattern matches found within the given input. NOTE: This resets the internal matcher."> | |
<!--- Define the local scope. ---> | |
<cfset var local = {} /> | |
<!--- Reset the pattern matcher. ---> | |
<cfset this.reset() /> | |
<!--- | |
Create an array in which to hold the aggregated | |
pattern matches. | |
---> | |
<cfset local.matches = [] /> | |
<!--- Keep looping, looking for matches. ---> | |
<cfloop condition="variables.matcher.find()"> | |
<!--- Gather the current match. ---> | |
<cfset arrayAppend( | |
local.matches, | |
variables.matcher.group() | |
) /> | |
</cfloop> | |
<!--- Return the collected matches. ---> | |
<cfreturn local.matches /> | |
</cffunction> | |
<cffunction | |
name="matchGroups" | |
access="public" | |
returntype="array" | |
output="false" | |
hint="I return the collection of all pattern matches found within the given input, broken down by group. NOTE: This resets the internal matcher."> | |
<!--- Define the local scope. ---> | |
<cfset var local = {} /> | |
<!--- Reset the pattern matcher. ---> | |
<cfset this.reset() /> | |
<!--- | |
Create an array in which to hold the aggregated | |
pattern matches. | |
---> | |
<cfset local.matches = [] /> | |
<!--- Keep looping, looking for matches. ---> | |
<cfloop condition="variables.matcher.find()"> | |
<!--- Create a match object. ---> | |
<cfset local.match = {} /> | |
<!--- | |
Move all of the captured groups into the match object | |
(with zero being the entire match). | |
---> | |
<cfloop | |
index="local.groupIndex" | |
from="0" | |
to="#variables.matcher.groupCount()#" | |
step="1"> | |
<!--- Get the local value. ---> | |
<cfset local.groupValue = variables.matcher.group( | |
javaCast( "int", local.groupIndex ) | |
) /> | |
<!--- | |
Check to see if the value exists and only set it | |
if it does; ColdFusion seems to not like having | |
a NULL set into the struct (although it really | |
shouldn't have a problem with it). | |
---> | |
<cfif structKeyExists( local, "groupvalue" )> | |
<!--- Store the captured value. ---> | |
<cfset local.match[ local.groupIndex ] = local.groupValue /> | |
</cfif> | |
</cfloop> | |
<!--- Add the current match object. ---> | |
<cfset arrayAppend( | |
local.matches, | |
local.match | |
) /> | |
</cfloop> | |
<!--- Return the collected matches. ---> | |
<cfreturn local.matches /> | |
</cffunction> | |
<cffunction | |
name="replaceAll" | |
access="public" | |
returntype="string" | |
output="false" | |
hint="I replace all the pattern matches of the original input with the given value."> | |
<!--- Define arguments. ---> | |
<cfargument | |
name="replacement" | |
type="string" | |
required="true" | |
hint="I am the string with which we are replacing the pattern matches." | |
/> | |
<cfargument | |
name="quoteReplacement" | |
type="boolean" | |
required="false" | |
default="false" | |
hint="I determine whether or not the replacement value should be quoted (this will escape any back reference values)." | |
/> | |
<!--- Check to see if we are quoting the replacement. ---> | |
<cfif arguments.quoteReplacement> | |
<!--- Quote the replacement string. ---> | |
<cfreturn variables.matcher.replaceAll( | |
variables.matcher.quoteReplacement( | |
javaCast( "string", arguments.replacement ) | |
) | |
) /> | |
<cfelse> | |
<!--- Use the replacement text as-is. ---> | |
<cfreturn variables.matcher.replaceAll( | |
javaCast( "string", arguments.replacement ) | |
) /> | |
</cfif> | |
</cffunction> | |
<cffunction | |
name="replaceFirst" | |
access="public" | |
returntype="string" | |
output="false" | |
hint="I replace the first pattern matche of the original input with the given value."> | |
<!--- Define arguments. ---> | |
<cfargument | |
name="replacement" | |
type="string" | |
required="true" | |
hint="I am the string with which we are replacing the first pattern match." | |
/> | |
<cfargument | |
name="quoteReplacement" | |
type="boolean" | |
required="false" | |
default="false" | |
hint="I determine whether or not the replacement value should be quoted (this will escape any back reference values)." | |
/> | |
<!--- Check to see if we are quoting the replacement. ---> | |
<cfif arguments.quoteReplacement> | |
<!--- Quote the replacement string. ---> | |
<cfreturn variables.matcher.replaceFirst( | |
variables.matcher.quoteReplacement( | |
javaCast( "string", arguments.replacement ) | |
) | |
) /> | |
<cfelse> | |
<!--- Use the replacement text as-is. ---> | |
<cfreturn variables.matcher.replaceFirst( | |
javaCast( "string", arguments.replacement ) | |
) /> | |
</cfif> | |
</cffunction> | |
<cffunction | |
name="replaceWith" | |
access="public" | |
returntype="any" | |
output="false" | |
hint="I replace the current match with the given value. NOTE: Back references within the replacement string will be honored unless the replacement value is quoted (see second arguemnt)."> | |
<!--- Define arguments. ---> | |
<cfargument | |
name="replacement" | |
type="string" | |
required="true" | |
hint="I am the value with which we are replacing the previous match." | |
/> | |
<cfargument | |
name="quoteReplacement" | |
type="boolean" | |
required="false" | |
default="false" | |
hint="I determine whether or not the replacement value should be quoted (this will escape any back reference values)." | |
/> | |
<!--- Check to see if we are quoting the replacement. ---> | |
<cfif arguments.quoteReplacement> | |
<!--- Quote the replacement value before you use it. ---> | |
<cfset variables.matcher.appendReplacement( | |
variables.buffer, | |
variables.matcher.quoteReplacement( | |
javaCast( "string", arguments.replacement ) | |
) | |
) /> | |
<cfelse> | |
<!--- Use raw replacement value. ---> | |
<cfset variables.matcher.appendReplacement( | |
variables.buffer, | |
javaCast( "string", arguments.replacement ) | |
) /> | |
</cfif> | |
<!--- Return this object reference for method chaining. ---> | |
<cfreturn this /> | |
</cffunction> | |
<cffunction | |
name="reset" | |
access="public" | |
returntype="any" | |
output="false" | |
hint="I reset the pattern matcher."> | |
<!--- Define arguments. ---> | |
<cfargument | |
name="input" | |
type="string" | |
required="false" | |
hint="I am the optional input with which to reset the pattern matcher." | |
/> | |
<!--- Check to see if a new input is being used. ---> | |
<cfif structKeyExists( arguments, "input" )> | |
<!--- Use a new input to reset the matcher. ---> | |
<cfset variables.matcher.reset( | |
javaCast( "string", arguments.input ) | |
) /> | |
<!--- Store the input property. ---> | |
<cfset variables.input = arguments.input /> | |
<cfelse> | |
<!--- Reset the internal matcher. ---> | |
<cfset variables.matcher.reset() /> | |
</cfif> | |
<!--- Reset the internal results buffer. ---> | |
<cfset variables.buffer = createObject( "java", "java.lang.StringBuffer" ).init() /> | |
<!--- Return this object reference for method chaining. ---> | |
<cfreturn this /> | |
</cffunction> | |
<cffunction | |
name="result" | |
access="public" | |
returntype="string" | |
output="false" | |
hint="I return the result of the replacement up until this point."> | |
<!--- | |
Since we are no longer dealing with replacements, | |
append the rest of the unmatched input string to the | |
results buffer. | |
---> | |
<cfset variables.matcher.appendTail( | |
variables.buffer | |
) /> | |
<!--- Return the resultand string. ---> | |
<cfreturn variables.buffer.toString() /> | |
</cffunction> | |
<!--- ------------------------------------------------- ---> | |
<!--- ------------------------------------------------- ---> | |
<!--- | |
Swap some of the method names; we couldn't name it "find" | |
to begin with otherwise we'd get a ColdFusion error for | |
conflicting with a native function name. | |
---> | |
<cfset this.find = this.find_ /> | |
</cfcomponent> |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment