Created March 24, 2014 23:40
bennadel created this gist
<!---
	For this example, let's assume that the URL was:

	http://www.bennadel.com/go/prettyurl/

	This will be used in my comments below.
--->

<!---
	Check to see if this was due to a 404 error. We might be
	accessing this page via the Application.cfc onError event.
--->
<cfif Find( "404;", CGI.query_string )>

	<!---
		This is a 404 error. Now we have to figure out what URL
		the request was actually trying to reach. Right now, the
		error string is in the form of:

		404;http://www.bennadel.com:80/go/prettyurl/
	--->

	<!---
		Get the incorrect URL from the query string (which IIS
		has thrown). This should start with "404;". It might also
		contain the port number (ex. :80) after the domain
		extension. We want to strip those out. We also want to
		strip out the "www." since it may or may not be there.
	--->
	<cfset strTargetUrl = LCase( REReplace( CGI.query_string, "404;|:80|www\.", "", "ALL" ) ) />

	<!---
		ASSERT: strTargetUrl should now be in the form of:

		http://bennadel.com/go/prettyurl/
	--->

	<!---
		Get the site url. We want to strip out any "www." from it
		as well. This way, the site url *should* be part of the
		string we built above (where we also stripped out "www.").

		NOTE: My url is stored in a config object, but you can get
		that value from anywhere (or even hard code it right
		here). It is http://www.bennadel.com/
	--->
	<cfset strSiteUrl = LCase( Replace( APPLICATION.ServiceFactory.GetConfig().GetUrl(), "www.", "", "ALL" ) ) />

	<!---
		ASSERT: strSiteUrl should now be in the form of:

		http://bennadel.com/
	--->

	<!---
		Now that we have the target url and the site url, we want
		to remove the site url from the target url so that we can
		isolate the script name that was being accessed.
	--->
	<cfset strTargetUrl = Replace( strTargetUrl, strSiteUrl, "", "ONE" ) />

	<!---
		ASSERT: At this point, strTargetUrl should hold the suffix
		url that was being requested. That is, the url of the page
		minus the site domain:

		go/prettyurl/

		CAUTION: At this point, the url may still contain query
		params (..?foo=bar).
	--->

	<!---
		Check to see if we have any query params. Since the 404
		error passes the entire script name AND query string into
		CGI.query_string, we have to pull the query string values
		out manually.
	--->
	<cfif Find( "?", strTargetUrl )>

		<!--- We have query string values. Get the query params. --->
		<cfset strTargetQueryParams = ListRest( strTargetUrl, "?" ) />

		<!---
			Now that we have the target query params, we can remove
			them from the target url.
		--->
		<cfset strTargetUrl = ListGetAt( strTargetUrl, 1, "?" ) />

	<cfelse>

		<!--- There are no query params. Set a blank value. --->
		<cfset strTargetQueryParams = "" />

	</cfif>

	<!---
		Make sure all the slashes are web slashes. This should
		already be the case, but this is a safeguard.
	--->
	<cfset strTargetUrl = REReplace( strTargetUrl, "[\\/]+", "/", "ALL" ) />

	<!--- Strip out leading and trailing slashes. --->
	<cfset strTargetUrl = REReplace( strTargetUrl, "^[\\/]+|[\\/]+$", "", "ALL" ) />

	<!---
		We need to get the target directory. Check to see if the
		target url points at a file or at a directory.
	--->
	<cfif REFind( "\.[\w]+$", strTargetUrl )>

		<!---
			The target item ends in a file extension, so it must be
			a file. Get the base directory from the file name and
			remove the trailing slash.
		--->
		<cfset strTargetDirectory = REReplace( GetDirectoryFromPath( strTargetUrl ), "[\\/]+$", "", "ONE" ) />

		<!---
			Set the target script name to the target url. This will
			have the directory AND the file.
		--->
		<cfset strTargetScriptName = strTargetUrl />

	<cfelse>

		<!---
			We are not attempting to access a file, just a
			directory. Grab that directory as the target directory.
		--->
		<cfset strTargetDirectory = strTargetUrl />

		<!---
			Since we are pointing to a directory, just grab that as
			the script name as well.
		--->
		<cfset strTargetScriptName = strTargetUrl />

	</cfif>

	<!---
		ASSERT: At this point, we have the:

		- target url
		- target directory
		- target query params

		that the request was attempting to call. The target url
		does NOT have any leading or trailing slashes, but it
		might have a file name.
	--->

	<!---
		Now that we have all that, we have to figure out what it
		means on the LOCAL setup; that is, what the fake url maps
		to in our framework. Let's test the target url against
		some regular expressions.
	--->
	<cfsavecontent variable="strXmlRedirectExpressions">

		<!---
			In order to narrow down the regular expressions that we
			have to run, I am checking the first item in the target
			url.
		--->
		<cfswitch expression="#LCase( ListFirst( strTargetUrl, '/' ) )#">

			<!---
				FOR THIS DEMO I am putting the XML here. In reality,
				I am pulling in an XML file from each section so that
				each section can fine-tune its own redirection. ex:

				<cfinclude template="content/go/_url_redirect.xml.cfm" />

				For the demo, I have included it in the proper case.
			--->
			<cfcase value="go">

				<redirect in="^go/ben-?nadel\b.*$" out="go.bennadel" />

				<!---
					Notice in this one how I am using a reg-exp group
					reference. It has to come BEFORE the more general
					pretty-url expression below, because the first
					matching expression wins. The "&" in the out value
					must be escaped as "&amp;" or XmlParse() will fail.
				--->
				<redirect in="^go/pretty-?url/([0-9]{4})\b.*$" out="go.demo404&amp;search_year=\1" />

				<redirect in="^go/pretty-?url\b.*$" out="go.demo404" />

			</cfcase>

		</cfswitch>

		<!---
			After the individual cases, I include a global 404
			handler in case none of the others match.
		--->
		<redirect in=".+" out="home.display" />

	</cfsavecontent>

	<!--- Trim the XML value. --->
	<cfset strXmlRedirectExpressions = Trim( strXmlRedirectExpressions ) />

	<!---
		Check to see if there is a pretty url redirect expression
		list that we can use to test the target url.
	--->
	<cfif Len( strXmlRedirectExpressions )>

		<!---
			Parse the expressions into an XML document.
		--->
		<cfset xmlRedirectExpressions = XmlParse( "<redirects>" & strXmlRedirectExpressions & "</redirects>" ) />

		<!--- Get the redirect expression nodes. --->
		<cfset xmlChildren = xmlRedirectExpressions.XmlRoot.XmlChildren />

		<!--- Loop through the expressions to see if any match. --->
		<cfloop index="intChild" from="1" to="#ArrayLen( xmlChildren )#" step="1">

			<!--- Get a reference to this child's attributes. --->
			<cfset objXmlAttributes = xmlChildren[ intChild ].XmlAttributes />

			<!---
				Check to see if we found a match. Use the regular
				expression in our redirects XML and test it against
				the target URL.
			--->
			<cfif REFind( objXmlAttributes.In, strTargetUrl )>

				<!--- Get the mapped action (the OUT xml attribute). --->
				<cfset strTargetAction = REReplace( strTargetUrl, objXmlAttributes.In, objXmlAttributes.Out, "ONE" ) />

				<!---
					Check to see if we have any query params as part of
					the target action string.
				--->
				<cfif Find( "&", strTargetAction )>

					<!---
						Add the query params to the target query params
						that we got from the original 404 error url.
					--->
					<cfset strTargetQueryParams = ListAppend( strTargetQueryParams, ListRest( strTargetAction, "&" ), "&" ) />

					<!---
						Get rid of the query string part of the target
						action since we just copied it over to the
						target query params.
					--->
					<cfset strTargetAction = ListFirst( strTargetAction, "&" ) />

				</cfif>

				<!---
					We found a regular expression match to the target
					URL. We don't need to keep searching, so break out
					of the loop.
				--->
				<cfbreak />

			</cfif>

		</cfloop>

		<!---
			ASSERT: At this point, we have the:

			- target url
			- target query params
			- mapped action (based on the reg-exp)
		--->

		<!---
			Update the script name based on the error. Since we
			cannot update the CGI.script_name value directly, I am
			storing the target "script name" in a custom variable.
			I keep an Environment object (a CFC) in the REQUEST
			scope, but this could be any variable that you reference
			in the page processing.
		--->
		<cfset REQUEST.Environment.OverrideScriptName( GetDirectoryFromPath( CGI.script_name ) & strTargetScriptName ) />

		<!---
			Remove the 404 error from the attributes. This is a
			custom struct in my framework that combines the URL and
			FORM variables.
		--->
		<cfloop item="strKey" collection="#REQUEST.Attributes#">

			<cfif NOT Compare( "404;", Left( strKey, 4 ) )>
				<cfset StructDelete( REQUEST.Attributes, strKey ) />
			</cfif>

		</cfloop>

		<!---
			Now, we need to move any target query params into my
			framework's attributes scope. Since I never reference
			URL or FORM directly, I do NOT bother updating them at
			this point, but you could certainly set URL values here.
		--->

		<!--- Update the attribute values. Get the array of params. --->
		<cfset arrQueryParams = ListToArray( strTargetQueryParams, "&" ) />

		<!---
			Loop over the query param pairs and add them to the
			request attributes scope.
		--->
		<cfloop index="intPair" from="1" to="#ArrayLen( arrQueryParams )#" step="1">

			<!---
				Take the name from before the first "=" and treat the
				rest as the value, so a value that itself contains "="
				is not thrown away. A pair with no "=" gets an empty
				value.
			--->
			<cfset strParamName = ListFirst( arrQueryParams[ intPair ], "=" ) />
			<cfset strParamValue = ListRest( arrQueryParams[ intPair ], "=" ) />

			<!--- Set the attributes value. --->
			<cfset REQUEST.Attributes[ strParamName ] = strParamValue />

		</cfloop>

		<!---
			THIS NEXT IF STATEMENT IS PART OF MY FRAMEWORK. I DO NOT
			USE ABSOLUTE URLS IN MY APP. ALL MY URLS ARE RELATIVE
			(IE. ../../../). BECAUSE OF THIS, I NEED TO UPDATE WHAT
			THE SERVER THINKS THE WEB BROWSER IS SEEING. SINCE THE
			SERVER IS IN THE ROOT AT THIS PAGE (site_error.cfm) AND
			THE WEB BROWSER IS IN A SUB DIRECTORY, THE TWO PATHS DO
			NOT LINE UP. HOWEVER, DUE TO THE WAY MY 404 HANDLER WORKS
			ON DEV, I HAVE TO DO THIS DIFFERENTLY ON THE DEV AND LIVE
			SERVERS.
		--->

		<!---
			Check which server we are on. On the development server,
			we were most likely thrown directly to the site_error.cfm
			page, so the default web root ("") is already correct.
			On the live server, we were most likely sent here from
			another page, so the web root has to be rebuilt from the
			depth of the target directory.
		--->
		<cfif APPLICATION.ServiceFactory.GetConfig().GetIsLive()>

			<!--- We are live; build the web root from the target directory. --->
			<cfset REQUEST.Environment.Web.Root = RepeatString( "../", ListLen( strTargetDirectory, "/" ) ) />

		</cfif>

		<!---
			We found a mapping, so send a 200 OK status code instead
			of the 404.
		--->
		<cfheader statuscode="200" statustext="OK" />

		<!--- Store the target action. --->
		<cfset REQUEST.TargetAction = strTargetAction />

		<!--- Include the index file. --->
		<cfinclude template="index.cfm" />

		<!---
			We have just included the main site controller
			(index.cfm). We DO NOT WANT the rest of this template to
			execute.
		--->
		<cfexit />

	<cfelse>

		<!---
			There were no redirect expressions to test the target
			url against. Therefore, this page was reached in error.
		--->

		<!--- If we are live, send an email alerting us to the error. --->
		<cfif APPLICATION.ServiceFactory.GetConfig().GetIsLive()>

			<cfmail
				to=""
				from=""
				subject="Error Page Reached"
				type="HTML">

				#CGI.script_name#<br />
				#CGI.query_string#<br />
				<br />

				<cfdump var="#CGI#" />
				<cfdump var="#REQUEST#" />

			</cfmail>

		</cfif>

	</cfif>

</cfif>

<!---
	ASSERT: This page was reached in error. No 404 error was
	mapped. Either someone has a bad link or they are trying to
	hack my site!
--->

<!--- DISPLAY STANDARD HTML PAGE HERE. --->

<cfabort />
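The redirect lookup above is first-match-wins: the loop breaks on the first `in` expression that matches, so a general expression listed before a more specific one will shadow it. Below is a minimal sketch of that same lookup as a standalone function, assuming the rules are held in an array of structs instead of being parsed from the redirect XML, and assuming an engine that supports implicit array and struct literals. `MapPrettyUrl`, `Pattern`, and `Action` are hypothetical names, not part of the original gist.

```cfml
<!---
	Hypothetical helper (not part of the original gist): returns the
	action for the first rule whose Pattern matches the target url,
	expanding any back-references (\1, \2, ...) in the rule's Action.
	Returns an empty string when no rule matches.
--->
<cffunction name="MapPrettyUrl" returntype="string" output="false">
	<cfargument name="TargetUrl" type="string" required="true" />
	<cfargument name="Rules" type="array" required="true" />

	<cfset var intRule = 0 />

	<cfloop index="intRule" from="1" to="#ArrayLen( ARGUMENTS.Rules )#" step="1">

		<!--- First match wins, just like the cfbreak in the XML loop. --->
		<cfif REFind( ARGUMENTS.Rules[ intRule ].Pattern, ARGUMENTS.TargetUrl )>
			<cfreturn REReplace( ARGUMENTS.TargetUrl, ARGUMENTS.Rules[ intRule ].Pattern, ARGUMENTS.Rules[ intRule ].Action, "ONE" ) />
		</cfif>

	</cfloop>

	<cfreturn "" />
</cffunction>

<!--- Most specific expression first, catch-all last. --->
<cfset arrRules = [
	{ Pattern = "^go/pretty-?url/([0-9]{4})\b.*$", Action = "go.demo404&search_year=\1" },
	{ Pattern = "^go/pretty-?url\b.*$", Action = "go.demo404" },
	{ Pattern = ".+", Action = "home.display" }
	] />

<!--- Expected: go.demo404&search_year=2006, go.demo404, home.display --->
<cfoutput>
	#MapPrettyUrl( "go/prettyurl/2006", arrRules )#<br />
	#MapPrettyUrl( "go/prettyurl", arrRules )#<br />
	#MapPrettyUrl( "about/contact.htm", arrRules )#
</cfoutput>
```

Swapping the first two rules would make "go/prettyurl/2006" return "go.demo404", because the general expression would match first and the year would never be captured.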