@ericrasch
Last active October 6, 2019 12:37
RegEx Snippets


Extract URLs

Find all links

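Captures the full URL reliably when used in an editor search (for example, in Sublime Text 2). The {2,3} quantifier limits the top-level domain to two or three letters, so a URL on a longer one such as .info is only partially matched.

  • RegEx: (https?|ftps?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/?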
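As a minimal sketch of running the same search outside an editor, the pattern can be used with Python's re module. Python is an assumption here (the snippet itself is editor-agnostic), and the find_urls helper and the sample text are made up for illustration.

```python
import re

# The pattern from above, unchanged. {2,3} only accepts top-level
# domains of two or three letters (.org, .com, .io, ...).
URL_RE = re.compile(r"(https?|ftps?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/?")

def find_urls(text):
    """Return every full URL match in text (hypothetical helper)."""
    # finditer + group(0) yields the whole match; findall would return
    # only the captured groups because the pattern contains parentheses.
    return [m.group(0) for m in URL_RE.finditer(text)]

sample = "Docs at https://www.yourwebsite.org/degrees/ and ftp://files.yourwebsite.org/a.zip today"
print(find_urls(sample))
```

The path part is matched with \S*, so a URL ends at the first whitespace character; trailing punctuation that touches the URL would be included in the match.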

Find all links within a specific subfolder structure and replace with alt subfolders

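The following matches URLs in a SQL dump where quotation marks are escaped (\"). The final group stops at the first backslash or quotation mark, so the escaped quote is left out of the capture.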

  • URL pattern: http://www.yourwebsite.org/calculator/degrees/sociology
  • RegEx: (https?|ftps?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/calculator/degrees/([^\\"]+)
  • Replacement: http://www.yourwebsite.org/degrees/$3/
  • Result: http://www.yourwebsite.org/degrees/sociology/
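The find/replace above can be sketched in Python as follows. This assumes Python's re module rather than an editor, so the backreference is written \3 instead of $3; the dump_line sample is hypothetical.

```python
import re

# Same pattern as above; group 3 is the path segment after /calculator/degrees/.
PATTERN = re.compile(
    r'(https?|ftps?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/calculator/degrees/([^\\"]+)'
)

# Python writes backreferences as \3 (or \g<3>) where the editor uses $3.
REPLACEMENT = r"http://www.yourwebsite.org/degrees/\3/"

# Hypothetical line from a SQL dump, with quotation marks escaped as \".
dump_line = r'<a href=\"http://www.yourwebsite.org/calculator/degrees/sociology\">Sociology</a>'

rewritten = PATTERN.sub(REPLACEMENT, dump_line)
print(rewritten)
```

Because group 3 excludes backslashes and quotation marks, the escaped closing quote stays in place after the rewritten URL.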

Find all links without trailing slashes

This will find all hrefs that do not end in a trailing slash. NOTE: it will also match links in your <head> and links ending in .html that purposely have no trailing slash, so be careful when performing a find/replace.

  • URL pattern: href="http://www.yourwebsite.org/calculator/degrees/financial"
  • RegEx: href="(\S)+[^/]"
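To illustrate both the intended matches and the .html caveat, here is a small Python sketch. The markup is invented for the example, and Python is an assumption; the pattern is the one listed above.

```python
import re

# The pattern from above: the last character before the closing
# quote must not be "/".
NO_SLASH_RE = re.compile(r'href="(\S)+[^/]"')

# Hypothetical markup for illustration.
html = (
    '<a href="http://www.yourwebsite.org/calculator/degrees/financial">A</a>\n'
    '<a href="http://www.yourwebsite.org/degrees/">B</a>\n'
    '<a href="http://www.yourwebsite.org/about.html">C</a>\n'
)

matches = [m.group(0) for m in NO_SLASH_RE.finditer(html)]
for match in matches:
    print(match)
```

The second link is skipped because it already ends in a slash, while the .html link is reported even though it should probably be left alone.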

Find all id="*" attributes

This will find all id attributes quoted with either ' or ".

  • RegEx: id=("|')[^"']*("|')
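A possible variation, sketched in Python: capturing the opening quote and reusing it with a backreference makes sure the closing quote is the same character. This is an assumption layered on the snippet above, not the original pattern, and the markup is made up.

```python
import re

# Variation on the pattern above (an assumption, not the original snippet):
# (["']) captures the opening quote and \1 requires the same closing quote.
ID_RE = re.compile(r"""id=(["'])([^"']*)\1""")

# Hypothetical markup mixing both quote styles.
html = """<div id="header"><span id='logo'></span><p class="x">text</p></div>"""

# Group 2 holds the attribute value without its quotes.
ids = [m.group(2) for m in ID_RE.finditer(html)]
print(ids)
```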

Find all <a href=""></a> anchor tags

  • RegEx: <(?:\s?)[aA].*?href=[\'\"](?<link>.*?)[\'\"].*?>(?<text>.*)<(?:\s?)\/(?:\s?)[aA](?:\s?)>
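The named groups make it easy to pull out the link and the link text. A minimal Python sketch follows; note that Python spells named groups (?P<name>...) rather than (?<name>...), so the pattern is adapted accordingly. The markup is hypothetical.

```python
import re

# Same pattern, with the named groups spelled (?P<name>...) as Python requires.
ANCHOR_RE = re.compile(
    r"""<(?:\s?)[aA].*?href=[\'\"](?P<link>.*?)[\'\"].*?>(?P<text>.*)<(?:\s?)\/(?:\s?)[aA](?:\s?)>"""
)

# Hypothetical markup, one anchor per line.
html = (
    '<p><a href="http://www.yourwebsite.org/degrees/" class="nav">Degrees</a></p>\n'
    "<p><A href='/about/'>About</A></p>\n"
)

links = [(m.group("link"), m.group("text")) for m in ANCHOR_RE.finditer(html)]
for link, text in links:
    print(link, "|", text)
```

The text group uses a greedy .*, so when several anchors share one line it runs through to the last closing tag on that line; keeping one anchor per line, as in the sample, avoids that.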

Replace subdomain URLs

Convert from http://old.yourwebsite.org/whatever/ to http://new.yourwebsite.com/whatever/

EXPLAINED:

  • since the pattern will contain literal forward slashes for the URL (e.g. "scheme://domain/path"), we're delimiting the pattern with pipe chars to avoid having to backslash-escape each forward slash
  • just in case we have a mix of http & https URLs, we'll match both with "https?", which means match "http" followed by zero or one "s" chars
  • we're backslash-escaping the dots in the domain, since in regex syntax an unescaped dot normally means "any single character other than a newline"
  • we're capturing the parts of the URL to keep (the scheme and the domain name) with parentheses in the pattern, then joining them together in the replacement using backreferences
  • we're making the entire pattern match case-insensitively by adding an "i" flag after the closing pattern delimiter

$content = preg_replace('|(https?://)old(\.yourwebsite\.)org|i', '$1new$2com', $content);
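For comparison, the same substitution can be sketched in Python. This version captures the scheme and the domain around the literal "old" and inserts "new" between them, which is what the conversion described above calls for. The replace_subdomain helper and the sample content string are invented for the example.

```python
import re

# Python needs no pattern delimiters, so the forward slashes stay
# unescaped. re.IGNORECASE plays the role of the trailing "i" flag.
SUBDOMAIN_RE = re.compile(r"(https?://)old(\.yourwebsite\.)org", re.IGNORECASE)

def replace_subdomain(content):
    """Rewrite old.yourwebsite.org URLs to new.yourwebsite.com (hypothetical helper)."""
    # \g<1> and \g<2> are Python's explicit backreference syntax,
    # equivalent to $1 and $2 in the PHP replacement string.
    return SUBDOMAIN_RE.sub(r"\g<1>new\g<2>com", content)

content = "See http://old.yourwebsite.org/whatever/ and HTTPS://OLD.YOURWEBSITE.ORG/x/"
print(replace_subdomain(content))
```

The captured parts keep their original casing, while the inserted "new" and "com" are always lowercase.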


Rewrite subfolder with keywords

#RewriteRule ^calculator/degrees(?:/([\w-]+?)(?:-in.+)?)?/?$ /degrees/$1/ [L,R=301]
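To see what the rule captures before enabling it, the pattern can be tried out in Python. This is only a sketch: it assumes the rule lives in a per-directory (.htaccess) context, where Apache matches against the path without its leading slash, and the rewrite helper and sample paths are hypothetical.

```python
import re

# The RewriteRule pattern from above. The lazy [\w-]+? stops as soon as
# an optional "-in..." keyword suffix can account for the rest of the path.
RULE_RE = re.compile(r"^calculator/degrees(?:/([\w-]+?)(?:-in.+)?)?/?$")

def rewrite(path):
    """Return the redirect target for path, or None if the rule does not apply."""
    m = RULE_RE.match(path)
    if m is None:
        return None
    # Group 1 is unset for the bare calculator/degrees path; treat it as "".
    return "/degrees/{}/".format(m.group(1) or "")

for path in ("calculator/degrees/sociology", "calculator/degrees/nursing-in-texas/", "blog/post"):
    print(path, "->", rewrite(path))
```

In this sketch the bare calculator/degrees path produces /degrees// with a doubled slash, which may be worth handling with a separate rule.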


Misc.
