@germanny
Forked from ericrasch/RegEx Snippets.md
Created November 13, 2013 21:41

RegEx Snippets


Extract URLs

Find all links

Works pretty well at capturing the full URL when used in a search (e.g. in Sublime Text 2).

  • RegEx: (https?|ftps?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/?
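As a rough illustration outside an editor, the same pattern can be run from a script. This is a minimal Python sketch; the helper name `find_links` and the sample text are made up for the example, and only the pattern itself comes from the snippet above.

```python
import re

# Pattern copied from the snippet above; only the Python wrapper is new.
URL_PATTERN = re.compile(
    r"(https?|ftps?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/?"
)


def find_links(text):
    """Return the full text of every URL the pattern matches (hypothetical helper)."""
    # finditer + group(0) yields whole matches; findall would return group tuples.
    return [m.group(0) for m in URL_PATTERN.finditer(text)]


# Illustrative input, not taken from the original gist.
sample = (
    "See http://www.yourwebsite.org/calculator/ and "
    "ftp://files.yourwebsite.org/a.zip for details."
)
print(find_links(sample))
```

Because `{2,3}` only allows two- or three-letter TLDs, a URL on a longer TLD such as `.info` is matched only partially, so it is worth checking the results before acting on them.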

Find all links within a specific subfolder structure and replace with alt subfolders

The following will capture the URL in a SQL dump, stopping at the escaped quotation mark (\") that closes it.

  • URL pattern: http://www.yourwebsite.org/calculator/degrees/sociology
  • RegEx: (https?|ftps?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/calculator/degrees/([^\\"]+)
  • Replacement: http://www.yourwebsite.org/degrees/$3/
  • Result: http://www.yourwebsite.org/degrees/sociology/
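The find/replace above could also be scripted against a dump file. Here is a small Python sketch under a few assumptions: the function name `move_degree_links` and the sample dump line are invented for illustration, and Python writes the third backreference as `\g<3>` instead of `$3`.

```python
import re

# RegEx from the list above, unchanged.
PATTERN = re.compile(
    r'(https?|ftps?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?'
    r'/calculator/degrees/([^\\"]+)'
)

# Same replacement as above; \g<3> is Python's spelling of $3.
REPLACEMENT = r"http://www.yourwebsite.org/degrees/\g<3>/"


def move_degree_links(sql_text):
    """Rewrite /calculator/degrees/<name> URLs to /degrees/<name>/ (hypothetical helper)."""
    return PATTERN.sub(REPLACEMENT, sql_text)


# A made-up line resembling a SQL dump, with backslash-escaped quotes.
dump_line = r'<a href=\"http://www.yourwebsite.org/calculator/degrees/sociology\">Sociology</a>'
print(move_degree_links(dump_line))
```

The third group stops at the first backslash or double quote, which is why the escaped quote after the URL is left untouched.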

Find all links without trailing slashes

This will find all hrefs that do not end in a trailing slash. NOTE: it will also match links in your <head> and links ending in .html that purposefully have no trailing slash, so be careful when performing a find/replace.

  • URL pattern: href="http://www.yourwebsite.org/calculator/degrees/financial"
  • RegEx: href="(\S)+[^/]"
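To see which links the pattern flags before doing any replacement, something like the following Python sketch may help. The helper name and the sample markup are assumptions made for the example; the pattern is the one listed above.

```python
import re

# RegEx from the list above, unchanged.
NO_SLASH_HREF = re.compile(r'href="(\S)+[^/]"')


def hrefs_without_trailing_slash(html):
    """Return each href attribute whose value does not end in a slash (hypothetical helper)."""
    return [m.group(0) for m in NO_SLASH_HREF.finditer(html)]


# Illustrative markup: one link missing its slash, one with it, one .html page.
html = (
    '<a href="http://www.yourwebsite.org/calculator/degrees/financial">A</a> '
    '<a href="http://www.yourwebsite.org/degrees/">B</a> '
    '<a href="http://www.yourwebsite.org/about.html">C</a>'
)
for href in hrefs_without_trailing_slash(html):
    print(href)
```

The .html link shows up in the output too, which is the false positive the note above warns about.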

Replace Subdomain URLs

Convert from http://old.yourwebsite.org/whatever/ to http://new.yourwebsite.com/whatever/

EXPLAINED:

  • since the pattern contains literal forward slashes for the URL (e.g. "scheme://domain/path"), we're delimiting the pattern with pipe chars to avoid having to backslash-escape each forward slash
  • just in case we have a mix of http & https URLs, we'll match both with "https?", which means match "http" followed by zero or one "s" chars
  • we're backslash-escaping the dots in the domain, since in regex syntax an unescaped dot normally means "any single character other than a newline"
  • we're capturing the parts we want to keep (the scheme and ".yourwebsite.") with parentheses in the pattern, then joining them around the new subdomain and TLD in the replacement using backreferences
  • we're making the entire pattern match case-insensitively by adding an "i" flag after the closing pattern delimiter

$content = preg_replace('|(https?://)old(\.yourwebsite\.)org|i', '${1}new${2}com', $content);
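For comparison, the conversion described above (old.yourwebsite.org to new.yourwebsite.com) might look like this in Python. It is a sketch and not a port of any particular codebase: the function name `switch_subdomain` is made up, Python has no pattern delimiters so the pipes disappear, and `re.IGNORECASE` stands in for the "i" flag.

```python
import re

# Capture the scheme and the ".yourwebsite." middle so both can be reused.
SUBDOMAIN = re.compile(r"(https?://)old(\.yourwebsite\.)org", re.IGNORECASE)


def switch_subdomain(content):
    """Swap the "old" subdomain for "new" and the .org TLD for .com (hypothetical helper)."""
    # \g<1> and \g<2> are unambiguous backreferences, even next to letters.
    return SUBDOMAIN.sub(r"\g<1>new\g<2>com", content)


print(switch_subdomain("Visit http://old.yourwebsite.org/whatever/ today"))
```

Using `\g<1>` in place of a bare `\1` avoids any ambiguity when a backreference is immediately followed by literal text.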


Rewrite subfolder with keywords

#RewriteRule ^calculator/degrees(?:/([\w-]+?)(?:-in.+)?)?/?$ /degrees/$1/ [L,R=301]
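Before enabling a rule like this, it can be handy to try the pattern against a few paths. The Python sketch below only approximates the matching step: it assumes the path is tested without its leading slash, treats an unmatched capture as empty, and uses a made-up helper name `rewrite`. It does not model the [L,R=301] flags.

```python
import re

# Pattern taken from the RewriteRule above.
RULE = re.compile(r"^calculator/degrees(?:/([\w-]+?)(?:-in.+)?)?/?$")


def rewrite(path):
    """Return the redirect target for path, or None if the rule does not apply."""
    match = RULE.match(path)
    if match is None:
        return None
    slug = match.group(1) or ""  # assumption: an unmatched capture becomes empty
    return f"/degrees/{slug}/"


# Sample paths are illustrative only.
for path in (
    "calculator/degrees/sociology",
    "calculator/degrees/sociology-in-texas/",
    "about/contact",
):
    print(path, "->", rewrite(path))
```

Because the capture is lazy and followed by an optional "-in" suffix, everything from the first "-in" onward is dropped, so a slug such as "business-intelligence" would be cut down to "business".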


Misc.
