ericrasch/RegEx Snippets.md

## RegEx Snippets.md

      
    RegEx Snippets


Extract URLs

Find all links

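Works well for capturing the full URL when used in an editor search (for example, in Sublime Text 2). Because the top-level domain is matched with {2,3}, longer TLDs such as .info are cut off after their third letter.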
(https?|ftps?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/?
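
Outside an editor, the same pattern can be run from a script. The following is a minimal Python sketch, not part of the original snippet: the helper name `find_urls` and the sample text are made up for illustration.

```python
import re

# Pattern copied from the snippet above; only the Python wrapper is new.
URL_RE = re.compile(r"(https?|ftps?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/?")


def find_urls(text):
    """Return the full text of every URL match (hypothetical helper)."""
    # finditer + group(0) yields whole matches; findall would return only the groups.
    return [m.group(0) for m in URL_RE.finditer(text)]


# Made-up sample text for illustration.
sample = "See http://www.yourwebsite.org/degrees/ and ftp://files.yourwebsite.org/a.zip today"
print(find_urls(sample))
```

Note that `\S*` runs to the next whitespace, so punctuation directly after a URL (a closing parenthesis or full stop) ends up inside the match.
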
Find all links within a specific subfolder structure and replace with alt subfolders

The following captures the URL in a SQL dump, stopping at the escaped quotation mark (\") that closes it.

URL pattern: http://www.yourwebsite.org/calculator/degrees/sociology
RegEx: (https?|ftps?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/calculator/degrees/([^\\"]+)
Replacement: http://www.yourwebsite.org/degrees/$3/
Result: http://www.yourwebsite.org/degrees/sociology/
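
As a rough check of the pattern and replacement above, here is a Python sketch. The dump line is invented for illustration, and Python writes the backreference as `\3` where the editor replacement uses `$3`.

```python
import re

# Pattern copied from above. Group 3 is the part after /calculator/degrees/,
# and [^\\"]+ stops it at the backslash of an escaped quotation mark.
PATTERN = re.compile(
    r'(https?|ftps?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/calculator/degrees/([^\\"]+)'
)
REPLACEMENT = r"http://www.yourwebsite.org/degrees/\3/"

# Made-up line from a SQL dump, with the quotation marks escaped as \"
dump_line = r'<a href=\"http://www.yourwebsite.org/calculator/degrees/sociology\">Sociology</a>'

print(PATTERN.sub(REPLACEMENT, dump_line))
```

The escaped quotation marks around the URL are left in place because group 3 never consumes the backslash.
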

Find all links without trailing slashes

This will find all hrefs that do not end in a trailing slash. NOTE: it will also detect links in your <head> and ones that end in .html, which purposefully do not have a trailing slash, so be careful when performing a find/replace.

URL pattern: href="http://www.yourwebsite.org/calculator/degrees/financial"
RegEx: href="(\S)+[^/]"
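
To see the caveat from the note in practice, the sketch below runs the pattern over a small made-up HTML fragment in Python. The fragment and variable names are assumptions for illustration only.

```python
import re

# Pattern copied from above: an href whose last character before the
# closing quote is not a slash.
NO_SLASH_RE = re.compile(r'href="(\S)+[^/]"')

# Made-up fragment: one link without a trailing slash, one with,
# and a stylesheet link of the kind the note warns about.
html = (
    '<a href="http://www.yourwebsite.org/calculator/degrees/financial">A</a>\n'
    '<a href="http://www.yourwebsite.org/calculator/degrees/nursing/">B</a>\n'
    '<link href="/css/site.css">\n'
)

hits = [m.group(0) for m in NO_SLASH_RE.finditer(html)]
for hit in hits:
    print(hit)
```

The stylesheet href is reported alongside the real candidate, which is why each hit should be reviewed before replacing.
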

Find all id="*" attributes

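This will find all id attributes quoted with either ' or ". Alternation does not work inside a character class, so the class simply excludes both quote characters, and the \1 backreference requires the closing quote to match the opening one.

RegEx: id=("|')[^"']*\1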
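
A small Python sketch of this search, using a pattern of the form `id=("|')[^"']*\1`, where the backreference ties the closing quote to the opening one. The HTML string is made up for illustration.

```python
import re

# Group 1 captures the opening quote; \1 requires the same quote to close.
ID_RE = re.compile(r"""id=("|')[^"']*\1""")

# Made-up fragment with one double-quoted and one single-quoted id.
html = """<div id="main"><p id='intro'>Hi</p><span class="x">no id</span></div>"""

ids = [m.group(0) for m in ID_RE.finditer(html)]
print(ids)
```

Without a word boundary the pattern also matches attributes that merely end in `id=`, such as `data-id=`. Prefixing it with `\b` narrows that, if needed.
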

Find all <a href=""></a> anchor tags


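Captures the href value in the named group link and the anchor text in the named group text. Requires an engine that supports the (?<name>...) named-group syntax, such as PCRE.

RegEx: <(?:\s?)[aA].*?href=[\'\"](?<link>.*?)[\'\"].*?>(?<text>.*)<(?:\s?)\/(?:\s?)[aA](?:\s?)>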
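
Python spells named groups as `(?P<name>...)`, so the pattern needs a small adaptation there. The sketch below is one possible port: the quote classes are simplified to `['"]` and the sample HTML is made up.

```python
import re

# Port of the pattern above with Python's (?P<name>...) named-group syntax.
ANCHOR_RE = re.compile(
    r"""<(?:\s?)[aA].*?href=['"](?P<link>.*?)['"].*?>(?P<text>.*)<(?:\s?)/(?:\s?)[aA](?:\s?)>"""
)

# Made-up fragment containing a single anchor tag.
html = '<p><a class="btn" href="http://www.yourwebsite.org/degrees/">Degrees</a></p>'

m = ANCHOR_RE.search(html)
if m:
    print(m.group("link"))
    print(m.group("text"))
```

The text group is greedy, so when a line holds several anchors it runs to the last closing tag. Changing `.*` to `.*?` in that group is one way to stop at the first.
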


Replace Subdomain URLs

Convert from http://old.yourwebsite.org/whatever/ to http://new.yourwebsite.com/whatever/
EXPLAINED:

since the pattern will contain literal forward slashes for the url (eg "scheme://domain/path"), we're delimiting the pattern with pipe chars to avoid having to backslash-escape each forward slash
just in case we have a mix of http & https urls, we'll match both with "https?" which means match "http" followed by zero or one "s" chars
we're backslash-escaping the dots in the domain, since in regex syntax an unescaped dot normally means "any single character other than a newline"
we're capturing the scheme before "old" and the domain after it with parentheses in the pattern, then joining them around the new subdomain in the replacement using backreferences
we're making the entire pattern match case-insensitively by adding an "i" flag after the closing pattern delimiter

preg_replace('|(https?://)old(\.yourwebsite\.)org|i', '${1}new${2}com', $content);
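
For comparison, the same idea can be sketched in Python, assuming the goal stated above (old.yourwebsite.org becomes new.yourwebsite.com). The function name `swap_subdomain` is hypothetical. Python needs no pattern delimiters, so the forward slashes need no special handling.

```python
import re


def swap_subdomain(content):
    """Rewrite old.yourwebsite.org URLs to new.yourwebsite.com (hypothetical helper)."""
    return re.sub(
        r"(https?://)old(\.yourwebsite\.)org",  # group 1: scheme, group 2: domain
        r"\g<1>new\g<2>com",                    # \g<1> keeps "new" from being read as part of the group number
        content,
        flags=re.IGNORECASE,                    # same role as the "i" flag
    )


print(swap_subdomain("http://old.yourwebsite.org/whatever/"))
```

Only the subdomain and the TLD are replaced. The captured scheme and domain keep whatever letter case they had in the input.
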

Rewrite subfolder with keywords

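Redirects /calculator/degrees/<name>, with or without a trailing -in... keyword suffix, to /degrees/<name>/ with a 301. The leading # keeps the rule commented out in an Apache config or .htaccess file; remove it to enable the rule.

#RewriteRule ^calculator/degrees(?:/([\w-]+?)(?:-in.+)?)?/?$  /degrees/$1/ [L,R=301]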
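
To check what the rule's pattern captures before enabling it, the regex can be tried in Python. This is only a sketch: the function `rewrite` and the sample paths are made up, and it assumes the rule is matched against a path without a leading slash, as in an .htaccess file.

```python
import re

# Pattern copied from the RewriteRule above.
RULE_RE = re.compile(r"^calculator/degrees(?:/([\w-]+?)(?:-in.+)?)?/?$")


def rewrite(path):
    """Return the redirect target for path, or None if the rule does not apply."""
    m = RULE_RE.match(path)
    if m is None:
        return None
    # An unmatched group behaves like an empty $1 in the substitution.
    return "/degrees/{}/".format(m.group(1) or "")


for path in ("calculator/degrees/sociology-in-texas",
             "calculator/degrees/sociology/",
             "about/"):
    print(path, "->", rewrite(path))
```

The lazy `[\w-]+?` stops at the first `-in`, so everything from that keyword onward is dropped from the target.
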

Misc.


Tweetbot can use regular expressions to mute tweets in your timeline and mentions.
Create complex RegExps more easily