Updated @gruber's regex with a modified version that looks for a suffix of 2-13 letters or digits rather than trying to look for specific TLDs. Given the recent addition of ~1400 gTLDs, it may be time to give up on that front. (UPDATE 2018-05-15: Naked URLs without a protocol prefix can now match more advanced URLs. Also escaped / and " so it's easier to copy…
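To make the length-based TLD check concrete, here is a minimal Python sketch of that one idea in isolation. The names `TLD_LIKE` and `has_tld_like_suffix` are hypothetical and not part of the gist. The sketch only tests the final dot-separated label of a host string.

```python
import re

# Hypothetical helper: treat any final label of 2-13 letters or digits
# as "TLD-like", instead of checking it against a list of known TLDs.
TLD_LIKE = re.compile(r"\.[a-z0-9]{2,13}$", re.IGNORECASE)


def has_tld_like_suffix(host: str) -> bool:
    """Return True if host ends in a dot followed by 2-13 letters or digits."""
    return TLD_LIKE.search(host) is not None


for host in ("example.com", "example.photography", "example.c", "192.168.0.10"):
    print(host, has_tld_like_suffix(host))
```

Because digits are allowed, a dotted IPv4 address whose last octet has two or more digits also passes, which is the side effect the pattern's own comments mention.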
# Single-line version:
# Commented multi-line version:
(                                         # Capture 1: entire matched URL
  (?:
    https?:                               # URL protocol and colon
    (?:
      \/{1,3}                             # 1-3 slashes
      |                                   #   or
      [a-z0-9%]                           # Single letter or digit or '%'
                                          # (Trying not to match e.g. "URI::Escape")
    )
    |                                     #   or
    [a-z0-9.\-]+\.                        # looks like domain name
    (?:[a-z0-9]{2,13})                    # ending in 2-13 letters or digits: any TLD (or a final IPv4 octet of 2+ digits)
    \/                                    # followed by a slash
  )
  (?:                                     # One or more:
    [^\s()<>{}\[\]]+                      # Run of non-space, non-()<>{}[]
    |                                     #   or
    \([^\s()]*?\([^\s()]+\)[^\s()]*?\)    # balanced parens, one level deep: (…(…)…)
    |                                     #   or
    \([^\s]+?\)                           # balanced parens, non-recursive: (…)
  )+
  (?:                                     # End with:
    \([^\s()]*?\([^\s()]+\)[^\s()]*?\)    # balanced parens, one level deep: (…(…)…)
    |                                     #   or
    \([^\s]+?\)                           # balanced parens, non-recursive: (…)
    |                                     #   or
    [^\s`!()\[\]{};:'\".,<>?«»“”‘’]       # not a space or one of these punct chars
  )
  |                                       # OR, the following to match naked domains:
  (?<!@)                                  # not preceded by a @, avoid matching foo@_gmail.com_
  (?<![@.])
  \.                                      # avoid matching the last two parts of an email domain
  (?:[a-z0-9]{2,13})                      # ending in 2-13 letters or digits: any TLD (or a final IPv4 octet of 2+ digits)
  (?!@)                                   # not followed by a @, to avoid matching the local part of an email address
  (?:                                     # One or more:
    [^\s()<>{}\[\]]+                      # Run of non-space, non-()<>{}[]
    |                                     #   or
    \([^\s()]*?\([^\s()]+\)[^\s()]*?\)    # balanced parens, one level deep: (…(…)…)
    |                                     #   or
    \([^\s]+?\)                           # balanced parens, non-recursive: (…)
  )+
  (?:                                     # End with:
    \([^\s()]*?\([^\s()]+\)[^\s()]*?\)    # balanced parens, one level deep: (…(…)…)
    |                                     #   or
    \([^\s]+?\)                           # balanced parens, non-recursive: (…)
    |                                     #   or
    [^\s`!()\[\]{};:'\".,<>?«»“”‘’]       # not a space or one of these punct chars
  )
)
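A rough Python sketch of how a pattern of this shape can be compiled and used with the standard `re` module. Several pieces are assumptions, not taken from the listing above:

- The verbose and case-insensitive flags.
- The leading `\b`.
- The hostname sub-pattern in the naked-domain branch.
- Stopping the naked-domain branch at the hostname, rather than continuing into a path.
- The names `URL_PATTERN` and `find_urls`, which are hypothetical.

Treat it as an illustration of the technique, not as the gist's exact expression.

```python
import re

# Sketch only. Pieces marked ASSUMED do not appear in the listing above.
URL_PATTERN = re.compile(
    r"""
    \b                                          # ASSUMED: start at a word boundary
    (                                           # capture 1: entire matched URL
      (?:
        https?:                                 # protocol and colon
        (?: /{1,3} | [a-z0-9%] )                # 1-3 slashes, or one letter/digit/%
        |
        [a-z0-9.\-]+\.(?:[a-z0-9]{2,13})/       # domain-like text followed by a slash
      )
      (?:                                       # one or more:
        [^\s()<>{}\[\]]+                        #   run of non-space, non-bracket chars
        | \([^\s()]*?\([^\s()]+\)[^\s()]*?\)    #   balanced parens, one level deep
        | \([^\s]+?\)                           #   balanced parens, non-recursive
      )+
      (?:                                       # end with:
        \([^\s()]*?\([^\s()]+\)[^\s()]*?\)
        | \([^\s]+?\)
        | [^\s`!()\[\]{};:'".,<>?«»“”‘’]        #   anything but space or trailing punctuation
      )
      |                                         # or a naked domain:
      (?<![@.])                                 # not preceded by @ or .
      [a-z0-9]+ (?:[.\-][a-z0-9]+)*             # ASSUMED: hostname labels
      \.(?:[a-z0-9]{2,13})                      # 2-13 letters or digits as the last label
      \b (?!@)                                  # ASSUMED \b; not followed by @
    )
    """,
    re.VERBOSE | re.IGNORECASE,                 # ASSUMED flags
)


def find_urls(text: str) -> list[str]:
    """Return every URL-like substring found in text, in order."""
    return [m.group(1) for m in URL_PATTERN.finditer(text)]


sample = ("See https://en.wikipedia.org/wiki/Regex_(disambiguation), "
          "or example.com. Mail bob@mail.example.org.")
print(find_urls(sample))
# ['https://en.wikipedia.org/wiki/Regex_(disambiguation)', 'example.com']
```

The "end with" group is what keeps the trailing comma and full stop out of the matches while still allowing a closing parenthesis that belongs to the URL. The lookarounds in the naked-domain branch keep the email address from being reported as a URL.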