@mgomes
Created March 14, 2016 18:19
Detect URLs within text. Adapted from Android.
# URL Matching
# Taken from Android: http://grepcode.com/file/repository.grepcode.com/java/ext/com.google.android/android/5.1.1_r1/android/util/Patterns.java#145
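# The character ranges are plain strings, not Regexps: interpolating a Regexp
# inside [...] would insert its "(?-mix:...)" wrapper as literal class members.
GOOD_IRI_CHAR = "a-zA-Z0-9\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF"
GOOD_GTLD_CHAR = "a-zA-Z\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF"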
GTLD = /[#{GOOD_GTLD_CHAR}]{2,63}/
IRI = /[#{GOOD_IRI_CHAR}]([#{GOOD_IRI_CHAR}\-]{0,61}[#{GOOD_IRI_CHAR}]){0,1}/
HOST_NAME = /(#{IRI}\.)+#{GTLD}/
IP_ADDRESS = /((25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[0-9]))/
DOMAIN_NAME = /(#{HOST_NAME}|#{IP_ADDRESS})/
WEB_URL_REGEX = /((?:(http|https|Http|Https|rtsp|Rtsp):\/\/(?:(?:[a-zA-Z0-9\$\-\_\.\+\!\*\'\(\)\,\;\?\&\=]|(?:\%[a-fA-F0-9]{2})){1,64}(?:\:(?:[a-zA-Z0-9\$\-\_\.\+\!\*\'\(\)\,\;\?\&\=]|(?:\%[a-fA-F0-9]{2})){1,25})?\@)?)?(?:#{DOMAIN_NAME})(?:\:\d{1,5})?)(\/(?:(?:[#{GOOD_IRI_CHAR}\;\/\?\:\@\&\=\#\~\-\.\+\!\*\'\(\)\,\_])|(?:\%[a-fA-F0-9]{2}))*)?(?:\b|$)/

mgomes commented Mar 14, 2016

Usage:

"Blah Blah. t.co/2efee73. :)".match(WEB_URL_REGEX)[0]
 => "t.co/2efee73"
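
Calling `[0]` on the result of `match` raises `NoMethodError` when the text contains no URL, because `match` returns `nil`. A minimal sketch of two nil-safe wrappers follows; `first_url` and `all_urls` are hypothetical helper names, not part of the gist, and both assume the gist's constants are loaded when the default `pattern` is used.

```ruby
# Hypothetical helpers, not part of the original gist.
# The default argument is only evaluated when no pattern is passed,
# so WEB_URL_REGEX must be defined by then.

# Returns the first match as a String, or nil when nothing matches.
def first_url(text, pattern = WEB_URL_REGEX)
  # String#[] with a Regexp returns the matched substring or nil.
  text[pattern]
end

# Returns every full match as an Array of Strings (empty when none).
def all_urls(text, pattern = WEB_URL_REGEX)
  # gsub without a block or replacement returns an Enumerator over the
  # full matches. String#scan would return the capture groups instead.
  text.gsub(pattern).to_a
end
```

With the gist loaded, `first_url("Blah Blah. t.co/2efee73. :)")` returns the same `"t.co/2efee73"` as the usage above, and `first_url("no links here")` returns `nil` instead of raising.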
