@tigerhawkvok
Last active March 29, 2017 03:14
Basic Web URL validation
###
# This was created to validate well-formed, ordinary URL input from the client.
#
# In my specific use case, it was only fetching license URLs, so it just needed to work for
# Creative Commons, GPL, MIT, and other common license URLs.
#
# Longer, more robust patterns, like
# https://gist.github.com/gruber/8891611
# were proving fussy when integrated as the pattern for <paper-input>
# https://www.webcomponents.org/element/PolymerElements/paper-input/paper-input
# and they also do not require the input to be a fully qualified URL.
#
# So I made a short one here to do much of the work. It's not really intended to be the be-all-and-end-all,
# but more a case of "A reasonable user shouldn't be able to enter something stupid out of laziness or negligence".
#
#
# Notes:
# - Does not validate the TLD
# - Works for IP-based hosts (IPv4, and unbracketed, uncompressed IPv6 as in the examples below)
# - localhost works
# - Works with anchors
# - Works for file extensions
# - Works with query args
#
# Should match:
http://localhost/~tigerhawkvok/admin-page.html
http://localhost/~tigerhawkvok/admin-page.mp4
https://192.168.0.1
https://192.168.0.1/foo/bar/baz.html
http://2001:0:9d38:953c:2c69:1dcd:b9d5:fe0
http://2001:0:9d38:953c:2c69:1dcd:b9d5:fe0/foo/bar/baz.html
https://google.com
https://stackoverflow.com/questions/3809401/what-is-a-good-regular-expression-to-match-a-url
http://regexlib.com/Search.aspx?k=url&AspxAutoDetectCookieSupport=1
https://www.webcomponents.org/element/PolymerElements/paper-input/paper-input
https://www.toggl.com/app/reports/weekly/12345678/period/thisWeek/billable/
http://handbook.arctosdb.org/documentation/collecting-event.html#verbatim_date
#
# Should fail:
google.com
http://google
foobar
http://google/thisplace/foo.html
###
# Pre-escaped string
urlStringPattern = """((?:https?)://(?:(?:(?:[0-9]+\\.){3}[0-9]+|(?:[0-9a-f]+:){6,8}|(?:[\\w~\\-]{2,}\\.)+[\\w]{2,}|localhost))/?(?:[\\w~\\-]*/?)*(?:(?:\\.\\w+)?(?:\\?(?:\\w+=\\w+&?)*)?))(?:#[\\w~\\-]+)?"""
# Pattern
urlRegex = /((?:https?):\/\/(?:(?:(?:[0-9]+\.){3}[0-9]+|(?:[0-9a-f]+:){6,8}|(?:[\w~\-]{2,}\.)+[\w]{2,}|localhost))\/?(?:[\w~\-]*\/?)*(?:(?:\.\w+)?(?:\?(?:\w+=\w+&?)*)?))(?:#[\w~\-]+)?/im
if urlRegex.test providedUrl
  # Good match
  true
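
The "Should match" and "Should fail" lists in the header comment can be checked mechanically. What follows is a minimal sketch in plain JavaScript (the language CoffeeScript compiles to), with the regex literal copied verbatim from above. The names `shouldMatch`, `shouldFail`, `failures`, and `isFullMatch` are illustrative and not part of the gist.

```javascript
// Same pattern as urlRegex above, copied verbatim.
const urlRegex = /((?:https?):\/\/(?:(?:(?:[0-9]+\.){3}[0-9]+|(?:[0-9a-f]+:){6,8}|(?:[\w~\-]{2,}\.)+[\w]{2,}|localhost))\/?(?:[\w~\-]*\/?)*(?:(?:\.\w+)?(?:\?(?:\w+=\w+&?)*)?))(?:#[\w~\-]+)?/im;

// Cases taken from the "Should match" list in the header comment.
const shouldMatch = [
  "http://localhost/~tigerhawkvok/admin-page.html",
  "http://localhost/~tigerhawkvok/admin-page.mp4",
  "https://192.168.0.1",
  "https://192.168.0.1/foo/bar/baz.html",
  "http://2001:0:9d38:953c:2c69:1dcd:b9d5:fe0",
  "http://2001:0:9d38:953c:2c69:1dcd:b9d5:fe0/foo/bar/baz.html",
  "https://google.com",
  "https://stackoverflow.com/questions/3809401/what-is-a-good-regular-expression-to-match-a-url",
  "http://regexlib.com/Search.aspx?k=url&AspxAutoDetectCookieSupport=1",
  "https://www.webcomponents.org/element/PolymerElements/paper-input/paper-input",
  "https://www.toggl.com/app/reports/weekly/12345678/period/thisWeek/billable/",
  "http://handbook.arctosdb.org/documentation/collecting-event.html#verbatim_date",
];

// Cases taken from the "Should fail" list in the header comment.
const shouldFail = [
  "google.com",
  "http://google",
  "foobar",
  "http://google/thisplace/foo.html",
];

// Hypothetical helper (not in the gist): true only when the match
// starts at index 0 and spans the entire input string.
function isFullMatch(candidate) {
  const m = urlRegex.exec(candidate);
  return m !== null && m.index === 0 && m[0].length === candidate.length;
}

// Collect every documented case that does not behave as listed.
const failures = [
  ...shouldMatch.filter((u) => !urlRegex.test(u)),
  ...shouldFail.filter((u) => urlRegex.test(u)),
];
console.log(failures.length === 0 ? "all documented cases behave as listed" : failures);
```

The regex literal has no `^`/`$` anchors, so `urlRegex.test` also returns true for a string that merely begins with or contains a URL, such as `"https://google.com/a b"`. When the whole input has to be a URL, a check along the lines of `isFullMatch` is one way to reject such input without changing the pattern itself.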