@ndpar
Created December 20, 2017 23:08
URL Parser in Groovy
def subDomain = '(?i:[a-z0-9]|[a-z0-9][-a-z0-9]*[a-z0-9])' // simple regex in single quotes
def topDomains = $/
(?x-i : com \b # you can put whitespace and comments
| edu \b # inside a regex in eXtended mode
| biz \b
| in(?:t|fo) \b # backslash needs no escaping
| mil \b # in dollar-slashy strings
| net \b
| org \b
| [a-z][a-z] \b
)/$
def hostname = /(?:${subDomain}\.)+${topDomains}/ // variable substitution in slashy string
def NOT_IN = /;\"'<>()\[\]{}\s\x7F-\xFF/ // backslash needs no escaping in slashy strings
def NOT_END = /!.,?/
def ANYWHERE = /[^${NOT_IN}${NOT_END}]/
def EMBEDDED = /[$NOT_END]/ // you can omit {} around the variable name
def urlPath = "/$ANYWHERE*($EMBEDDED+$ANYWHERE+)*"
def url =
"""(?x:
# backslashes must be escaped in triple-double-quoted strings
\\b
# match the hostname part
(
(?: ftp | http s? ): // [-\\w]+(\\.\\w[-\\w]*)+
|
$hostname
)
# allow optional port
(?: :\\d+ )?
# rest of url is optional, and begins with /
(?: $urlPath )?
)"""
assert 'http://www.google.com/search?rls=en&q=regex&ie=UTF-8&oe=UTF-8' ==~ url
assert 'pages.github.io' ==~ url
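// A minimal sketch, not part of the original gist: the same small regex written in
// each of the four string literal styles used above, to show where backslashes need
// doubling. The names inSingle, inSlashy, inDollarSlashy and inTripleDouble are
// hypothetical and exist only for this illustration.
def inSingle = '\\d+\\.\\d+' // single quotes: backslash must be doubled
def inSlashy = /\d+\.\d+/ // slashy string: backslash is taken literally
def inDollarSlashy = $/\d+\.\d+/$ // dollar-slashy string: backslash is taken literally
def inTripleDouble = """\\d+\\.\\d+""" // triple double quotes: backslash must be doubled
// all four literals should denote the same pattern text
assert [inSingle, inSlashy, inDollarSlashy, inTripleDouble].every { it == '\\d+\\.\\d+' }
assert '3.14' ==~ inSlashy
assert !('3,14' ==~ inSlashy)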