@rizkyabdilah
Created December 7, 2011 10:23
Regex to capture and validate a URL. It currently captures the scheme, username, password, host, port, path, query, and fragment.
import re
# Raw string so the regex escapes reach the regex engine unchanged.
# The password is optional: "user@host" yields username="user", password=None.
regex_url = re.compile(r"^(?:(?P<scheme>https?|ftps?):\/\/)?(?:(?P<username>[\w\.\-\+%!$&'\(\)*\+,;=]+)(?::(?P<password>[\w\.\-\+%!$&'\(\)*\+,;=]+))?@)?(?P<host>[a-z0-9-]+(?:\.[a-z0-9-]+)*(?:\.[a-z\.]{2,6})+)(?:\:(?P<port>[0-9]+))?(?P<path>\/(?:[\w_ \/\-\.~%!\$&\'\(\)\*\+,;=:@]+)?)?(?:\?(?P<query>[\w_ \-\.~%!\$&\'\(\)\*\+,;=:@\/]*))?(?:(?P<fragment>#[\w_ \-\.~%!\$&\'\(\)\*\+,;=:@\/]*))?$", re.IGNORECASE)
example_url = "http://username:password@example.net:130892/some/path/to/folder?query1=1&query2=2#go-to-fragment"
match_test = regex_url.match(example_url)
if match_test:
    # Dictionary of named groups; groups that did not participate are None.
    print(match_test.groupdict())
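
For comparison, here is a minimal sketch of extracting the same components with the standard library's `urllib.parse.urlsplit` instead of the regex. This swaps in a different technique and is not part of the original gist. The helper name `split_url_stdlib` is hypothetical, and the example uses port 8080 because `urlsplit` rejects port numbers above 65535, which the regex above accepts as long as they are digits. Note also that `urlsplit` returns the fragment without the leading `#`, while the regex's `fragment` group includes it.

```python
from urllib.parse import urlsplit


def split_url_stdlib(url):
    """Return the components the regex names, using urlsplit (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        # Accessing .port raises ValueError when the number is outside 0-65535.
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        # No leading "#", unlike the regex's fragment group.
        "fragment": parts.fragment,
    }


components = split_url_stdlib(
    "http://username:password@example.net:8080/some/path/to/folder?query1=1&query2=2#go-to-fragment"
)
print(components)
```

Unlike the regex, `urlsplit` does not validate the URL as a whole, so this may suit cases where only the pieces are needed and the input is already trusted to be URL-shaped.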