@ErikPeterson
Last active August 29, 2015 13:56
Regex for URLs
#/^(\w+\:\/\/)?([\w+|\-\.\w+?\b]*)(:\d{1,5})?(\S*)?/
#When used with the .match method in Ruby on a URL string, this regex produces a MatchData object.
#Calling .to_a on it gives the array [whole url, protocol, domain, port, relative path].
#The port keeps its leading colon, and any optional group that did not match is nil.
#Examples (try these out in IRB or Pry):
url_reg = /^(\w+\:\/\/)?([\w+|\-\.\w+?\b]*)(:\d{1,5})?(\S*)?/
"http://www.google.com/index.html".match(url_reg).to_a.each_with_index do |el, index|
  puts "#[#{index}] #{el}"
end
#[0] http://www.google.com/index.html
#[1] http://
#[2] www.google.com
#[3]
#[4] /index.html
"https://sx3jvhfgzhw44p3x.onion/".match(url_reg).to_a.each_with_index do |el, index|
  puts "#[#{index}] #{el}"
end
#[0] https://sx3jvhfgzhw44p3x.onion/
#[1] https://
#[2] sx3jvhfgzhw44p3x.onion
#[3]
#[4] /
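#Neither example above includes a port, so group 3 comes back empty. A minimal sketch with a port,
#assuming the 1-5 digit port group from the pattern at the top (the localhost URL is made up for illustration):

```ruby
# Same pattern as at the top of this gist, with a 1-5 digit port group.
url_reg = /^(\w+\:\/\/)?([\w+|\-\.\w+?\b]*)(:\d{1,5})?(\S*)?/

# Hypothetical URL, chosen only because it carries a port.
m = "http://localhost:8080/admin/index.html".match(url_reg)

puts m[1] # protocol, including "://"
puts m[2] # domain
puts m[3] # port, including the leading colon
puts m[4] # relative path
```

#The colon is part of the captured port, so strip it before converting to an integer.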
#After splitting in this manner, each part can be parsed further to get subdomains, top-level domains,
#individual directories, etc.
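#One way that further parsing might look. This is only a sketch: split_url_parts is a hypothetical helper name,
#the URL is made up, and the subdomain logic assumes a single-label top-level domain (it would misread "example.co.uk").

```ruby
url_reg = /^(\w+\:\/\/)?([\w+|\-\.\w+?\b]*)(:\d{1,5})?(\S*)?/

# Hypothetical helper: breaks the regex captures down into smaller pieces.
def split_url_parts(url, pattern)
  _whole, protocol, domain, port, path = url.match(pattern).to_a

  labels = domain.to_s.split(".")
  {
    scheme:      protocol ? protocol.chomp("://") : nil,   # "http://" -> "http"
    port:        port ? port.delete(":").to_i : nil,       # ":8080"   -> 8080
    tld:         labels.last,                              # last dot-separated label
    subdomains:  labels[0...-2] || [],                     # everything before "name.tld"
    directories: path.to_s.split("/").reject(&:empty?)     # path segments, no empty strings
  }
end

parts = split_url_parts("http://www.google.com:8080/maps/index.html", url_reg)
parts.each { |name, value| puts "#{name}: #{value.inspect}" }
```

#Returning a hash keeps each piece addressable by name instead of by capture index.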