@willf
Last active June 6, 2017 06:13
Follow redirect links
require 'net/http'
require 'uri'
=begin
This simple module provides two methods that follow the redirects of a URL and report what they find.
It makes at most 10 requests unless told otherwise (pass depth: n to Redirect.redirect_urls).
Redirect.redirect_urls(<url>) returns a hash with the following keys:
:completed : true if the final destination was reached before hitting the limit
:uris : a list of URI objects. The first is the last URL reached (the final destination when :completed is true)
:urls : string versions of the above
:hosts : the unique hosts of the URIs
Redirect.resolve(<url>) returns the last resolved URL.
Redirect.redirect_urls("entish.org") => hash as above
Redirect.resolve("entish.org") => "http://entish.org"
By design, this doesn't do much error checking, though it does try
to add the scheme ("http") and reuse URL information on relative redirects.
It also uses the HEAD method, and sets a User-Agent of "Ruby redirect script".
=end
module Redirect
  # Follow redirects from url and return a hash describing the chain.
  # options[:depth] sets the maximum number of requests (default 10).
  def self.redirect_urls(url, options = {})
    redirect_lookup_depth = options[:depth].to_i > 0 ? options[:depth].to_i : 10
    current_uri = URI.parse(url)
    # Assume plain HTTP when given a bare host such as "entish.org".
    current_uri = URI.parse('http://' + url) if current_uri.scheme.nil?
    redirs = get_redirects(current_uri, [current_uri], redirect_lookup_depth, redirect_lookup_depth)
    redirs[:urls] = redirs[:uris].map(&:to_s)
    redirs[:hosts] = redirs[:uris].map(&:host).uniq
    redirs
  end

  # Return the last URL reached, as a string.
  def self.resolve(url)
    redirect_urls(url)[:urls][0]
  end

  # limit is the configured depth; limit_count is how many requests remain.
  def self.get_redirects(current_uri, uris, limit, limit_count)
    return { completed: false, uris: uris } if limit_count < 1
    http = Net::HTTP.new(current_uri.host, current_uri.port)
    http.use_ssl = true if current_uri.scheme == 'https'
    request = Net::HTTP::Head.new(current_uri.request_uri)
    request['User-Agent'] = 'Ruby redirect script'
    response = http.request(request)
    case response
    when Net::HTTPSuccess
      { completed: true, uris: uris }
    when Net::HTTPRedirection
      redirect_location = response['location']
      location_uri = URI.parse(redirect_location)
      if location_uri.host.nil?
        # Relative redirect: reuse the scheme and host of the current URI.
        location_uri = URI.parse(current_uri.scheme + '://' + current_uri.host + redirect_location)
      end
      # puts("Redirecting from #{current_uri} to #{location_uri}")
      # Each hop spends one unit of the remaining request budget.
      get_redirects(location_uri, [location_uri] + uris, limit, limit_count - 1)
    else
      raise 'Non-success/redirect response: ' + response.inspect
    end
  end
  # A bare `private` does not apply to `def self.` methods, so hide it explicitly.
  private_class_method :get_redirects
end
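
The module rebuilds a host-less Location header from the scheme and host of the current URI. A possible alternative, sketched here and not part of the gist, is to let URI#merge do the resolution, which also keeps a non-default port and copes with path-relative values such as "c". The helper name absolute_location and the example.com URLs are made up for illustration.

```ruby
require 'uri'

# Hypothetical helper (not part of the gist): turn a Location header value
# into an absolute URI, using the URI that issued the redirect as the base.
def absolute_location(current_uri, location)
  # URI#merge accepts a String or a URI and resolves it against the receiver.
  current_uri.merge(location)
end

base = URI.parse('http://example.com:8080/a/b?x=1')

puts absolute_location(base, '/login')                  # http://example.com:8080/login
puts absolute_location(base, 'c')                       # http://example.com:8080/a/c
puts absolute_location(base, 'https://other.example/')  # https://other.example/
```

An absolute Location passes through unchanged, so the helper could be applied to every redirect without first checking whether the host is missing.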

willf commented Jun 5, 2017

>> Redirect.redirect_urls("https://t.co/TSoXOTDBYd")
=> {:completed=>true,
    :uris=>[#<URI::HTTP http://twistedsifter.com/2017/05/subway-maps-compared-to-their-actual-geography/>, #<URI::HTTP http://bit.ly/2qZ9oFw>, #<URI::HTTPS https://t.co/TSoXOTDBYd>],
    :urls=>["http://twistedsifter.com/2017/05/subway-maps-compared-to-their-actual-geography/", "http://bit.ly/2qZ9oFw", "https://t.co/TSoXOTDBYd"],
    :hosts=>["twistedsifter.com", "bit.ly", "t.co"]}

>> Redirect.resolve("https://t.co/TSoXOTDBYd")
=> "http://twistedsifter.com/2017/05/subway-maps-compared-to-their-actual-geography/"
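
The depth limit can be explored without touching the network. The following is a minimal sketch, not the gist's code: a Hash stands in for the HTTP responses (a URL with no entry counts as a final destination), and walk_redirects is a hypothetical name. The table reuses the chain shown above.

```ruby
# Hypothetical stand-in for the network: each key redirects to its value.
table = {
  'https://t.co/TSoXOTDBYd' => 'http://bit.ly/2qZ9oFw',
  'http://bit.ly/2qZ9oFw' => 'http://twistedsifter.com/2017/05/subway-maps-compared-to-their-actual-geography/'
}

# Walk the table, newest URL first, spending one "request" per loop pass.
def walk_redirects(url, table, depth = 10)
  urls = [url]
  depth.times do
    target = table[urls.first]
    # No entry means the current URL is a final destination.
    return { completed: true, urls: urls } if target.nil?
    urls.unshift(target)
  end
  # Budget exhausted while the latest URL was still unchecked.
  { completed: false, urls: urls }
end

full = walk_redirects('https://t.co/TSoXOTDBYd', table)
short = walk_redirects('https://t.co/TSoXOTDBYd', table, 2)
puts full[:completed]   # true
puts short[:completed]  # false
```

With a depth of 2 both redirects are followed, but no request is left to confirm that the last URL is final, so :completed is false even though the first entry of :urls is already the destination.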

erebor commented Jun 6, 2017

Oh, this is sooooo excellent.
