@johnjansen
Last active July 13, 2017 18:58
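require "crystagiri"
require "uri"

# URLs already visited, so each page is fetched at most once.
SEEN = Set(String).new

# Fetches `url`, prints every link found on the page, and recurses into each.
# Note: the crawl is unbounded and follows links to any host.
def parse(url : String)
  return if SEEN.includes?(url)
  SEEN.add url
  begin
    base = URI.parse(url)
    Crystagiri::HTML.from_url(url).css("a") do |anchor|
      # Skip anchors with no href instead of raising on the missing attribute.
      href = anchor.node["href"]?
      next if href.nil? || href.empty? || href.starts_with?("#")
      # Resolve relative ("page.html"), root-relative ("/path") and
      # protocol-relative ("//host/path") links against the current page.
      # Assumes a Crystal version whose standard library provides URI#resolve.
      new_url = base.resolve(href).to_s
      puts new_url
      parse(new_url)
    end
  rescue e
    # Any fetch or parse error abandons the current page and is reported.
    puts e
  end
end

parse("https://www.apple.com/")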