@allejo
Last active August 29, 2015 14:06
Read through an HTML file of anchor tags, parse all of the hyperlinks, and download them
#!/usr/bin/ruby
#
# License: Public Domain

# Number of links processed (downloaded, or listed during a dry run)
link_count = 0
dry_run = ARGV[1] == "--dry-run"

if ARGV[0].nil? || ARGV[0].empty?
  puts "Usage: ruby urlFetcher.rb [FILE_PATH] [OPTION]"
  puts ""
  puts "The file given to this script must contain <a> tags with 'href' attributes"
  puts "and those links are what will be parsed and downloaded."
  puts ""
  puts "Options"
  puts "  ---"
  puts "  --dry-run    It will print out all of the files that will be downloaded"
  puts "               but won't actually download them"
  puts ""
  exit
end

# File.read closes the file handle once the contents are loaded
text = File.read(ARGV[0])
text.gsub!(/\r\n?/, "\n") # normalize Windows and old Mac line endings

text.each_line do |line|
  # Only the first href on each line is picked up
  current_link = line[/href=\".*?\"/].to_s
  current_link.gsub! 'href=', ''
  current_link.gsub!(/\?.+/, '') # drop the query string
  current_link.gsub! '"', ''

  next if current_link.empty?

  if dry_run
    puts current_link
  else
    # Pass the link as its own argument so the shell never interprets it
    system("wget", current_link)
  end

  link_count += 1
end

puts dry_run ? "#{link_count} files would be downloaded" : "#{link_count} files downloaded"
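The script above reads the file line by line and keeps only the first `href` it finds on each line, so a line holding several `<a>` tags yields a single link. A minimal sketch of one way around that, assuming the links are always double-quoted as the script expects, is to collect every match with `String#scan`. The helper name `extract_links` and the sample URLs are hypothetical and not part of the original gist.

```ruby
# Hypothetical helper (not part of the original script): return every
# href value found in a chunk of HTML, with any query string removed.
def extract_links(html)
  html.scan(/href="(.*?)"/)                 # one capture group => array of 1-element arrays
      .flatten                              # => flat array of link strings
      .map { |link| link.sub(/\?.*/, '') }  # drop everything from the first "?"
      .reject(&:empty?)                     # skip empty href attributes
end

# Example input with two anchors on a single line (made-up URLs)
sample = '<a href="http://example.com/a.zip?x=1">A</a> <a href="http://example.com/b.zip">B</a>'
puts extract_links(sample)
```

A regular expression is still a rough tool for HTML: it misses single-quoted or unquoted attributes, and it matches `href` on any tag, not only `<a>`. For input that is not under your control, an HTML parser is the safer choice.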