@weakish
Created July 23, 2017 14:04
#download all files listed in an index page #ruby #nokogiri
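#!/usr/bin/env ruby
# Given an index page listing files,
# download all files listed.
#
# License: 0BSD

require 'fileutils'
require 'nokogiri'
require 'open-uri'

# Download every file linked from the index page at +url+ into the
# current directory, using wget.
def download_all(url)
  # URI.open replaces the bare open(url), which Ruby 3.0 no longer supports.
  html = Nokogiri::HTML(URI.open(url).read)
  # Skip anchors that have no href attribute.
  links = html.css('a').map { |a| a['href'] }.compact
  # Keep links ending in a 3-4 character extension (e.g. .pdf, .jpeg).
  files = links.select { |l| l.match(/\.[a-z0-9]{3,4}$/) }
  # Resolve relative links against the index URL; absolute links pass through.
  downloads = files.map { |l| URI.join(url, l).to_s }
  # wget -c resumes partially downloaded files.
  downloads.each { |l| system('wget', '-c', l) }
end

if __FILE__ == $0
  if ARGV.empty?
    puts 'Usage: download_index URL ..'
  elsif ARGV.length == 1
    download_all(ARGV[0])
  else
    # With several URLs, download each index into its own directory,
    # named after the last path segment of the URL.
    ARGV.each do |url|
      directory_name = url.match(/\/[^\/]+\/$/).to_s.gsub('/', '')
      FileUtils.mkdir_p directory_name
      FileUtils.cd directory_name
      download_all(url)
      FileUtils.cd '..'
    end
  end
end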
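
The link handling in the script above can be tried without touching the network. The following is a minimal sketch, not part of the original gist: `resolve_file_links` is a hypothetical helper name, and the example URLs are made up. It assumes the intent is to keep only hrefs that end in a 3-4 character extension and to resolve relative ones against the index URL with `URI.join`, so that absolute links are left alone rather than concatenated onto the index URL.

```ruby
require 'uri'

# Hypothetical helper (not in the gist): filter hrefs by extension and
# resolve each one against the index URL.
def resolve_file_links(index_url, hrefs)
  hrefs
    .compact                                            # anchors without href yield nil
    .select { |href| href.match?(/\.[a-z0-9]{3,4}\z/) } # same extension pattern as the script
    .map { |href| URI.join(index_url, href).to_s }      # relative -> absolute
end

# Made-up sample data for illustration.
hrefs = ['a.pdf', nil, '../', 'sub/b.tar', 'https://other.example/c.mp3', 'README']
puts resolve_file_links('http://example.com/files/', hrefs)
```

Note that `URI.join` resolves relative links against the directory of the index URL, so the index URL should end with a trailing slash when it names a directory.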