@penguin2716
Last active January 4, 2016 10:58
A Ruby script that automatically downloads images with wget from aggregator ("matome") sites and similar pages
#!/usr/bin/env ruby
#-*- coding: utf-8 -*-
=begin
Auto wget image
Copyright (c) 2014 Takuma Nakajima
This software is released under the MIT License.
http://opensource.org/licenses/mit-license.php
=end
require 'open-uri'
require 'nokogiri'
if ARGV.empty?
  puts "usage: #{$0} <url> ..."
  exit
end
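ARGV.each do |url|
  # URI.open is needed on Ruby 3.0+, where open-uri no longer patches Kernel#open
  html = Nokogiri::HTML(URI.open(url).read)
  # Extract the page title (used later as the directory name)
  title = html.title
  # Extract the links that point to images
  images = html.css('a').select { |a|
    a.attributes["href"]
  }.map { |a|
    a.attributes["href"].value
  }.select { |uri|
    uri =~ /(jpg|jpeg|png)$/i
  }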
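  # Find the host that the most image links point to.
  # Aggregator sites usually pull nearly all of their images from a single host.
  server = Hash.new(0)
  images.each do |uri|
    hostname = URI.parse(uri).host
    # Relative links have no host; counting them would make the lookup below fail
    server[hostname] += 1 if hostname
  end
  # Skip pages that contain no absolute image links
  next if server.empty?
  mode = server.max_by { |_host, count| count }.first
  imglist = images.select { |uri| uri.index(mode) }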
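  # Create a directory named after the title and run up to 5 wget processes in parallel
  Dir.mkdir(title) unless Dir.exist?(title)
  Dir.chdir(title) do
    # Pass the URLs to xargs on stdin instead of through a shell, so characters in URLs are not shell-interpreted.
    # -n 1 hands each wget a single URL; without it xargs batches everything into one wget and -P 5 has no effect.
    IO.popen(%w[xargs -n 1 -P 5 wget], "w") { |io| io.puts(imglist) }
  end
end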
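
The host-selection step in the script can be tried out on its own. The following is only a minimal sketch, not part of the original gist: `most_common_host` is a hypothetical helper name, the example URLs are made up, and `Enumerable#tally` assumes Ruby 2.7 or later. It counts the hosts of absolute URLs, ignores relative or unparsable links, and returns the most frequent host (or `nil` when there is none).

```ruby
require 'uri'

# Hypothetical helper (not in the original script): returns the host that
# appears most often among the given URLs, or nil if no URL has a host.
def most_common_host(urls)
  hosts = urls.map { |u|
    begin
      URI.parse(u).host        # nil for relative links such as "/img/a.jpg"
    rescue URI::InvalidURIError
      nil                      # treat unparsable links as having no host
    end
  }.compact
  # tally builds a { host => count } hash; max_by picks the largest count
  hosts.tally.max_by { |_host, count| count }&.first
end

# Made-up example URLs, for illustration only
urls = [
  "http://img.example.com/a.jpg",
  "http://img.example.com/b.png",
  "http://other.example.org/c.jpg",
  "/relative/d.jpg"
]
puts most_common_host(urls)
```

Returning `nil` instead of raising lets a caller skip pages without usable image links.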