
@xoyowade
Created May 2, 2012 14:47
Batch image downloader for the Xitek forum (http://forum.xitek.com/)
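#!/usr/bin/env ruby
# -*- coding: utf-8 -*-
# Batch image downloader for the Xitek (色影无忌) forum.
# Automatically stores each image's description (the text that precedes
# the image in the post) in the image's metadata as a comment.
# by Zhiwei Xiao
require 'rubygems'
require 'mechanize'
require 'to_regexp'
require 'mini_exiftool'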
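# Strip HTML tags from a post fragment, remove the forum's boilerplate hint
# ("Tip: click a thumbnail to view the full image; logged-in users can
# configure how attachment images are displayed."), and collapse spaces.
def clean_content(content)
  Nokogiri::HTML(content).content.gsub(
    /【提示:点击缩略图可看大图,登录用户可设置附件图显示方式。】/, ''
  ).squeeze(' ').strip
end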
post_url =
'http://forum.xitek.com/forum-viewthread-tid-841590-page-1-ordertype-2-authorid-216472.html'
xitek_img_prefix = 'http://image.xitek.com/forum/'
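# Each post body is a <td class="t_f"> cell
img_xpath = 'td[@class="t_f"]'
# Captures (1) the text preceding each <img> and (2) the image path
# relative to xitek_img_prefix
img_regex =
  /(.*?)<img src=".*?(?:xitek.com\/forum\/)?([\d\w\/]*\.jpg)".*? border=.*?>/m
count = 0
# Image path => description (Ruby hashes keep insertion order)
imgs = {}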
agent = Mechanize.new
page = agent.get(post_url)
while page do
  count += 1
  printf "Processing Page %d\n", count
  page.search(img_xpath).each do |post|
    post.inner_html.scan(img_regex) do |description, img|
      img.sub!(/thumb_/, '')  # drop "thumb_" to get the full-size image path
      description = clean_content(description)
      # The same image can appear more than once; merge its descriptions
      if imgs[img]
        if description.match(imgs[img].to_regexp(:literal => true))
          # The new description includes the old one: keep the new one
          imgs[img] = description
        elsif !imgs[img].match(description.to_regexp(:literal => true))
          # The old description does not include the new one: append it
          imgs[img] += "\n" + description
        end
      else
        imgs[img] = description
      end
    end
  end
  # Follow the "下一页" ("Next page") link until there is none
  if (link = page.link_with(:text => '下一页'))
    page = link.click
  else
    page = nil
  end
end
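# Create a fresh output directory (GNU mktemp needs at least three X's)
dir = %x[mktemp -d ./imgs-XXXXXX].strip
abort 'Could not create the output directory' unless $?.success?
# Save images, numbered in the order they were found
count = 0
width = imgs.count.to_s.length  # digits needed for zero-padded file names
# Use Mechanize::Download for responses without a more specific parser
agent.pluggable_parser.default = Mechanize::Download
imgs.each do |link, description|
  count += 1
  printf "Saving Image %d of %d\n", count, imgs.count
  img = agent.get(xitek_img_prefix + link)
  filename = sprintf("%s/%0#{width}d.%s", dir, count,
                     img.filename.split('.')[-1])
  img.save(filename)
  # Store the description in the image's Comment tag via exiftool
  imgexif = MiniExiftool.new filename
  imgexif.comment = description.strip
  imgexif.save
end
puts "All images are saved in " + dir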
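To see what the crawl loop collects from each post, here is a small sketch that runs the script's img_regex on a hand-written HTML fragment. The fragment and the helper name extract_images are hypothetical stand-ins, not captured forum markup; the real script also strips HTML tags from each description with clean_content, which this sketch skips.

```ruby
# Same pattern as the script's img_regex: capture (1) the text before an
# <img> and (2) the image path relative to http://image.xitek.com/forum/
IMG_REGEX =
  /(.*?)<img src=".*?(?:xitek.com\/forum\/)?([\d\w\/]*\.jpg)".*? border=.*?>/m

# Hypothetical helper: return [description, full-size path] pairs
def extract_images(html)
  html.scan(IMG_REGEX).map do |description, path|
    # Drop "thumb_" so the path points at the full-size image
    [description.strip, path.sub(/thumb_/, '')]
  end
end

# Hypothetical post body, written by hand for illustration
sample = <<~HTML
  First shot, at dawn
  <img src="http://image.xitek.com/forum/201205/thumb_a1.jpg" alt="" border="0">
  Second shot<img src="http://image.xitek.com/forum/201205/b2.jpg" border="0" />
HTML

extract_images(sample).each { |desc, path| puts "#{path}: #{desc}" }
# prints:
# 201205/a1.jpg: First shot, at dawn
# 201205/b2.jpg: Second shot
```

Because the regex requires a border attribute and a plain .jpg path made of word characters and slashes, images whose markup differs from this shape are silently skipped.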
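The duplicate-handling rule in the crawl loop only ever tests whether one description contains the other as a literal substring. A sketch of the same rule using plain String#include?, which needs no to_regexp dependency; merge_description is a hypothetical helper name and the sample texts are made up.

```ruby
# Hypothetical helper mirroring the script's merge rule for one image
def merge_description(old, new)
  return new if old.nil?           # first time this image is seen
  return new if new.include?(old)  # new text already contains the old one
  return old if old.include?(new)  # old text already contains the new one
  old + "\n" + new                 # unrelated texts: keep both
end

descs = {}
[['a.jpg', 'Sunrise'],
 ['a.jpg', 'Sunrise over the lake'],
 ['a.jpg', 'Taken with a 50mm lens'],
 ['a.jpg', 'lake']].each do |img, text|
  descs[img] = merge_description(descs[img], text)
end
# descs['a.jpg'] is now "Sunrise over the lake\nTaken with a 50mm lens"
puts descs['a.jpg']
```

A literal Regexp match and include? agree for every pair of strings, so this is a simplification rather than a change in behavior.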
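The output files are numbered with zero padding so they sort in page order. A sketch of that naming step, counting the digits of the total directly instead of going through Math.log10 (Math.log10(0) is -Infinity, and calling to_i on it raises FloatDomainError). padded_name and the directory name are hypothetical.

```ruby
# Hypothetical helper: build "<dir>/<zero-padded index>.<ext>"
def padded_name(dir, index, total, ext)
  width = total.to_s.length  # e.g. 120 images -> 3 digits
  format("%s/%0#{width}d.%s", dir, index, ext)
end

puts padded_name('./imgs-abc123', 7, 120, 'jpg')  # prints ./imgs-abc123/007.jpg
```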
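The script shells out to mktemp, which is not available everywhere (e.g. on Windows). One possible pure-Ruby alternative, not what the script itself does, is Dir.mktmpdir from the standard tmpdir library; called without a block it creates a uniquely named directory and leaves it in place. make_output_dir is a hypothetical helper.

```ruby
require 'tmpdir'

# Hypothetical helper: create ./imgs-<unique suffix> and return its path.
# Without a block, Dir.mktmpdir does not delete the directory afterwards.
def make_output_dir(base = '.')
  Dir.mktmpdir('imgs-', base)
end

dir = make_output_dir
puts "Saving into #{dir}"
```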
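The crawl loop keeps following the "next page" link until a page has none. Below is a sketch of that control flow with an in-memory hash standing in for Mechanize pages; the pages hash, its :images/:next keys, and the max_pages limit are hypothetical. The visited check and page budget are optional safeguards the original loop does not have, in case a "next" link ever points back to a page already seen.

```ruby
# Follow "next" links from start, collecting each page's images
def crawl(pages, start, max_pages: 100)
  found = []
  visited = {}
  url = start
  # Stop when there is no next link, the page was already visited,
  # or the page budget is used up
  while url && !visited[url] && visited.size < max_pages
    visited[url] = true
    page = pages.fetch(url)
    found.concat(page[:images])
    url = page[:next]
  end
  found
end

# Hypothetical three-page thread; the last page has no next link
PAGES = {
  'p1' => { images: %w[a.jpg b.jpg], next: 'p2' },
  'p2' => { images: %w[c.jpg], next: 'p3' },
  'p3' => { images: %w[d.jpg], next: nil }
}.freeze

p crawl(PAGES, 'p1')  # => ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
```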