@tony612
Last active December 17, 2015 19:59
A crawler that grabs all the episode transcripts of The Big Bang Theory from bigbangtrans.wordpress.com, writing one text file per episode.
require 'rubygems'
require 'mechanize'  # gem install mechanize

agent = Mechanize.new
page = agent.get('http://bigbangtrans.wordpress.com/')

# Follow every index link whose text starts with "Series" (one per episode).
page.links.select { |l| l.text =~ /^Series/ }.each do |link|
  puts link.text
  sub_page = link.click

  # The episode title lives in the page's <h2 class="title"> element.
  title = sub_page.search("h2.title").first.content

  # Collect every paragraph of the transcript body.
  scripts = []
  sub_page.search("div.entrytext > p").each do |i|
    scripts << i.content
  end

  # Sanitize the title into a safe filename, then write the paragraphs
  # separated by blank lines.
  File.open("#{title.gsub(/\W+/, '_')}.txt", "w") do |f|
    f.write(scripts * "\n\n")
  end
end
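Two small idioms in the script are worth unpacking: `gsub(/\W+/, '_')` collapses every run of non-word characters in the episode title into a single underscore, producing a safe filename, and `scripts * "\n\n"` uses Ruby's `Array#*` with a String argument, which behaves like `join`. A minimal sketch (the title below is made up for illustration):

```ruby
# Hypothetical episode title, for illustration only.
title = "Series 1 Episode 01 - Pilot Episode"

# Runs of non-word characters collapse to a single underscore.
filename = "#{title.gsub(/\W+/, '_')}.txt"
puts filename  # => Series_1_Episode_01_Pilot_Episode.txt

# Array#* with a String argument is equivalent to Array#join.
paragraphs = ["Scene: A hallway.", "Leonard: Hi."]
puts(paragraphs * "\n\n" == paragraphs.join("\n\n"))  # => true
```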