@mendelgusmao
Created March 5, 2012 00:50
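require "nokogiri"
require "open-uri"
require "fileutils"

# For each URL listed in bmbslsk.txt, download the page's Open Graph image and
# write a text file with the image URL, title and description.
OUTPUT_DIR = "D:/Devs/bmbslsk"
FileUtils.mkdir_p(OUTPUT_DIR)

# Returns the "content" of an Open Graph <meta> tag as a String, or nil if the tag is missing
def og_content(doc, property)
  tag = doc.at_css("meta[property='og:#{property}']")
  tag && tag["content"]
end

list = File.readlines("bmbslsk.txt", chomp: true).reject(&:empty?).sort
#list = [list.first] # uncomment to try the script on the first URL only

list.each do |url|
  id = url[/\d+/]
  next unless id # output files are named after the first run of digits in the URL

  output_file = File.join(OUTPUT_DIR, id)
  next if File.exist?(output_file + ".txt") # already scraped on a previous run

  # URI.open is required for URLs; Kernel#open no longer fetches them on Ruby 3.0+
  html = URI.open(url) { |file| file.read }
  doc = Nokogiri::HTML(html)
  image = og_content(doc, "image")
  title = og_content(doc, "title")
  description = og_content(doc, "description")

  if image
    # Drop any query string or fragment so the saved file name stays valid
    image_name = File.basename(image.sub(/[?#].*/, ""))
    URI.open(image) do |file|
      File.binwrite(output_file + "_" + image_name, file.read)
    end
  end

  File.open(output_file + ".txt", "w") do |output|
    output.write image.to_s + "\n"
    output.write "\n"
    output.write title.to_s + "\n"
    # <BR> becomes a newline; keep only the text before the first "|"; collapse "\n\n" pairs
    output.write description.to_s.gsub("<BR>", "\n").split("|").first.to_s.gsub(/\n\n/, "\n") + "\n"
    output.write "*" * 80 + "\n"
  end

  sleep 5 # pause between requests to avoid hammering the server
end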
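The description clean-up in the scraper is a single chained expression, so it may be easier to check in isolation. Below is a minimal sketch that pulls the same chain into a hypothetical `clean_description` helper (the name and the sample input are illustrative, not part of the original script), so it can be exercised without any network access or Nokogiri.

```ruby
# Hypothetical helper mirroring the description clean-up in the scraper:
# turn literal "<BR>" tags into newlines, keep only the text before the first "|",
# then replace each "\n\n" pair with a single "\n".
def clean_description(raw)
  raw.to_s.gsub("<BR>", "\n").split("|").first.to_s.gsub(/\n\n/, "\n")
end

# Illustrative input: two <BR> tags and a trailing "| site name" suffix
puts clean_description("Line one<BR><BR>Line two | Site name")
```

Note that `gsub(/\n\n/, "\n")` works pair by pair, so three consecutive newlines become two rather than one, and the `"<BR>"` match is case-sensitive. If either matters, a pattern such as `/\n+/` or `/<br\s*\/?>/i` may be a better fit, though that would change the script's output.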
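The scraper can be re-run safely because it keys every output on the first run of digits in the URL and skips any URL whose `<id>.txt` already exists. Here is a rough sketch of that resume logic as two hypothetical helpers, `output_base` and `pending_urls` (names and example URLs are assumptions for illustration), checked against a temporary directory rather than `D:/Devs/bmbslsk`.

```ruby
require "tmpdir"

# Hypothetical helper: base path for a URL's outputs, keyed on the first run
# of digits in the URL (nil when the URL contains no digits at all).
def output_base(url, dir)
  id = url[/\d+/]
  id && File.join(dir, id)
end

# Hypothetical helper: URLs that still need scraping, i.e. those with an ID
# whose "<id>.txt" file does not exist yet.
def pending_urls(urls, dir)
  urls.select do |url|
    base = output_base(url, dir)
    base && !File.exist?("#{base}.txt")
  end
end

Dir.mktmpdir do |dir|
  urls = ["http://example.com/item/101", "http://example.com/item/102", "http://example.com/about"]
  File.write(File.join(dir, "101.txt"), "done\n") # pretend 101 was scraped earlier
  p pending_urls(urls, dir) # => ["http://example.com/item/102"]
end
```

One caveat of keying on `url[/\d+/]`: if the host name itself contains digits, every URL from that host would map to the same ID, so a stricter pattern anchored on the path may be safer.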