@igrabes
Created December 13, 2011 07:07
HTML Scraper
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'sqlite3'
require 'active_record'
require 'active_support'
ActiveRecord::Base.establish_connection(
  :adapter  => "sqlite3",
  :database => "./development"
)
# Each Scrape record holds one scraped post. Titles must be unique, so
# re-running the script does not store the same post twice.
class Scrape < ActiveRecord::Base
  validates_uniqueness_of :title
end
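site_url = "http://www.electronicaoasis.com"
# URI.open is provided by open-uri on Ruby 2.5+; older Rubies use open(site_url).
doc = Nokogiri::HTML(URI.open(site_url))

doc.css(".hentry").each do |entry|
  link = entry.at_css("h2 a")
  next unless link # skip entries that have no title link

  title = link.text
  # Keep the post link in its own variable so it cannot collide with site_url.
  post_url = link[:href]

  # Fall back to a placeholder string when the entry has no full-size image.
  image_node = entry.at_css(".size-full")
  image = image_node ? image_node[:src] : "there is no image"

  post_date = entry.at_css(".postdate").text

  # article_info = Nokogiri::HTML(URI.open(post_url))
  # para_info = (article_info/"p").text

  scrape = Scrape.new(
    :title     => title,
    :url       => post_url,
    :image     => image,
    :post_date => post_date
    # :para_info => para_info
  )
  # save returns false, without raising, when the title already exists.
  scrape.save
end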
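
The script stores whatever the `href` attribute of `h2 a` contains. If a site emits relative links, the stored value would not be usable on its own. Below is a minimal sketch, assuming that case, of resolving an href against the site URL with the standard library's `URI.join`. The helper name `absolute_url` and the sample paths are hypothetical and not part of the original gist.

```ruby
require 'uri'

# Hypothetical helper (not in the original gist): resolve an href against
# the site's base URL so that the stored link is always absolute.
# Returns nil when the href is missing or blank.
def absolute_url(base, href)
  return nil if href.nil? || href.strip.empty?
  URI.join(base, href.strip).to_s
end

# A relative path is resolved against the base URL.
puts absolute_url("http://www.electronicaoasis.com", "/2011/12/some-post/")
# An href that is already absolute is returned unchanged.
puts absolute_url("http://www.electronicaoasis.com", "http://example.com/a")
```

In the scraping loop this could wrap the value read from the `h2 a` link before it is passed to `Scrape.new`.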