@corona6
Created July 7, 2015 05:53
Web scraping from alpha.app.net trending
require 'capybara'
require 'capybara/poltergeist'
require 'stringio'

# Create a session backed by Poltergeist (headless PhantomJS)
Capybara.javascript_driver = :poltergeist
options = {
  js_errors: false,
  timeout: 180,
  phantomjs_logger: StringIO.new,
  logger: nil,
  phantomjs_options: ['--load-images=no', '--ignore-ssl-errors=yes']
}
Capybara.register_driver(:poltergeist) do |app|
  Capybara::Poltergeist::Driver.new(app, options)
end
session = Capybara::Session.new(:poltergeist)

# Open the alpha.app.net trending page
url = "https://alpha.app.net/browse/trending/"
session.visit(url)

# Log in (replace USERNAME and PASSWORD)
session.fill_in "id_username", with: "USERNAME"
session.fill_in "id_password", with: "PASSWORD"
session.find(".btn.btn-primary").click

# Scroll down three times, sleeping 2 seconds before each scroll
3.times do
  sleep 2
  session.driver.scroll_to(0, 10000)
end

# Print each post's text and permalink, followed by two blank lines
session.all(".subpixel.h-entry.post-container").each do |post|
  puts post.find(".post-content.e-content").text
  puts "https://alpha.app.net" + post.find(".u-url.timestamp")[:href]
  puts
  puts
end
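
# The permalink above is built by string concatenation, which assumes every
# href is a root-relative path. A minimal sketch of an alternative, assuming
# the hrefs may be either relative or already absolute: resolve them with
# URI.join from the standard library. BASE_URL and absolute_url are
# hypothetical names introduced here; they are not part of the original script.

```ruby
require 'uri'

# Hypothetical constant: the site root that relative hrefs are resolved against.
BASE_URL = "https://alpha.app.net"

# Hypothetical helper: returns an absolute URL string for a relative or
# absolute href. URI.join leaves an already absolute href unchanged.
def absolute_url(href, base = BASE_URL)
  URI.join(base, href).to_s
end

# "/example/post/1" is a made-up path used only for illustration.
puts absolute_url("/example/post/1")
```

# In the extraction loop, this could replace the concatenation:
#   puts absolute_url(post.find(".u-url.timestamp")[:href])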