@amkurian
Last active January 30, 2019 08:17
Web scraping with Ruby and headless Chrome, using Selenium WebDriver.

Make sure you have all the prerequisites installed.

Download and install the Google Chrome browser.

Download and install the chromedriver binary (simply brew install chromedriver if you use Homebrew).

For Ubuntu

sudo apt-get update
sudo apt-get install -y unzip xvfb libxi6 libgconf-2-4

Install the Chrome browser

curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" | sudo tee -a /etc/apt/sources.list.d/google-chrome.list
sudo apt-get -y update
sudo apt-get -y install google-chrome-stable

Install ChromeDriver

wget https://chromedriver.storage.googleapis.com/2.41/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
sudo mv chromedriver /usr/bin/chromedriver
sudo chown root:root /usr/bin/chromedriver
sudo chmod +x /usr/bin/chromedriver

Install the selenium-webdriver gem:

gem install selenium-webdriver
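
Before moving on, it can help to confirm that the binaries installed above are actually reachable. The following is a minimal sketch using only the Ruby standard library; the helper name `find_executable_on_path` is hypothetical, and the binary names `chromedriver` and `google-chrome` are assumptions based on the Ubuntu steps above (on macOS, Chrome is usually not on the PATH at all).

```ruby
# Returns the full path of `name` if it is an executable file in one of the
# directories listed in `path`, otherwise nil.
# (Hypothetical helper, not part of selenium-webdriver.)
def find_executable_on_path(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).reject(&:empty?).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Assumed binary names; adjust them to match your installation.
%w[chromedriver google-chrome].each do |binary|
  location = find_executable_on_path(binary)
  puts "#{binary}: #{location || 'not found on PATH'}"
end
```

If `chromedriver` is reported as missing, the `Selenium::WebDriver.for(:chrome, ...)` calls in the following sections are likely to fail at startup.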

Getting started

require 'selenium-webdriver'
options = Selenium::WebDriver::Chrome::Options.new(args: ['headless'])
driver = Selenium::WebDriver.for(:chrome, options: options)
driver.get('http://stackoverflow.com/')
puts driver.title
driver.quit
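
Each example in this gist ends with `driver.quit`, but that line is skipped if an exception is raised first, which can leave a headless Chrome process running. One possible way to guard against this is a small wrapper built on `ensure`. This is only a sketch: `with_driver` and `HEADLESS_CHROME` are hypothetical names, not part of selenium-webdriver.

```ruby
# Yields a driver to the block and always calls #quit afterwards,
# even if the block raises. `build_driver` is any callable that
# returns an object responding to #quit (hypothetical helper).
def with_driver(build_driver)
  driver = build_driver.call
  yield driver
ensure
  driver&.quit
end

# A builder for the headless Chrome setup used throughout this gist.
# The gem is required lazily so the helper above has no dependencies.
HEADLESS_CHROME = lambda do
  require 'selenium-webdriver'
  options = Selenium::WebDriver::Chrome::Options.new(args: ['headless'])
  Selenium::WebDriver.for(:chrome, options: options)
end

# Usage (needs Chrome and chromedriver installed):
#
#   with_driver(HEADLESS_CHROME) do |driver|
#     driver.get('http://stackoverflow.com/')
#     puts driver.title
#   end
```

Passing the builder in as an argument keeps the cleanup logic separate from the browser setup, so the same wrapper could be reused with different options.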

Extracting Data

require 'selenium-webdriver'
options = Selenium::WebDriver::Chrome::Options.new(args: ['headless'])
driver = Selenium::WebDriver.for(:chrome, options: options)
driver.get('http://weblog.rubyonrails.org/')
element = driver.find_element(css: 'article header h2')
puts element.text.strip
driver.quit

Following Links

require 'selenium-webdriver'
options = Selenium::WebDriver::Chrome::Options.new(args: ['headless'])
driver = Selenium::WebDriver.for(:chrome, options: options)
driver.get('http://en.wikipedia.org/wiki/Main_Page')
driver.find_element(link_text: 'Random article').click
puts driver.current_url
driver.quit

Filling in a form

require 'selenium-webdriver'
options = Selenium::WebDriver::Chrome::Options.new(args: ['headless'])
driver = Selenium::WebDriver.for(:chrome, options: options)
driver.get('https://www.gov.uk/')
element = driver.find_element(name: 'q')
element.send_keys('passport')
element.submit
results = driver.find_element(id: 'results')
results.find_elements(tag_name: 'h3').each do |h3|
  puts h3.text.strip
end
driver.quit
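
If the search results are rendered after the form is submitted (for example by JavaScript), `find_element(id: 'results')` may run before the element exists and raise an error. The selenium-webdriver gem ships `Selenium::WebDriver::Wait` for this situation. The sketch below shows the same polling idea in plain Ruby so the mechanism is visible; `wait_until` is a hypothetical helper, and its default timeout and interval are arbitrary assumptions.

```ruby
# Calls the block repeatedly until it returns a truthy value, then returns
# that value. Errors raised by the block are treated as "not ready yet".
# Raises RuntimeError once `timeout` seconds have passed.
# (Hypothetical helper; a hand-rolled stand-in for an explicit wait.)
def wait_until(timeout: 10, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  last_error = nil
  loop do
    begin
      result = yield
      return result if result
    rescue StandardError => e
      last_error = e
    end
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      detail = last_error ? " (last error: #{last_error.message})" : ''
      raise "condition not met within #{timeout}s#{detail}"
    end
    sleep interval
  end
end

# Usage with the form example above (needs a running driver):
#
#   element.submit
#   results = wait_until { driver.find_element(id: 'results') }
```

Rescuing every `StandardError` keeps the sketch free of dependencies, but it also hides unrelated failures until the timeout. In real scraping code it is probably better to rescue only the driver's "no such element" error.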