@mehdi-farsi
Last active August 29, 2015 14:08
Simple Web Scraping Script Example Using Ruby And Nokogiri
# Example: List of all sports of the Olympic Games
# HTML Structure:
#
# <div id="content">
#   <ul>
#     <li>
#       <a href="...">SPORT'S NAME</a>
#     </li>
#   </ul>
# </div>
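require 'nokogiri'
require 'restclient'
require 'json'
require 'uri'

BASE_URL = 'http://www.olympic.org'
sports_list_url = "#{BASE_URL}/sports"

# Fetch the page and parse it into a Nokogiri document
page = Nokogiri::HTML(RestClient.get(sports_list_url))

# Select every sport link contained in an <li> of the #content list
links = page.css('#content ul li a')

# Build a { sport name => absolute URL } hash
sports = {}
links.each do |link|
  # URI.join copes with hrefs written with or without a leading slash
  sports[link.text.strip.to_sym] = URI.join(BASE_URL, link['href']).to_s
end

# Print the result as JSON
puts sports.to_json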
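
The script builds each sport's URL by combining the base URL with the link's href. The page's actual href format is not shown, so the following is only a minimal sketch of how that step could be made robust with Ruby's standard URI.join. The sport names and href values are hypothetical stand-ins for link.text and link['href'], and no network access or Nokogiri is needed to run it.

```ruby
require 'uri'
require 'json'

base_url = 'http://www.olympic.org' # same host as in the script

# Hypothetical href values (assumptions, not taken from the real page):
# root-relative, relative, and already absolute.
hrefs = {
  'Archery'   => '/archery',
  'Athletics' => 'athletics',
  'Rowing'    => 'http://www.olympic.org/rowing'
}

# URI.join resolves each href against the base URL, so none of the three
# forms ends up with a doubled slash or a duplicated host.
resolved = hrefs.each_with_object({}) do |(name, href), acc|
  acc[name] = URI.join(base_url, href).to_s
end

puts JSON.pretty_generate(resolved)
```

Plain string interpolation such as "#{base_url}/#{href}" would only be correct for the relative form, which is why URI.join may be the safer choice when the href format is unknown.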