@juliends
Created July 11, 2017 16:22
imdb_scrapper
require 'open-uri' # Fetch a URL with URI.open
require 'nokogiri' # Parse HTML into a Nokogiri document

url = "http://www.imdb.com/chart/top"
base_url = "http://www.imdb.com"

# Use URI.open: on Ruby 3.0+ Kernel#open no longer accepts URLs
html_file = URI.open(url)
html_doc = Nokogiri::HTML(html_file)

# Each match is the <a> tag of one movie in the chart
movies_doc = html_doc.search('.titleColumn a')

movies_doc.each do |element|
  # sleep(1)
  title = element.text
  link = element.attribute('href')
  actors = element.attribute('title')

  # Sub url for each movie
  sub_url = "#{base_url}#{link}"

  # Open sub url to retrieve movie summary
  # (separate variable names so the chart document is not overwritten)
  movie_file = URI.open(sub_url)
  movie_doc = Nokogiri::HTML(movie_file)
  summary = movie_doc.search('.summary_text').text.strip

  # Build the movie text content, one field per line
  movie_text = "#{title}\n"
  movie_text += "#{actors}\n"
  movie_text += "#{summary}\n"

  # Create a .txt file for each movie, named after the title without spaces
  file_path = "#{title.gsub(' ', '')}.txt"
  File.open(file_path, 'w') do |file|
    file.write(movie_text)
  end
end
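
The script builds each file name by removing only the spaces from the movie title, so a title containing a path separator or another reserved character (a `/` or a `:`, for example) could make `File.open` fail or write somewhere unexpected. A minimal sketch of a stricter file name builder follows. `safe_file_name` is a hypothetical helper, not part of the original script, and the list of stripped characters is an assumption rather than an exhaustive rule for every file system.

```ruby
# Hypothetical helper (not in the original script): builds a .txt file name
# from a movie title by stripping characters that are risky in file names.
def safe_file_name(title)
  # Drop whitespace, path separators and other commonly reserved characters
  cleaned = title.gsub(/[\s\/\\:*?"<>|]/, '')
  # Fall back to a placeholder if nothing usable is left
  cleaned = 'untitled' if cleaned.empty?
  "#{cleaned}.txt"
end

puts safe_file_name('The Godfather') # prints "TheGodfather.txt"
puts safe_file_name('Face/Off')      # prints "FaceOff.txt"
```

It could replace the `title.gsub(' ', '')` expression when building `file_path` in the loop.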