@fallwith
Created May 12, 2011 06:28
Via AppleScript, tell Safari to fetch the HTML for every page of your Netflix ratings and output the collected ratings to a CSV file.
#!/usr/bin/env ruby
require 'iconv'
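# This is a simple script to spider your paginated Netflix "What You've Rated" list.
# It requires an OS X-based system with Safari, AppleScript, and Ruby 1.9. Ruby 2.0 and
# later no longer bundle the iconv library used below; there it must be installed as the iconv gem.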
#
# I could not find a way to back up my ratings (for all titles, not just my rental activity)
# without registering for a Netflix API key or handing my Netflix credentials over to someone
# who had an API key, so I decided to take a brute-force approach and just parse the HTML for
# every page of my ratings history on Netflix's site.
#
# INSTRUCTIONS:
# 1) Launch Safari, visit Netflix.com, log in if necessary, and visit your "What You've Rated"
# page. If the URL for page 1 differs from that of the STARTING_URL variable below, then
# update the variable's value. Leave the browser open on that page.
# 2) Set the OUTPUT_FILE variable below to the full path of the .csv file that the
# script should write the ratings to.
# 3) Set the PAGE_LOAD_GRACE variable to the number of seconds Safari should be given
# to fully render each ratings history page before the script grabs that page's
# HTML source.
# 4) Execute this script ($> ruby <scriptname>) and be careful not to interfere with Safari
# while it visits each page in your ratings history. The script passes each page's HTML
# back to Ruby through the clipboard, so anything already on the clipboard will be lost.
# Config
STARTING_URL = 'http://movies.netflix.com/MoviesYouveSeen'
PAGE_LOAD_GRACE = 5 # seconds of grace to allow for Safari to finish rendering a single page of ratings
OUTPUT_FILE = '/tmp/netflix_ratings.csv'
# Character encoding converter used to strip invalid byte sequences so that all HTML is valid UTF-8
ICONV = Iconv.new('UTF-8//IGNORE', 'UTF-8')
# For the given page's worth of Netflix ratings, extract the title, Netflix URL,
# genre, and rating for each entry. Each rated title is appended to ratings_array
# as a hash; the return value is the URL of the next page, or nil on the last page.
def glean_movie_info(html, ratings_array = [])
  next_url = nil
  html.gsub!("\n",'')
  html.scan(/<tr .*?class="agMovie".*?>(.*?)<\/tr>/).each do |movie_html|
    movie_html = movie_html.first
    if movie_html =~ /<td .*?class="cell-title".*?>.*?<a .*?href="(.*?)".*?>(.*?)<\/a>/
      netflix_url, title = $1, $2
      netflix_url = netflix_url.split('?').first
    end
    if movie_html =~ /<td .*?class="cell-starbar".*?>.*?You rated this movie: (\d+)/
      rating = $1
    else
      # skip "not interested" titles
      if title
        puts "Couldn't find rating for title '#{title}' (probably marked as \"Not Interested\"), skipping..."
      else
        puts "Couldn't get title or rating for this block:\n\n#{movie_html}\n\nskipping..."
      end
      next
    end
    if movie_html =~ /<td .*?class="cell-genre".*?>.*?<span.*?>(.*?)<\/span/
      genre = $1
    end
    ratings_array << {title: title, netflix_url: netflix_url, genre: genre, rating: rating}
  end
  if html =~ /<a .*?title="Go to the next page" .*?href="(.*?)"><span>next<\/span/
    next_url = $1
  end
  next_url
end
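# Obtain the HTML source for the given URL: load it in Safari's front document,
# wait PAGE_LOAD_GRACE seconds, copy the page source to the clipboard, and read it
# back with pbpaste. The appended space (sliced off again with [0..-2]) is a common
# Iconv idiom for coping with an invalid byte sequence at the very end of the input.
def fetch_html(url)
  applescript = <<-EOF
    tell application "Safari"
      activate
      set url of document 1 to "#{url}"
      delay #{PAGE_LOAD_GRACE}
      set htmlSource to source of document 1
      set the clipboard to htmlSource as text
    end tell
  EOF
  ICONV.iconv(`osascript -e '#{applescript}' && pbpaste` + ' ')[0..-2]
end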
# Starting with the first page of ratings, keep gleaning ratings info
# and moving on to the next page until the last page (which will not
# have a "next" link at the bottom). Keep adding each page's worth of
# info to the ratings array, which contains a hash of info for each movie.
url_to_fetch = STARTING_URL
ratings = []
until url_to_fetch.nil?
  url_to_fetch = glean_movie_info(fetch_html(url_to_fetch), ratings)
end
# Write the ratings to a .csv file. The standard csv library quotes any field that
# contains a comma, double quote, or newline (a title with a comma in it, for example).
require 'csv'
CSV.open(OUTPUT_FILE, 'w') do |csv|
  csv << %w[title netflix_url genre rating]
  ratings.each do |rating|
    csv << rating.values_at(:title, :netflix_url, :genre, :rating)
  end
end
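On Ruby 2.0 and later, iconv is no longer part of the standard library, so `require 'iconv'` fails unless the iconv gem is installed. A minimal sketch of the same clean-up using `String#scrub` (Ruby 2.1+); the helper name `to_valid_utf8` is hypothetical, and its result could stand in for the `ICONV.iconv(... + ' ')[0..-2]` expression in `fetch_html`:

```ruby
# Hypothetical replacement for the ICONV converter: treat the raw bytes as
# UTF-8 and delete any invalid byte sequences (String#scrub, Ruby 2.1+).
def to_valid_utf8(raw)
  raw.dup.force_encoding(Encoding::UTF_8).scrub('')
end

# Example: the stray 0xFF byte is dropped and the valid two-byte "é" is kept.
cleaned = to_valid_utf8("caf\xC3\xA9 \xFF ok".b)
puts cleaned
puts cleaned.valid_encoding?
```

Unlike the Iconv version, no trailing-space trick is needed, because `scrub` handles an invalid sequence at the end of the string like any other.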
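`fetch_html` interpolates the URL into both an AppleScript string literal and a single-quoted shell argument, so a URL containing `"` or `'` would break the command, and routing the page through the clipboard overwrites whatever the user had copied. A sketch of an alternative, assuming that `osascript` prints a script's result to standard output: the AppleScript returns the page source directly, and `IO.popen` with an argument array runs `osascript` without a shell. Both method names are hypothetical, the `<<~` heredoc needs Ruby 2.3+, and this is a sketch under those assumptions rather than a drop-in replacement:

```ruby
# Hypothetical helper: build AppleScript that loads `url` in Safari's front
# document, waits `grace_seconds`, and returns the page source as its result.
def safari_source_script(url, grace_seconds)
  # Escape backslashes and double quotes for an AppleScript string literal.
  escaped = url.gsub('\\') { '\\\\' }.gsub('"') { '\\"' }
  <<~APPLESCRIPT
    tell application "Safari"
      activate
      set URL of document 1 to "#{escaped}"
      delay #{grace_seconds}
      return source of document 1
    end tell
  APPLESCRIPT
end

# Hypothetical fetcher: passing an argument array to IO.popen skips the shell,
# so quotes in the script cannot break the command line. The script's result
# (the HTML) is read from osascript's standard output.
def fetch_html_via_stdout(url, grace_seconds)
  IO.popen(['osascript', '-e', safari_source_script(url, grace_seconds)], &:read)
end
```

Called as `fetch_html_via_stdout(url, PAGE_LOAD_GRACE)`, this leaves the clipboard untouched; the output would still need the UTF-8 clean-up that `fetch_html` applies.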
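The `until url_to_fetch.nil?` loop relies on the last page having no "next" link; if the markup ever pointed "next" back at a page already visited, the script would never stop. A sketch of the same loop with two safeguards, a set of visited URLs and a page cap. `follow_pages` and `max_pages` are hypothetical names, the default cap of 500 is an arbitrary assumption, and the block plays the role of `glean_movie_info(fetch_html(url), ratings)`:

```ruby
require 'set'

# Hypothetical variant of the script's until-loop. The block receives a page URL,
# does the per-page work, and returns the next page's URL (nil on the last page).
# Crawling stops at nil, at a URL that was already visited, or after max_pages
# pages, so a bad "next" link cannot keep Safari looping forever.
def follow_pages(start_url, max_pages: 500)
  visited = Set.new
  url = start_url
  until url.nil? || visited.include?(url) || visited.size >= max_pages
    visited << url
    url = yield(url)
  end
  visited.to_a # the URLs that were visited, in order
end
```

In the script, the loop would then become `follow_pages(STARTING_URL) { |url| glean_movie_info(fetch_html(url), ratings) }`.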
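Titles and genres are copied straight out of the page's HTML, so character references such as `&amp;` or `&#39;` would end up verbatim in the CSV. A minimal sketch, assuming the standard `cgi` library's `CGI.unescapeHTML` is applied to each field before it is stored; the `clean_field` name is hypothetical:

```ruby
require 'cgi'

# Hypothetical helper: turn HTML character references back into plain text.
# nil (for example, a genre the regex did not find) is passed through unchanged.
def clean_field(text)
  text && CGI.unescapeHTML(text)
end

puts clean_field("Lock, Stock &amp; Two Smoking Barrels")
```

Inside `glean_movie_info`, the hash could then be built with `title: clean_field(title)` and `genre: clean_field(genre)`.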
@javamonkey79

This, of course, no longer works: Netflix no longer paginates the ratings list and uses infinite scrolling instead.
