@johan
Forked from igrigorik/ted-postrank.rb
Created October 3, 2011 06:39
Makes a CSV (ted.csv) of the popularity of all TED talks to date using Postrank, if you have a Postrank API key
#! /usr/bin/env ruby
# good with ruby 1.8.7
# not good: the Postrank key used accepts no more traffic, and Google has since
# acquired Postrank, so there currently isn't any way of getting a new one. :-(
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'digest/md5'
require 'fastercsv'
require 'json'
require 'curb'
require 'pp'
data = []
page = 1
pages = nil # not known until we've read the first page
while !pages || page <= pages
  # fetch ted talk url => title from each page
  doc = Nokogiri.parse(open("http://www.ted.com/talks/list/page/#{page}").read)
  pages = doc.xpath("string(//*[starts-with(.,'Showing page #{page} of ')])")
  pages = pages.match(/\d+$/)[0].to_i
  puts "processing page #{page} of #{pages}"
  talks = doc.search('dd a').inject({}) do |hash,a|
    url = "http://www.ted.com#{a.attributes['href'].value}"
    hash[url] = a.attributes['title'].value
    hash
  end
  # fetch postrank metrics data; http_post already sends the request,
  # so no separate perform call is needed
  metrics = Curl::Easy.http_post(
    'http://api.postrank.com/v2/entry/metrics?appkey=TEDdemo',
    talks.keys.map{|t| "url[]=#{Digest::MD5.hexdigest(t)}"}.join("&"))
  metrics = JSON.parse(metrics.body_str)
  if metrics['error'] then
    puts metrics['error']
    exit 1
  end
  talks.keys.each do |url|
    # a talk with no metrics entry keeps just its title and url
    data.push({ 'title' => talks[url], 'url' => url}.
      merge(metrics[Digest::MD5.hexdigest(url)] || {}))
  end
  # advance to the next page; without this the loop never terminates
  page += 1
end
# output a CSV file with the results
FasterCSV.open('ted.csv', 'w') do |csv|
  columns = data.collect{|d| d.keys}.flatten.uniq.sort
  columns.delete('title')
  columns.delete('url')
  csv << ['Title', 'URL', *columns]
  data.each do |a|
    csv << [a['title'], a['url'], *columns.map{|c| a[c] || 0}]
  end
end
# Blog post: http://blog.postrank.com/2010/05/and-the-most-engaging-ted-talk-is/
# Google spreadsheet data: https://spreadsheets0.google.com/ccc?key=tWri7T3f4Ex6-uVU8i9-FFQ&hl=en
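
The output step above can be tried without a Postrank key. FasterCSV was adopted as the standard `CSV` library in Ruby 1.9, so on a newer Ruby the same column-union logic might look like the sketch below. The helper name `talks_to_csv` and the sample rows are made up for illustration; they are not part of the original script or of any Postrank response.

```ruby
require 'csv'

# Hypothetical helper (not in the original script): turns an array of
# row hashes into CSV text. 'title' and 'url' lead; every other key seen
# in any row becomes a column, sorted by name.
def talks_to_csv(rows)
  columns = rows.flat_map(&:keys).uniq.sort - %w[title url]
  CSV.generate do |csv|
    csv << ['Title', 'URL', *columns]
    rows.each do |row|
      # A metric missing for a given talk is written as 0, as in the script.
      csv << [row['title'], row['url'], *columns.map { |c| row[c] || 0 }]
    end
  end
end

# Made-up sample rows standing in for the merged talk + metrics hashes.
sample = [
  { 'title' => 'Talk A', 'url' => 'http://example.com/a', 'views' => 10, 'tweets' => 3 },
  { 'title' => 'Talk B', 'url' => 'http://example.com/b', 'views' => 7 }
]

puts talks_to_csv(sample)
```

Returning a string rather than writing `ted.csv` directly keeps the sketch easy to inspect; `File.write('ted.csv', talks_to_csv(data))` would reproduce the file output.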