@JoshSchreuder
Created March 23, 2011 05:33
Generates a CSV spreadsheet of the top rated TED talks according to PostRank (http://www.postrank.com/) analytics data. Note: you need a PostRank API key to use this.
# Copyright (c) 2011 Josh Schreuder
# http://www.postteenageliving.com
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'digest/md5'
require 'csv'
require 'json'
require 'curb'
require 'pp'
data = []
apikey = "somekeygoeshere"

# Fetch the TED talk links from the quick-list page, skipping direct .mp4 downloads.
# URI.open is used because Kernel#open no longer accepts URLs on Ruby 3.0+.
doc = Nokogiri::HTML(URI.open("http://www.ted.com/talks/quick-list"))
talks = {}
doc.xpath('//table[@class="downloads notranslate"]/tr/td/a').each do |a|
  href = a['href'].to_s
  next if href.include?('.mp4')
  talks["http://www.ted.com" + href] = a.text
end

# Query PostRank in batches of up to 50 URLs; each_slice also yields the final
# partial batch, which the fixed "== 50" check would have skipped.
talks.keys.each_slice(50) do |urls|
  puts "Got #{urls.size} URLs, getting PostRank data"
  # fetch postrank metrics data (Curl::Easy.http_post performs the request itself)
  response = Curl::Easy.http_post('http://api.postrank.com/v2/entry/metrics?appkey=' + apikey,
    urls.map { |t| "url[]=#{Digest::MD5.hexdigest(t)}" }.join("&"))
  metrics = JSON.parse(response.body_str)
  urls.each do |url|
    # merge an empty hash when PostRank returned nothing for this URL
    data.push({'title' => talks[url], 'url' => url}.merge(metrics[Digest::MD5.hexdigest(url)] || {}))
  end
end
# output a CSV file with the results
CSV.open("toptedtalks.csv", "w") do |csv|
  columns = data.collect { |d| d.keys }.flatten.uniq.sort
  columns.delete('title')
  columns.delete('url')
  csv << ["Title", "URL", *columns]
  data.each do |a|
    csv << [a['title'], a['url'], *columns.map { |c| a[c] || 0 }]
  end
end
# Updated Blog post: http://postteenageliving.com/2011/03/top-ted-talks/
# Original Blog post: http://blog.postrank.com/2010/05/and-the-most-engaging-ted-talk-is/
# Google spreadsheet data: https://spreadsheets0.google.com/ccc?key=tWri7T3f4Ex6-uVU8i9-FFQ&hl=en
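
The script writes talks to the CSV in the order they appear on the quick-list page; it does not rank them. Below is a minimal sketch of how the output could be sorted to find the top-rated talks. The helper name `top_rows` and the column name `score` are assumptions made for illustration: the actual metric column names depend on what PostRank returns and are not listed here. The sample data is made up.

```ruby
require 'csv'

# Hypothetical helper: return the n rows with the highest numeric value in `column`.
def top_rows(table, column, n = 10)
  table.sort_by { |row| -row[column].to_f }.first(n)
end

# Tiny in-memory sample standing in for toptedtalks.csv; titles and values are invented.
sample = <<~CSV
  Title,URL,score
  Talk A,http://example.com/a,12.5
  Talk B,http://example.com/b,40
  Talk C,http://example.com/c,7
CSV

table = CSV.parse(sample, headers: true)
top = top_rows(table, 'score', 2)
top.each { |row| puts "#{row['Title']}: #{row['score']}" }
```

To try this against the real output, replace the sample with `CSV.read("toptedtalks.csv", headers: true)` and pass one of the metric column names found in the file's header row.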