oisin/snarf-notes.rb

Created May 12, 2011
Grab notes from a Slideshare presentation and turn them into an HTML document
require 'nokogiri'
require 'open-uri'

# Replace curly 'smart' quotes with plain ASCII quotes, turn literal '\n'
# sequences into real newlines, and strip the accent from 'í'.
def cleaned(str)
  str.gsub(/\\n/, "\n").gsub(/[‘’]/, "'").gsub(/[“”]/, '"').gsub(/í/, 'i')
end

url = 'http://www.slideshare.net/oisin/constructing-web-apis-with-rack-sinatra-and-mongodb'
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs.
doc = Nokogiri::HTML(URI.open(url))

puts "<html><head><title>Slide Notes for #{url}</title></head><body>"
# Each <p> under #notesList is treated as the notes for one slide.
doc.css('#notesList p').each.with_index(1) do |note, count|
  puts "<h2>Notes for slide #{count}</h2>"
  puts "<p>#{cleaned(note.content)}</p>"
end
puts '</body></html>'
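The script above inserts the note text into the page without escaping it, so a note containing `<` or `&` could produce malformed HTML. Below is a minimal sketch of one way to handle that, assuming the notes have already been extracted into an array of strings. `notes_to_html` is a hypothetical helper that is not part of the gist; `CGI.escapeHTML` comes from Ruby's standard library.

```ruby
require 'cgi'

# Hypothetical helper: build the HTML document from already-extracted notes,
# escaping each note so characters such as < and & stay literal text.
def notes_to_html(notes, title)
  lines = ["<html><head><title>#{CGI.escapeHTML(title)}</title></head><body>"]
  notes.each.with_index(1) do |note, count|
    lines << "<h2>Notes for slide #{count}</h2>"
    lines << "<p>#{CGI.escapeHTML(note)}</p>"
  end
  lines << '</body></html>'
  lines.join("\n")
end

puts notes_to_html(['Rack & Sinatra', 'Use <code> tags'], 'Slide Notes')
```

Returning a string instead of calling `puts` for every line also makes the output easy to write to a file or check in a test.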
oisin commented May 12, 2011

The URL is hardcoded and there is no styling; please feel free to clone and extend. The so-called smart quotes and the accented character in the cleaned() function will not display correctly in some editors (e.g. vi). I used TextMate for this and it works well.
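Both points in the comment above can be addressed with small changes. This is a minimal sketch under two assumptions: the URL is passed as the first command-line argument (falling back to the original one), and the special characters are written as Unicode escapes so the source stays pure ASCII and looks the same in any editor. `DEFAULT_URL` is a name introduced here for illustration.

```ruby
# Hypothetical constant: the URL that the original script hardcodes.
DEFAULT_URL = 'http://www.slideshare.net/oisin/constructing-web-apis-with-rack-sinatra-and-mongodb'

# Same substitutions as cleaned() in the gist, but with the non-ASCII
# characters written as escapes:
#   \u2018 \u2019  curly single quotes
#   \u201C \u201D  curly double quotes
#   \u00ED         i with acute accent
def cleaned(str)
  str.gsub(/\\n/, "\n")
     .gsub(/[\u2018\u2019]/, "'")
     .gsub(/[\u201C\u201D]/, '"')
     .gsub(/\u00ED/, 'i')
end

# Use the first command-line argument as the URL when one is given.
url = ARGV.fetch(0, DEFAULT_URL)
```

With this in place, the script could be run against a different presentation by passing its URL on the command line, with the rest of the processing unchanged.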