@oisin
Created May 12, 2011 08:25
Grab notes from a Slideshare presentation and turn them into an HTML document
require 'rubygems'
require 'nokogiri'
require 'open-uri'

# Replace stupid 'smart' quotes in text, replace '\n' with real
# newlines, change selected diacritical marks
#
def cleaned(str)
  str.gsub(/\\n/, "\n").gsub(/[‘’]/, "'").gsub(/[”“]/, '"').gsub(/í/, 'i')
end

url = 'http://www.slideshare.net/oisin/constructing-web-apis-with-rack-sinatra-and-mongodb'
# URI.open is the open-uri entry point; plain open() no longer fetches URLs as of Ruby 3.0
doc = Nokogiri::HTML(URI.open(url))
count = 1

puts "<html><head><title>Slide Notes for #{url}</title></head><body>"
# Each <p> under #notesList holds the notes for one slide
doc.css("#notesList p").each do |p|
  puts "<h2>Notes for slide #{count}</h2>"
  puts "<p>#{ cleaned(p.content) }</p>"
  count += 1
end
puts "</body></html>"
oisin commented May 12, 2011

The URL is hardcoded and there is no styling, so please feel free to clone and extend. The so-called smart quotes and the accented character in the cleaned() function will not display correctly in some editors (e.g. vi). I used TextMate for this and it works well.
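
For anyone extending this, here is a minimal sketch of two possible changes, not part of the original gist. It rewrites cleaned() with \u escapes so the source file stays plain ASCII and looks the same in any editor, and it adds a hypothetical notes_url helper (the name and the DEFAULT_URL constant are assumptions made for illustration) that reads the presentation URL from the command line instead of hardcoding it.

```ruby
# Editor-safe variant of cleaned(): the typographic characters are written
# as \u escapes, so the source file itself contains only ASCII.
def cleaned(str)
  str.gsub('\n', "\n")              # literal backslash-n becomes a real newline
     .gsub(/[\u2018\u2019]/, "'")   # left/right single quotation marks
     .gsub(/[\u201C\u201D]/, '"')   # left/right double quotation marks
     .gsub(/\u00ED/, 'i')           # i with acute accent
end

# Hypothetical helper (not in the original gist): use the first command-line
# argument as the presentation URL, falling back to the original one.
DEFAULT_URL = 'http://www.slideshare.net/oisin/constructing-web-apis-with-rack-sinatra-and-mongodb'

def notes_url(argv = ARGV)
  argv.first || DEFAULT_URL
end
```

With these in place, the script could set url = notes_url and be run as, say, ruby notes.rb followed by a Slideshare URL, where notes.rb is whatever name the script was saved under. The rest of the scraping code would stay the same.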