@philtr
Last active April 10, 2020 12:51
Scrape a Google Doc and display it on your site.
.google-doc {
  .title { font-size: 2rem; font-weight: 900; }
  ul { margin: 0.5rem 0; }
  ul[class*="-1"] { margin-left: 2rem; }
  ul[class*="-2"] { margin-left: 4rem; }
}
require "cgi"
require "net/http"
require "nokogiri"

# Publish the Google doc as a website and pass the published URI in here.
def google_doc_content(uri)
  response = Net::HTTP.get(URI(uri))
  document = Nokogiri::HTML(response)
  document.encoding = "UTF-8"

  # Remove the Google redirect proxy from links.
  # Anchors without an href (e.g. bookmarks) are skipped.
  document.css("a[href]").each do |link|
    href = link["href"]
    if href =~ /google\.com/
      href = href.gsub(%r{https://www\.google\.com/url\?q=}, "")
      href = href.gsub(/&sa=.+&ust=\d+/, "")
      href = CGI.unescape(href)
    end
    link["href"] = href
  end

  # Drop Google's inline <style> blocks so the site's own styles apply.
  document.xpath(".//style").remove

  document_html = document.css("#contents").to_html
                          .encode("UTF-8", invalid: :replace, undef: :replace)

  %{<div class="google-doc">#{document_html}</div>}
end
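
The link cleanup above relies on two regular expressions that assume the redirect parameters always appear as `&sa=...&ust=...`. A possible alternative, shown only as a sketch, is to parse the redirect URL and read its `q` parameter directly. The helper name `unwrap_google_redirect` is hypothetical and not part of this gist, and the sketch assumes Google's redirect links have the host `www.google.com` and the path `/url`.

```ruby
require "uri"

# Hypothetical helper (not part of the gist): returns the real destination of
# a Google redirect link, or the input unchanged when it is not such a link.
def unwrap_google_redirect(href)
  uri = URI.parse(href)
  # Assumption: redirect links look like https://www.google.com/url?q=<target>&...
  return href unless uri.host == "www.google.com" && uri.path == "/url"

  # decode_www_form splits the query into [key, value] pairs and
  # percent-decodes them; assoc finds the first pair whose key is "q".
  target = URI.decode_www_form(uri.query.to_s).assoc("q")&.last
  target.nil? || target.empty? ? href : target
rescue URI::InvalidURIError, ArgumentError
  # Leave anything that cannot be parsed untouched.
  href
end
```

Inside the helper, the body of the `each` loop could then shrink to something like `link["href"] = unwrap_google_redirect(link["href"])`. The trade-off is that only links matching the assumed host and path are rewritten, whereas the regex version touches any href containing `google.com`.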
<html>
  <body>
    <div id="google-doc">
      <%# The URL here should be the one for the published document %>
      <%= google_doc_content('https://docs.google.com/.../.../') %>
    </div>
  </body>
</html>