@JakeAustwick
Created December 29, 2011 23:14
{"csrf-param"=>"authenticity_token", "csrf-token"=>"x9AmVjIszIelkzftpZCTLefQldZa+wVjpaE43i2yxNs="}
"LOLZLSOssoosos oskodkKOKOK"
require 'nokogiri'
require 'open-uri' # not used yet: the source below is hard-coded

class MetaGrabber
  attr_reader :doc, :meta

  # NOTE: url is currently ignored; a hard-coded sample with mixed-case
  # tags and attributes is parsed instead.
  def initialize(url)
    source = '
<titLe>LOLZLSOssoosos oskodkKOKOK</tiTle>
<META CONTENT="authenticity_token" NAME="csrf-param" />
<meTa conteNt="x9AmVjIszIelkzftpZCTLefQldZa+wVjpaE43i2yxNs=" NAme="csrf-token" />'
    @doc = Nokogiri::HTML.parse(source)
    @meta = {}
  end

  # Text of the <title> element, memoized after the first lookup.
  def title
    @title ||= @doc.xpath('//title').text
  end

  # Some sites use <meta name="title" ... /> for some weird reason;
  # fall back to the <title> element when that tag is absent.
  def meta_title
    @meta['title'] ||= title
  end

  # Collect every named <meta> tag into @meta, keyed by lower-cased name.
  def grab_meta
    @doc.xpath('//meta').each do |tag|
      next unless tag[:name] # don't really care about these (http-equiv etc.)
      @meta[tag[:name].to_s.downcase] = tag[:content]
    end
    @meta
  end

  # Comma-separated keywords as an array of stripped strings, or nil.
  def keywords_array
    @meta['keywords'] ? @meta['keywords'].split(',').map(&:strip) : nil
  end

  def common_words
    # TODO: use readability to get the main content, then strip out
    # filler words like "and", "etc".
  end
end