Created
December 29, 2011 23:14
{"csrf-param"=>"authenticity_token", "csrf-token"=>"x9AmVjIszIelkzftpZCTLefQldZa+wVjpaE43i2yxNs="}
"LOLZLSOssoosos oskodkKOKOK"
require 'nokogiri'
require 'open-uri'

class MetaGrabber
  attr_reader :doc, :meta

  # NOTE: url is currently ignored; a hard-coded, mixed-case test document
  # is parsed instead to check that tag and attribute names are normalised.
  def initialize(url)
    source = '
    <titLe>LOLZLSOssoosos oskodkKOKOK</tiTle>
    <META CONTENT="authenticity_token" NAME="csrf-param" />
    <meTa conteNt="x9AmVjIszIelkzftpZCTLefQldZa+wVjpaE43i2yxNs=" NAme="csrf-token" />'
    @doc = Nokogiri::HTML.parse(source)
    @meta = {}
  end

  # Text of the <title> element, or nil if the lookup raises.
  def title
    @title ||= @doc.xpath("//title").text rescue nil
  end

  # Some sites use <meta name="title" ... /> for some weird reason;
  # fall back to the <title> element when that tag is absent.
  def meta_title
    @meta['title'] ||= title
  end

  # Collect every named meta tag into @meta, keyed by lower-cased name.
  def grab_meta
    @doc.xpath("//meta").each do |node|
      next unless node['name'] # don't care about unnamed tags (http-equiv etc.)
      @meta[node['name'].to_s.downcase] = node['content']
    end
  end

  # The keywords meta tag split on commas, or nil if there is none.
  def keywords_array
    @meta['keywords'] ? @meta['keywords'].split(",").map { |kw| kw.strip } : nil
  end

  def common_words
    # TODO: use readability to get the main content, then strip out
    # stop words such as "and", "etc".
  end
end
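The `common_words` stub above only describes its intent. A minimal sketch of the stop-word filtering and counting step could look like the following. It assumes the main content has already been extracted to plain text (the readability step is left out), and the `STOP_WORDS` list, the standalone method signature, and the `limit` parameter are all hypothetical choices made for illustration, not part of the original class.

```ruby
# Hypothetical stop-word list; a real one would be considerably longer.
STOP_WORDS = %w[a an and are as at be by etc for in is it of on or the to with].freeze

# Returns up to `limit` of the most frequent non-stop words in `text`,
# most frequent first; ties are broken alphabetically.
def common_words(text, limit = 5)
  counts = Hash.new(0)
  # Lower-case the text and pull out runs of letters and apostrophes.
  text.downcase.scan(/[a-z']+/).each do |word|
    counts[word] += 1 unless STOP_WORDS.include?(word)
  end
  counts.sort_by { |word, count| [-count, word] }.first(limit).map(&:first)
end
```

Inside `MetaGrabber` the same logic could run over the extracted body text instead of taking `text` as an argument; how that text is obtained is left open here.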