#!/usr/bin/env ruby
# Script that fetches an HTML page and selects elements from it via CSS selectors
# Created 2009-10-01 by Jesper Rønn-Jensen, www.justaddwater.dk
#
# For usage, run parsepage.rb without arguments.
#
# Feel free to modify, fork and improve as long as you commit your changes back to me :)
def usage
<<-EOF #.gsub(' ', '')
=== USAGE ===
parsepage.rb [uri] [css_selector]
=== example ===
parsepage.rb "http://www.smashingmagazine.com/2009/09/24/10-useful-usability-findings-and-guidelines/" "h3"
U=http://www.smashingmagazine.com/2009/09/24/10-useful-usability-findings-and-guidelines/
parsepage.rb $U "h1,h2,h3"
=== OPTIONS ===
--text-only lists only the text of matching elements
--html-list lists all matches as HTML <li> elements
--count counts the number of matches
EOF
end
# Both a URI and a CSS selector are needed; otherwise show the usage text and stop
if ARGV.size < 2
  print usage
  exit
end
puts ARGV.inspect
require "rubygems"
require "nokogiri"
require "open-uri"
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs
doc = Nokogiri::HTML(URI.open(ARGV[0]))
matches = doc.css(ARGV[1])
result = []
result << "no matches found" if matches.empty?
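if ARGV.include?('--text-only')
  # One line of plain text per matching element
  result.concat(matches.map { |element| element.text })
elsif ARGV.include?('--html-list')
  # One <li>...</li> item per matching element
  result.concat(matches.map { |element| "<li>#{element.text}</li>" })
else
  # Default: the matched elements as HTML
  result << matches.to_s
end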
result << ''
result << "TOTAL (#{matches.size} match#{'es' if matches.size != 1} found for '#{ARGV[1]}')" if (ARGV.include?('--count'))
puts result.join("\n")
# TODO stuff that would be great to support
#
# * support for local files
# * refactor so this could be used to return a Nokogiri object with selection via irb
#
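# A minimal sketch of the "support for local files" TODO item above. It is not
# part of the original script, and the helper name `read_source` is hypothetical.
# Assumption: anything starting with http:// or https:// is fetched with
# open-uri; everything else is treated as a path on the local filesystem.
require "open-uri"

def read_source(source)
  if source =~ %r{\Ahttps?://}i
    URI.open(source) { |io| io.read } # remote page, returned as a String
  else
    File.read(source)                 # local file, returned as a String
  end
end

# Possible use when building the document:
#   doc = Nokogiri::HTML(read_source(ARGV[0]))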