#!/usr/bin/env ruby
# Script that wraps grabbing and selecting stuff from HTML page via CSS selectors
# Created 2009-10-01 by Jesper Rønn-Jensen
# For usage, run parsepage.rb without arguments.
# Feel free to modify, fork and improve as long as you commit your changes back to me :)
def usage
  <<~EOF
    === USAGE ===
    parsepage.rb [uri] [css_selector] [options]
    === EXAMPLE ===
    parsepage.rb "" "h3"
    parsepage.rb $U "h1,h2,h3"
    === OPTIONS ===
    --text-only   lists only text from matching elements
    --html-list   lists all matches as HTML <li> elements
    --count       counts the number of matches
  EOF
end
# Show the usage text and stop when the uri or the selector is missing
if ARGV.length < 2
  puts usage
  exit
end
puts ARGV.inspect
require "rubygems"
require "nokogiri"
require "open-uri"
# URI.open is required on Ruby 3.0+, where Kernel#open no longer fetches URLs
doc = Nokogiri::HTML(URI.open(ARGV[0]))
matches = doc.css(ARGV[1])
result = []
result << "no matches found" if matches.empty?
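if ARGV.include?('--text-only')
  # One line of plain text per matching element
  result.concat(matches.map { |element| element.text })
elsif ARGV.include?('--html-list')
  # Wrap the text of each match in an <li> element
  result.concat(matches.map { |element| "<li>#{element.text}</li>" })
else
  # Default: the raw HTML of all matches
  result << matches.to_s
end
result << ''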
result << "TOTAL (#{matches.size} match#{'es' if matches.size != 1} found for '#{ARGV[1]}')" if (ARGV.include?('--count'))
puts result.join("\n")
# TODO stuff that would be great to support
# * support for local files
# * refactor so this could be used to return a Nokogiri object with selection via irb