#!/usr/bin/env ruby
# Script that wraps grabbing and selecting elements from an HTML page via CSS selectors
# Created 2009-10-01 by Jesper Rønn-Jensen.
# For usage, run parsepage.rb without arguments.
# Feel free to modify, fork and improve as long as you commit your changes back to me :)
def usage
  <<-EOF
=== USAGE ===
parsepage.rb [uri] [css_selector] [options]
=== example ===
parsepage.rb "" "h3"
parsepage.rb $U "h1,h2,h3"
=== OPTIONS ===
--text-only  lists only the text of matching elements
--list-html  lists all matches as HTML <li> elements
--count      counts the number of matches
  EOF
end
# Show usage and stop when the uri or the CSS selector is missing
abort usage if ARGV.size < 2
puts ARGV.inspect
require "rubygems"
require "nokogiri"
require "open-uri"
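# URI.open (from open-uri) fetches http(s) URLs; since Ruby 3.0, plain Kernel#open no longer does
doc = Nokogiri::HTML(URI.open(ARGV[0]))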
matches = doc.css(ARGV[1])
result = []
result << "no matches found" if matches.empty?
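if ARGV.include?('--text-only')
  # one line of plain text per matching element
  result.concat(matches.map(&:text))
elsif ARGV.include?('--list-html')
  # wrap each element's text in a closed <li> tag (flag name matches the usage text)
  result.concat(matches.map { |element| "<li>#{element.text}</li>" })
else
  # default: the raw HTML of all matches
  result << matches.to_s
end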
result << ''
result << "TOTAL (#{matches.size} match#{'es' if matches.size != 1} found for '#{ARGV[1]}')" if (ARGV.include?('--count'))
puts result.join("\n")
# TODO stuff that would be great to support
# * support for local files
# * refactor so this could be used to return a Nokogiri object with selection via irb
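To make the output rules of the three options easier to see, and to check them without fetching a page, here is a minimal sketch that reproduces the script's formatting step on plain strings. The method `format_result` and its keyword arguments are hypothetical names, not part of the original script; it assumes the caller has already extracted one text string per match (for example with `element.text`) plus the raw HTML of all matches (for example with `matches.to_s`). Pulling the logic into a method like this is one possible first step toward the irb-friendly refactor in the TODO list, not the author's design.

```ruby
# Hypothetical helper mirroring the script's output rules.
# texts:    one string per matching element
# html:     raw HTML of all matches (used when no formatting flag is given)
# selector: the CSS selector, echoed in the --count summary
def format_result(texts, html, selector, text_only: false, list_html: false, count: false)
  result = []
  result << 'no matches found' if texts.empty?
  if text_only
    # --text-only: plain text, one line per match
    result.concat(texts)
  elsif list_html
    # --list-html: each match's text wrapped in a closed <li> element
    result.concat(texts.map { |text| "<li>#{text}</li>" })
  else
    # no flag: the matches' raw HTML
    result << html
  end
  result << ''
  if count
    n = texts.size
    # "match" vs "matches" depending on the count
    result << "TOTAL (#{n} match#{'es' if n != 1} found for '#{selector}')"
  end
  result.join("\n")
end

puts format_result(%w[Intro Usage], '<h2>Intro</h2><h2>Usage</h2>', 'h2', list_html: true, count: true)
```

With two matches, `--list-html` and `--count`, the sketch prints one `<li>` line per match, a blank line, and the `TOTAL (2 matches found for 'h2')` summary, which is the same shape the script produces.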
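Because the script reads the uri and selector from `ARGV[0]` and `ARGV[1]` and detects flags with `ARGV.include?`, any flags have to come after the two positional arguments. A sketch of one alternative, using Ruby's standard-library `OptionParser` so that flags may appear anywhere on the command line, is shown here. The method `parse_args` and the option hash keys are hypothetical names chosen for illustration; the flag names are the ones listed in the usage text.

```ruby
require 'optparse'

# Hypothetical helper: split command-line arguments into the three flags
# and the remaining positional arguments (uri, css_selector).
def parse_args(argv)
  options = { text_only: false, list_html: false, count: false }
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: parsepage.rb [options] uri css_selector'
    opts.on('--text-only', 'List only the text of matching elements') { options[:text_only] = true }
    opts.on('--list-html', 'List all matches as HTML <li> elements') { options[:list_html] = true }
    opts.on('--count', 'Count the number of matches') { options[:count] = true }
  end
  # parse works on a copy of argv and returns the non-option arguments;
  # unless POSIXLY_CORRECT is set, options may be mixed with positional args
  positional = parser.parse(argv)
  [options, positional]
end

options, positional = parse_args(['--count', 'http://example.com', 'h1,h2', '--text-only'])
p options
p positional
```

In the script, `uri, selector = positional` could then take the place of the direct `ARGV[0]`/`ARGV[1]` lookups, and `options[:count]` and friends the `ARGV.include?` checks. An unknown flag makes `OptionParser` raise `OptionParser::InvalidOption`, which is stricter than the original behavior of silently ignoring it.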