@jesperronn
Created November 5, 2009 11:26
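#!/usr/bin/env ruby
# Crawl the internal links on the front page of a blog and warn about any
# page whose HTML source contains search_text (for example injected spam).
require 'rubygems' # only needed on Ruby 1.8
require 'nokogiri'
require 'open-uri'

url = 'http://justaddwater.dk/'
#search_text = "viagra"
search_text = "bubbleGUM"

# URI.open: Kernel#open no longer fetches URLs on Ruby 3.0+
doc = Nokogiri::HTML(URI.open(url))
matches = doc.css('a[href^="http://justaddwater.dk"]')
link_list = matches.map { |link| link['href'] }
# Alternative for faster loading: use the cached list from mock_link_list.
# That method is defined at the bottom of this file, so move its def above
# this line before uncommenting (Ruby only knows a method once its def has run).
#link_list = mock_link_list
puts "link list (#{link_list.size})"

puts "removing all urls with #"
link_list.reject! { |link| link.include?('#') }
puts "link list without anchors (#{link_list.size})"

link_list.uniq!
puts "link list without duplicates (#{link_list.size})"
# Width of the longest url, used to line up the results
name_max_length = link_list.map(&:length).max || 0
#puts link_list

puts "testing each link in list for text '#{search_text}'"
link_list.each do |page_url|
  # Interpolate instead of page_url << ": ", which would modify the
  # string stored in link_list
  print "#{page_url.ljust(name_max_length)}: "
  begin
    page = Nokogiri::HTML(URI.open(page_url))
  rescue StandardError => e
    # One broken link (404, DNS failure, ...) should not stop the whole run
    puts "ERROR could not fetch page (#{e.message})"
    next
  end
  # page.xpath("//text()") does not include HTML comments, so search the
  # whole serialized document instead
  #if page.xpath("//text()").to_s.include? search_text
  if page.to_s.include? search_text
    puts "WARNING contains #{search_text}"
  else
    puts "...OK"
  end
end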
# Cached copy of the front-page link list, for testing without crawling
def mock_link_list
  link_list = "http://justaddwater.dk/
http://justaddwater.dk/2009/11/03/zyb-com-sync-with-new-nokia-e52-phone/
http://justaddwater.dk/2009/11/03/zyb-com-sync-with-new-nokia-e52-phone/#comments
http://justaddwater.dk/2009/11/03/zyb-com-sync-with-new-nokia-e52-phone/
http://justaddwater.dk/2009/11/03/zyb-com-sync-with-new-nokia-e52-phone/
http://justaddwater.dk/2009/10/21/script-identifying-css-shorthand-possibilities/
http://justaddwater.dk/2009/10/21/script-identifying-css-shorthand-possibilities/#comments
http://justaddwater.dk/2009/10/21/script-identifying-css-shorthand-possibilities/
http://justaddwater.dk/2009/10/21/script-identifying-css-shorthand-possibilities/
http://justaddwater.dk/2009/10/15/resolve-symlinks-when-copying-files-with-rsync/
http://justaddwater.dk/2009/10/15/resolve-symlinks-when-copying-files-with-rsync/#comments
http://justaddwater.dk/2009/10/15/resolve-symlinks-when-copying-files-with-rsync/
http://justaddwater.dk/2009/10/15/resolve-symlinks-when-copying-files-with-rsync/
http://justaddwater.dk/2009/10/05/tiny-script-to-extract-html-structure-from-page/
http://justaddwater.dk/2009/10/05/tiny-script-to-extract-html-structure-from-page/#comments
http://justaddwater.dk/2009/10/05/tiny-script-to-extract-html-structure-from-page/
http://justaddwater.dk/2009/10/05/tiny-script-to-extract-html-structure-from-page/
http://justaddwater.dk/2009/10/02/useful-usability-tips/
http://justaddwater.dk/2009/10/02/useful-usability-tips/#comments
http://justaddwater.dk/2009/10/02/useful-usability-tips/
http://justaddwater.dk/2009/10/02/useful-usability-tips/
http://justaddwater.dk/2009/10/01/motivation-video-dont-give-up-if-they-let-you-down/
http://justaddwater.dk/2009/09/30/spoken-git-commit-log-%e2%80%94-another-annoyance-at-the-office/
http://justaddwater.dk/2009/09/29/print-css-background-logo-hack/
http://justaddwater.dk/2009/09/28/bash-or-ruby-for-removing-multiple-trac-spam-tickets/
http://justaddwater.dk/2009/09/23/internal-apps-in-firefox/
http://justaddwater.dk/2009/09/21/sell-more-add-next-steps-to-your-status-messages/
http://justaddwater.dk/2009/09/18/greasemonkey-debugging-tips-how-to-determine-why-your-script-is-not-running/
http://justaddwater.dk/2009/09/16/css-styling-buttons-problem-with-underlined-text/
http://justaddwater.dk/2009/09/02/2-must-know-html-table-colum-features-any-webdeveloper-should-be-aware-of/
http://justaddwater.dk/2009/09/01/selection-groups-in-rails-made-less-cumbersome/
http://justaddwater.dk/2009/08/28/the-quick-way-to-creating-disabled-state-icon-with-photoshop/
http://justaddwater.dk/2009/08/23/how-to-add-git-pull-shortcut-to-different-github-branches/
http://justaddwater.dk/2009/08/19/rup-vs-scrum-vs-kanban/
http://justaddwater.dk/2009/08/13/remove-smileys-in-messenger-program-ichat-for-mac/
http://justaddwater.dk/2009/07/08/importing-existing-git-repository-into-svn/
http://justaddwater.dk/2009/06/14/comments-working-again/
http://justaddwater.dk/2009/05/29/shorthand-to-clean-up-installed-rubygems-but-dont-cleanup-rails/
http://justaddwater.dk/2009/05/16/i-cant-get-an-adsense-account/
http://justaddwater.dk/2009/05/11/it-frustration-and-counter-productive-applications/
http://justaddwater.dk/2009/04/29/hitlers-final-agile-planning-meeting/
http://justaddwater.dk/2009/03/19/jef-raskins-first-law-of-interface-design-explained/
http://justaddwater.dk/2009/03/13/web-server-power-calculation/
http://justaddwater.dk/2009/03/10/textmate-path-modification-ruby-version-issues/
http://justaddwater.dk/2009/03/09/using-git-for-svn-repositories-workflow/
http://justaddwater.dk/2009/02/25/html-guide-textmate-snippets-open-sourced/
http://justaddwater.dk/2009/02/21/using-local-file-based-git-server-laziness/
http://justaddwater.dk/2009/02/01/interaction-design-experiment-delete-row/
http://justaddwater.dk/2009/01/28/human-time-format-gone-wrong/
http://justaddwater.dk/2009/01/17/git-side-benefit-reducing-disk-usage/
http://justaddwater.dk/2009/01/15/bad-usability-calendar-2009-is-out/
http://justaddwater.dk/2009/01/14/intranet-inspiration/
http://justaddwater.dk/2009/01/08/css-fun-flag-deprecated-html-tags/
http://justaddwater.dk/2008/12/17/time-to-revise-our-comment-policy/
http://justaddwater.dk/2008/11/30/busy-not-updating-blog/
http://justaddwater.dk/2008/11/07/plugin-for-railss-resource-controller/
http://justaddwater.dk/2008/11/05/ie-css-bug-background-image-gap-to-border/
http://justaddwater.dk/2008/10/09/net-tips-for-my-development-environment/
http://justaddwater.dk/2008/08/28/download-pages-ie-vs-firefox/
http://justaddwater.dk/2008/08/18/introducing-tiny-javascript-number-formatter/
http://justaddwater.dk/2008/08/03/could-you-fit-the-internet-in-windows-recycle-bin/
http://justaddwater.dk/2008/07/17/adding-deprecation-warning-in-javascript-console/
http://justaddwater.dk/2008/07/01/ie-css-bug-limited-include-statements/
http://justaddwater.dk/2008/06/26/ten-principles-of-google-user-experience/
http://justaddwater.dk/2008/06/23/dont-repeat-yourself-unless-reading-book/
http://justaddwater.dk/2008/06/19/not-ready-for-agile/
http://justaddwater.dk/contact-us/
http://justaddwater.dk/subscribe/
http://justaddwater.dk/blogtools/
http://justaddwater.dk/wordpress-plugins/
http://justaddwater.dk/passionate-users-printing/
http://justaddwater.dk/notes-elements-of-user-experience/
http://justaddwater.dk/wp-login.php?action=register
http://justaddwater.dk/wp-login.php"
  # String#to_a no longer exists (Ruby 1.9+); split into one url per line
  link_list.split("\n")
end
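The link cleanup in the script (drop every href containing `#`, then remove duplicates) can be pulled out into a pure function, so it can be checked without any network access. This is a minimal sketch; `clean_links` is a hypothetical name and not part of the original script, and unlike `reject!`/`uniq!` it returns a new array instead of modifying its input.

```ruby
# Hypothetical helper: same cleanup as the script, without mutating the input.
# Links with a fragment (e.g. ".../#comments") are dropped first, then
# duplicate urls are removed, keeping the first occurrence of each.
def clean_links(hrefs)
  hrefs.reject { |href| href.include?('#') }.uniq
end

links = [
  'http://justaddwater.dk/',
  'http://justaddwater.dk/2009/10/02/useful-usability-tips/',
  'http://justaddwater.dk/2009/10/02/useful-usability-tips/#comments',
  'http://justaddwater.dk/2009/10/02/useful-usability-tips/'
]
cleaned = clean_links(links)
puts cleaned
```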
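Instead of the hard-coded `mock_link_list`, the "use cached version" idea could also be handled by writing the crawled list to a file on the first run and reading it back on later runs. A minimal sketch, assuming a one-url-per-line text file; `cached_link_list` and the cache file name are hypothetical, and the block stands in for the Nokogiri crawl of the front page.

```ruby
require 'tmpdir'

# Hypothetical helper: return the links stored in cache_path if the file
# exists; otherwise compute them with the block and save them there,
# one url per line (the same layout as the mock list).
def cached_link_list(cache_path)
  return File.read(cache_path).split("\n") if File.exist?(cache_path)

  links = yield
  File.write(cache_path, links.join("\n"))
  links
end

# Hypothetical cache file in the system temp directory
cache_path = File.join(Dir.tmpdir, "link_list_cache_#{Process.pid}.txt")
File.delete(cache_path) if File.exist?(cache_path)

crawls = 0
first  = cached_link_list(cache_path) { crawls += 1; ['http://justaddwater.dk/', 'http://justaddwater.dk/subscribe/'] }
second = cached_link_list(cache_path) { crawls += 1; [] } # served from the file
File.delete(cache_path)

puts "crawled #{crawls} time(s), #{second.size} links from cache"
```

A file cache avoids editing the script whenever the blog's front page changes; delete the file to force a fresh crawl.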
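The per-link check searches the raw page source rather than only the text nodes, so text hidden inside an HTML comment is still reported. The sketch below shows that check with the fetching passed in as a callable, so it can run on canned pages without a network. `scan_pages`, the `:ok`/`:warning`/`:error` labels and the example.com urls are all hypothetical; in the real script the fetcher would be something like `->(u) { URI.open(u).read }`.

```ruby
# Hypothetical helper: classify each url as :ok, :warning (page source
# contains search_text) or :error (fetching the page raised an exception).
def scan_pages(urls, search_text, fetch)
  urls.each_with_object({}) do |url, results|
    begin
      html = fetch.call(url)
      # Plain substring search on the raw source, so <!-- comments --> count too
      results[url] = html.include?(search_text) ? :warning : :ok
    rescue StandardError
      results[url] = :error
    end
  end
end

pages = {
  'http://example.com/a' => '<p>clean page</p>',
  'http://example.com/b' => '<p>hi</p><!-- bubbleGUM -->'
}
# Hash#fetch raises KeyError for unknown urls, standing in for a failed request
fetch = ->(url) { pages.fetch(url) }
results = scan_pages(pages.keys + ['http://example.com/missing'], 'bubbleGUM', fetch)
results.each { |url, status| puts "#{url}: #{status}" }
```

Like the script, the search is case-sensitive: `include?` will not match "bubblegum" when looking for "bubbleGUM".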