Created
November 5, 2009 11:26
#!/usr/bin/env ruby
# Collects the links on the front page of a site and reports every linked
# page whose markup contains a given search text (for example, injected spam).
require 'rubygems'
require 'nokogiri'
require 'open-uri'

url = 'http://justaddwater.dk/'
#search_text = "viagra"
search_text = "bubbleGUM"

# URI.open comes from open-uri; Kernel#open no longer accepts URLs in Ruby 3.
doc = Nokogiri::HTML(URI.open(url))
matches = doc.css('a[href^="http://justaddwater.dk"]')
link_list = matches.map { |link| link['href'] }
# Alternative for a faster start: use the cached list instead
# (mock_link_list must be defined before this line runs).
#link_list = mock_link_list
puts "link list (#{link_list.size}): "

puts "removing all urls with #"
link_list.reject! { |link| link.include?('#') }
puts "link list (#{link_list.size}): "

link_list.uniq!
# Length of the longest URL (not used further below); 0 when the list is empty.
name_max_length = link_list.map(&:length).max || 0
puts "link list (#{link_list.size}): "
#puts link_list

puts "testing each link in list for text '#{search_text}'"
link_list.each do |page_url|
  #page_url = link_list[1]
  page = Nokogiri::HTML(URI.open(page_url))
  # Interpolate instead of using <<, which would modify the URL string in place.
  print "#{page_url}: "
  # xpath("//text()") does not find comments, so search the whole serialized page.
  #if page.xpath("//text()").to_s.include? search_text
  if page.to_s.include?(search_text)
    puts "WARNING contains #{search_text}"
  else
    puts "...OK"
  end
end

# Cached copy of the front-page links, for test runs that skip fetching the front page.
def mock_link_list
  link_list = "http://justaddwater.dk/
  http://justaddwater.dk/2009/11/03/zyb-com-sync-with-new-nokia-e52-phone/
  http://justaddwater.dk/2009/11/03/zyb-com-sync-with-new-nokia-e52-phone/#comments
  http://justaddwater.dk/2009/11/03/zyb-com-sync-with-new-nokia-e52-phone/
  http://justaddwater.dk/2009/11/03/zyb-com-sync-with-new-nokia-e52-phone/
  http://justaddwater.dk/2009/10/21/script-identifying-css-shorthand-possibilities/
  http://justaddwater.dk/2009/10/21/script-identifying-css-shorthand-possibilities/#comments
  http://justaddwater.dk/2009/10/21/script-identifying-css-shorthand-possibilities/
  http://justaddwater.dk/2009/10/21/script-identifying-css-shorthand-possibilities/
  http://justaddwater.dk/2009/10/15/resolve-symlinks-when-copying-files-with-rsync/
  http://justaddwater.dk/2009/10/15/resolve-symlinks-when-copying-files-with-rsync/#comments
  http://justaddwater.dk/2009/10/15/resolve-symlinks-when-copying-files-with-rsync/
  http://justaddwater.dk/2009/10/15/resolve-symlinks-when-copying-files-with-rsync/
  http://justaddwater.dk/2009/10/05/tiny-script-to-extract-html-structure-from-page/
  http://justaddwater.dk/2009/10/05/tiny-script-to-extract-html-structure-from-page/#comments
  http://justaddwater.dk/2009/10/05/tiny-script-to-extract-html-structure-from-page/
  http://justaddwater.dk/2009/10/05/tiny-script-to-extract-html-structure-from-page/
  http://justaddwater.dk/2009/10/02/useful-usability-tips/
  http://justaddwater.dk/2009/10/02/useful-usability-tips/#comments
  http://justaddwater.dk/2009/10/02/useful-usability-tips/
  http://justaddwater.dk/2009/10/02/useful-usability-tips/
  http://justaddwater.dk/2009/10/01/motivation-video-dont-give-up-if-they-let-you-down/
  http://justaddwater.dk/2009/09/30/spoken-git-commit-log-%e2%80%94-another-annoyance-at-the-office/
  http://justaddwater.dk/2009/09/29/print-css-background-logo-hack/
  http://justaddwater.dk/2009/09/28/bash-or-ruby-for-removing-multiple-trac-spam-tickets/
  http://justaddwater.dk/2009/09/23/internal-apps-in-firefox/
  http://justaddwater.dk/2009/09/21/sell-more-add-next-steps-to-your-status-messages/
  http://justaddwater.dk/2009/09/18/greasemonkey-debugging-tips-how-to-determine-why-your-script-is-not-running/
  http://justaddwater.dk/2009/09/16/css-styling-buttons-problem-with-underlined-text/
  http://justaddwater.dk/2009/09/02/2-must-know-html-table-colum-features-any-webdeveloper-should-be-aware-of/
  http://justaddwater.dk/2009/09/01/selection-groups-in-rails-made-less-cumbersome/
  http://justaddwater.dk/2009/08/28/the-quick-way-to-creating-disabled-state-icon-with-photoshop/
  http://justaddwater.dk/2009/08/23/how-to-add-git-pull-shortcut-to-different-github-branches/
  http://justaddwater.dk/2009/08/19/rup-vs-scrum-vs-kanban/
  http://justaddwater.dk/2009/08/13/remove-smileys-in-messenger-program-ichat-for-mac/
  http://justaddwater.dk/2009/07/08/importing-existing-git-repository-into-svn/
  http://justaddwater.dk/2009/06/14/comments-working-again/
  http://justaddwater.dk/2009/05/29/shorthand-to-clean-up-installed-rubygems-but-dont-cleanup-rails/
  http://justaddwater.dk/2009/05/16/i-cant-get-an-adsense-account/
  http://justaddwater.dk/2009/05/11/it-frustration-and-counter-productive-applications/
  http://justaddwater.dk/2009/04/29/hitlers-final-agile-planning-meeting/
  http://justaddwater.dk/2009/03/19/jef-raskins-first-law-of-interface-design-explained/
  http://justaddwater.dk/2009/03/13/web-server-power-calculation/
  http://justaddwater.dk/2009/03/10/textmate-path-modification-ruby-version-issues/
  http://justaddwater.dk/2009/03/09/using-git-for-svn-repositories-workflow/
  http://justaddwater.dk/2009/02/25/html-guide-textmate-snippets-open-sourced/
  http://justaddwater.dk/2009/02/21/using-local-file-based-git-server-laziness/
  http://justaddwater.dk/2009/02/01/interaction-design-experiment-delete-row/
  http://justaddwater.dk/2009/01/28/human-time-format-gone-wrong/
  http://justaddwater.dk/2009/01/17/git-side-benefit-reducing-disk-usage/
  http://justaddwater.dk/2009/01/15/bad-usability-calendar-2009-is-out/
  http://justaddwater.dk/2009/01/14/intranet-inspiration/
  http://justaddwater.dk/2009/01/08/css-fun-flag-deprecated-html-tags/
  http://justaddwater.dk/2008/12/17/time-to-revise-our-comment-policy/
  http://justaddwater.dk/2008/11/30/busy-not-updating-blog/
  http://justaddwater.dk/2008/11/07/plugin-for-railss-resource-controller/
  http://justaddwater.dk/2008/11/05/ie-css-bug-background-image-gap-to-border/
  http://justaddwater.dk/2008/10/09/net-tips-for-my-development-environment/
  http://justaddwater.dk/2008/08/28/download-pages-ie-vs-firefox/
  http://justaddwater.dk/2008/08/18/introducing-tiny-javascript-number-formatter/
  http://justaddwater.dk/2008/08/03/could-you-fit-the-internet-in-windows-recycle-bin/
  http://justaddwater.dk/2008/07/17/adding-deprecation-warning-in-javascript-console/
  http://justaddwater.dk/2008/07/01/ie-css-bug-limited-include-statements/
  http://justaddwater.dk/2008/06/26/ten-principles-of-google-user-experience/
  http://justaddwater.dk/2008/06/23/dont-repeat-yourself-unless-reading-book/
  http://justaddwater.dk/2008/06/19/not-ready-for-agile/
  http://justaddwater.dk/contact-us/
  http://justaddwater.dk/subscribe/
  http://justaddwater.dk/blogtools/
  http://justaddwater.dk/wordpress-plugins/
  http://justaddwater.dk/passionate-users-printing/
  http://justaddwater.dk/notes-elements-of-user-experience/
  http://justaddwater.dk/wp-login.php?action=register
  http://justaddwater.dk/wp-login.php"
  # String#to_a was removed in Ruby 1.9; split on whitespace to get one URL per element.
  link_list.split
end
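
The filtering and matching steps in the script can be tried without any network access. The sketch below is a minimal illustration under stated assumptions: `clean_links` and `contains_text?` are hypothetical helper names that do not appear in the original script, and the sample URLs are a small subset of the cached list. It uses only core Ruby, so Nokogiri is not required to run it.

```ruby
# Hypothetical helper (not part of the original script): drops links that
# contain a '#' fragment and removes duplicates, keeping first-seen order.
# Returns a new array and leaves the input untouched.
def clean_links(links)
  links.reject { |link| link.include?('#') }.uniq
end

# Hypothetical helper: case-sensitive substring check on raw markup, so text
# inside HTML comments is matched as well as visible text.
def contains_text?(html, search_text)
  html.include?(search_text)
end

# Sample input taken from the cached link list.
sample = [
  'http://justaddwater.dk/',
  'http://justaddwater.dk/subscribe/#comments',
  'http://justaddwater.dk/subscribe/',
  'http://justaddwater.dk/'
]

puts clean_links(sample)
puts contains_text?('<p>ok</p><!-- bubbleGUM -->', 'bubbleGUM')
```

Searching the raw markup rather than extracted text nodes mirrors the script's choice of `doc.to_s` over `xpath("//text()")`: a match hidden inside an HTML comment is still reported.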