@federomero
Created May 25, 2011 22:29
Simple script for testing the http://xkcd.com/903/ alt-text trivia
# encoding: UTF-8
require 'rubygems'
require 'nokogiri'
require 'open-uri'
start = 'C.A._Penarol'
target = 'Philosophy'
visited = []
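def get_first_article(article)
  base = 'http://en.wikipedia.org/wiki/'
  # URI.open needs Ruby 2.5+; on older Rubies use plain open from open-uri
  doc = Nokogiri::HTML(URI.open(base + article, 'User-Agent' => 'ruby'))
  paragraph = doc.css('#bodyContent > p').first
  return nil unless paragraph
  # first link with a site-relative href, ignoring links inside parentheses
  link = Nokogiri::HTML(formatParagraph(paragraph.to_s)).css('a').find { |l| l['href'].to_s.match(/^\//) }
  return nil unless link
  puts link['href']
  link['href'].gsub(/^\/wiki\//, '')
end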
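def formatParagraph(text)
  # Neutralize links inside parentheses: strip the angle brackets from any
  # parenthesized text so tags in there no longer parse as HTML. Parentheses
  # are swapped for a marker while looping (innermost first) and restored at the end.
  r = /\(.[^\(]*?\)/
  mark = '@#$'
  while text.match r
    text.gsub!(r) { |m| m.gsub(/<|>/, '').gsub('(', mark).gsub(')', mark.reverse) }
  end
  text.gsub(mark, '(').gsub(mark.reverse, ')')
end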
current = start
while current && current != target && !visited.include?(current)
  visited << current
  current = get_first_article(current)
end

if !current
  puts "We reached a dead end on #{visited.last}"
elsif current == target
  puts "We arrived from #{start} to #{target} in #{visited.length} steps"
else
  puts "We entered a loop starting on #{current}"
end
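
The traversal above can be tried without hitting Wikipedia by passing the lookup in as a callable. This is only a sketch: `follow_chain`, its return hash, and the `links` table are hypothetical names made up for illustration, not part of the original script.

```ruby
# Hypothetical helper: follows first links until the target is reached,
# a dead end is hit (lookup returns nil), or an article repeats.
# `next_article` is any callable mapping an article name to the next one.
def follow_chain(start, target, next_article)
  visited = []
  current = start
  while current && current != target && !visited.include?(current)
    visited << current
    current = next_article.call(current)
  end

  outcome =
    if current.nil?
      :dead_end
    elsif current == target
      :arrived
    else
      :loop
    end

  { outcome: outcome, steps: visited.length, path: visited, last: current }
end

# Made-up link table standing in for Wikipedia pages.
links = { 'A' => 'B', 'B' => 'Philosophy', 'X' => 'Y', 'Y' => 'X', 'Z' => nil }
lookup = ->(name) { links[name] }

p follow_chain('A', 'Philosophy', lookup)
p follow_chain('X', 'Philosophy', lookup)
```

Swapping `lookup` for `method(:get_first_article)` should give the same behaviour as the script's own loop, with the result returned instead of printed.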
@cheerfulstoic

Yeah, indeed, thanks. I'm on 1.8.7 because I've got an old, very large project still there. We're planning on upgrading ;)

Also, check out my fork: I've written a script that takes a number of source pages and creates a graph using Graphviz. Here's a sample graph:

http://semi-sentient.com/img/philosophy.pdf
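
For anyone curious how several chains could be merged into one graph, here is a rough sketch that emits Graphviz DOT text. It is not the code from the fork; `chains_to_dot` and the sample chains are made up for illustration, and it assumes each chain is just an array of article names in visiting order.

```ruby
require 'set'

# Hypothetical helper: turns several first-link chains into DOT text.
# Edges shared by more than one chain are emitted once.
def chains_to_dot(chains)
  # Quote a node name for DOT, escaping embedded double quotes.
  quote = ->(name) { '"' + name.gsub('"') { '\"' } + '"' }

  edges = Set.new
  chains.each do |chain|
    chain.each_cons(2) { |from, to| edges << [from, to] }
  end

  lines = edges.map { |from, to| "  #{quote.call(from)} -> #{quote.call(to)};" }
  (['digraph philosophy {'] + lines + ['}']).join("\n")
end

# Made-up chains; real ones would come from following first links.
chains = [
  ['A', 'B', 'Philosophy'],
  ['C', 'B', 'Philosophy']
]
puts chains_to_dot(chains)
```

Saving the output to a file and running `dot -Tpdf graph.dot -o graph.pdf` should render it, assuming Graphviz is installed.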
