
@federomero
Created May 25, 2011 22:29
Simple script for testing the alt-text trivia of http://xkcd.com/903/
# encoding: UTF-8
require 'rubygems'
require 'nokogiri'
require 'open-uri'
start = 'C.A._Penarol'
target = 'Philosophy'
visited = []
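def get_first_article(article)
  base = 'https://en.wikipedia.org/wiki/'
  # URI.open (Ruby 2.5+) replaces Kernel#open for URLs, which Ruby 3.0 dropped.
  # Without a User-Agent header Wikipedia has answered with 403 Forbidden.
  doc = Nokogiri::HTML(URI.open(base + article, 'User-Agent' => 'ruby'))
  # Wikipedia's markup has changed since 2011 (paragraphs are no longer direct
  # children of #bodyContent), so take the first non-empty <p> that is not
  # inside a table such as an infobox.
  para = doc.css('#bodyContent p').find do |node|
    !node.text.strip.empty? && node.ancestors('table').empty?
  end
  return nil unless para
  # Links inside parentheses are neutralized first; only /wiki/ links count.
  links = Nokogiri::HTML(formatParagraph(para.to_s)).css('a')
  link = links.find { |l| l['href'].to_s.start_with?('/wiki/') }
  return nil unless link
  puts link['href'] # log each hop
  link['href'].sub(%r{\A/wiki/}, '')
end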
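def formatParagraph(text)
  # Neutralize everything inside parentheses: repeatedly match a "(...)"
  # group (in practice the innermost one), delete its '<' and '>' characters
  # so any link inside stops being a tag, and swap its parentheses for
  # placeholder marks so the loop moves on to the enclosing group.
  r = /\(.[^\(]*?\)/
  mark = '@#$'
  while text.match(r)
    text.gsub!(r) { |m| m.gsub(/<|>/, '').gsub('(', mark).gsub(')', mark.reverse) }
  end
  # Restore the original parentheses.
  text.gsub(mark, '(').gsub(mark.reverse, ')')
end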
# Follow first links until we reach the target, a page with no usable link
# (dead end), or a page we have already visited (loop).
current = start
while current && current != target && !visited.include?(current)
  visited << current
  current = get_first_article(current)
end

if current.nil?
  puts "We reached a dead end on #{visited.last}"
elsif current == target
  puts "We arrived from #{start} to #{target} in #{visited.length} steps"
else
  puts "We entered a loop starting on #{current}"
end
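formatParagraph works by repeatedly rewriting a parenthesized group until none is left, then lets Nokogiri find the first surviving link. For comparison, here is a single-pass alternative (a swapped-in technique, not the author's method): split the paragraph HTML into tags and text runs, track parenthesis depth only in the text, and keep the href of each <a> tag found at depth zero. The method name links_outside_parens and the sample markup are hypothetical, and the regex tokenizer assumes simple, well-formed markup with double-quoted href attributes; it is a sketch, not a real HTML parser.

```ruby
# Hypothetical single-pass alternative to formatParagraph: return the hrefs
# of links that sit outside any parentheses in an HTML fragment.
# Assumes simple markup (no '<' or '>' inside attribute values).
def links_outside_parens(html)
  depth = 0
  hrefs = []
  # Each token is either a whole tag ("<...>") or a run of text between tags.
  html.scan(/<[^>]*>|[^<]+/) do |token|
    if token.start_with?('<')
      # Parentheses inside a tag never change the depth, so an href such as
      # "/wiki/Foo_(bar)" is handled correctly.
      href = token[/\A<a\s[^>]*?href="([^"]*)"/, 1]
      hrefs << href if href && depth.zero?
    else
      token.each_char do |c|
        depth += 1 if c == '('
        depth -= 1 if c == ')' && depth > 0 # ignore unmatched ')'
      end
    end
  end
  hrefs
end

html = 'An <a href="/wiki/Alpha">alpha</a> (<a href="/wiki/Beta">beta</a>) ' \
       'of <a href="/wiki/Delta_(disambiguation)">delta</a>.'
p links_outside_parens(html)
# => ["/wiki/Alpha", "/wiki/Delta_(disambiguation)"]
```

To use it in the script, one could take the first returned href that starts with /wiki/ instead of calling formatParagraph and re-parsing. The trade-off is that this sketch does its own naive tokenizing, while the original keeps Nokogiri in charge of recognizing links.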
@cheerfulstoic

Note: I had to add 'User-Agent' => 'ruby' to the open call in get_first_article (see: http://stackoverflow.com/questions/2305975/openurihttperror-403-forbidden )

Also, I had to require 'rubygems'.

Thanks for the script. I had thought about writing the same thing but forgot about it until just now ;) It works great (Barack Obama -> Philosophy takes 20 steps, BTW).
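The step count the script reports is visited.length from its main loop, which can stop in three ways: arrived, dead end, or loop. Those exits can be checked without network access by swapping get_first_article for a lookup in an in-memory table. The method name walk and the article names in the table below are hypothetical placeholders, not real Wikipedia data.

```ruby
# Hypothetical offline version of the script's main loop: the links table
# stands in for get_first_article (nil means "no usable first link").
def walk(links, start, target)
  visited = []
  current = start
  while current && current != target && !visited.include?(current)
    visited << current
    current = links[current]
  end
  if current.nil?
    [:dead_end, visited.last]
  elsif current == target
    [:arrived, visited.length] # number of pages followed to reach target
  else
    [:loop, current]           # first page reached a second time
  end
end

links = {
  'A' => 'B', 'B' => 'Philosophy', # reaches the target in 2 steps
  'X' => 'Y', 'Y' => 'X',          # two pages pointing at each other
  'D' => nil                       # page without a usable link
}
p walk(links, 'A', 'Philosophy') # => [:arrived, 2]
p walk(links, 'X', 'Philosophy') # => [:loop, "X"]
p walk(links, 'D', 'Philosophy') # => [:dead_end, "D"]
```

Using an Array for visited mirrors the script; a Set would make the membership test constant-time, although first-link paths are short enough that it hardly matters.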

@federomero
Author

OK, I added those changes. We're probably using different Ruby versions: I'm on 1.9.2. Are you on 1.8.x?

@cheerfulstoic

Yes, indeed, thanks. I'm on 1.8.7 because I have an old, very large project still running on it. We're planning to upgrade ;)

Also, check out my fork: I've written a script that takes a number of source pages and creates a graph using GraphViz. Here's a sample graph:

http://semi-sentient.com/img/philosophy.pdf
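The fork's code isn't reproduced here, so as a rough, hypothetical sketch of the idea: given several first-link paths (each an array of article titles, such as the script's visited list plus the final article), turn consecutive pairs into directed edges, drop duplicates so shared tails appear once, and emit GraphViz DOT text. The method names paths_to_dot and dot_quote and the sample titles are made up for illustration.

```ruby
# Hypothetical sketch (not the code from the fork): merge first-link paths
# into one GraphViz digraph so that shared tails appear only once.
def dot_quote(name)
  # Wrap in double quotes and escape embedded double quotes, which is
  # enough for typical article titles.
  '"' + name.gsub('"') { '\"' } + '"'
end

def paths_to_dot(paths)
  # Each consecutive pair of titles becomes a directed edge; uniq drops
  # edges shared by several paths.
  edges = paths.flat_map { |path| path.each_cons(2).to_a }.uniq
  lines = ['digraph philosophy {']
  edges.each { |from, to| lines << "  #{dot_quote(from)} -> #{dot_quote(to)};" }
  lines << '}'
  lines.join("\n")
end

paths = [
  %w[Article_A Article_B Philosophy],
  %w[Article_C Article_B Philosophy]
]
puts paths_to_dot(paths)
```

Saving that output as philosophy.dot and running dot -Tpdf philosophy.dot -o philosophy.pdf would render a PDF graph of the merged paths.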
