@cheerfulstoic
Forked from federomero/wiki.rb
Created July 7, 2011 12:57
Simple script for testing the http://xkcd.com/903/ alt-text trivia
# encoding: UTF-8
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'graphviz'
target = 'Philosophy'
links = {}
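# Returns the title of the first linked article in the page body,
# or nil if no usable link is found.
def get_first_article(article)
  base = 'http://en.wikipedia.org/wiki/'
  # URI.open is the open-uri entry point; Kernel#open no longer accepts URLs in Ruby 3.
  doc = Nokogiri::HTML(URI.open(base + article, 'User-Agent' => 'ruby'))
  doc.css('#bodyContent > p').each do |p|
    # Take the first site-relative link that is not inside parentheses.
    # to_s guards against anchors that have no href attribute.
    link = Nokogiri::HTML(format_paragraph(p.to_s)).css('a').find { |l| l['href'].to_s.match(/^\//) }
    return link['href'].gsub(/^\/wiki\//, '') if link
  end
  nil
end

# Neutralizes links inside parentheses: strips < and > from parenthesized
# text so that any tags there are no longer parsed as markup.
def format_paragraph(text)
  r = /\(.[^\(]*?\)/
  mark = '@#$'
  while text.match r
    # Swap the parentheses for temporary markers so the loop terminates.
    text.gsub!(r) { |m| m.to_s.gsub(/<|>/, '').gsub('(', mark).gsub(')', mark.reverse) }
  end
  # Restore the original parentheses.
  text.gsub(mark, '(').gsub(mark.reverse, ')')
end

# Follows first links from start until the target, a dead end,
# or an already-mapped page is reached.
def map_links(start, target, links)
  puts "\nStarting new mapping"
  current = start
  # key? (rather than a truthiness check) so known dead ends are not fetched again.
  while current && current != target && !links.key?(current)
    puts current
    current = links[current] = get_first_article(current)
  end
  links
end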
pages = [
  'Kudremukh_Iron_Ore_Company_Ltd.',
  'Pivotal_Rockordings',
  'Pyawbwe',
  'McMurtrey_Aquatic_Center',
  'Melandryidae',
  'Jafet_Soto',
  'Judy_Smith_Torrie',
  'A_Different_Kind_of_Love_Song',
  # 'New_Frontier',
  '2003_League_of_Ireland',
  'The_Grid_(TV_miniseries)',
  'List_of_divided_U.S._Routes',
  'Lilydale,_Victoria',
  'Endo_(band)',
  'Makoto_Kosaka'
]
pages.each do |page|
  links = map_links(page, target, links)
end
g = GraphViz::new( "structs", "type" => "graph" )
g[:rankdir] = "LR"
# set global node options
g.node[:color] = "#ddaa66"
g.node[:style] = "filled"
g.node[:shape] = "box"
g.node[:penwidth] = "1"
g.node[:fontname] = "Trebuchet MS"
g.node[:fontsize] = "8"
g.node[:fillcolor]= "#ffeecc"
g.node[:fontcolor]= "#775500"
g.node[:margin] = "0.0"
# set global edge options
g.edge[:color] = "#999999"
g.edge[:weight] = "1"
g.edge[:fontsize] = "6"
g.edge[:fontcolor]= "#444444"
g.edge[:fontname] = "Verdana"
g.edge[:dir] = "forward"
g.edge[:arrowsize]= "0.5"
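# every page seen as a source or a destination becomes a node
pages = (links.keys + links.values).uniq.compact
pages.each do |page|
  g.add_node(page).label = page
end
links.each do |source, link|
  g.add_edge(source, link) if source && link
end
# :path is the directory containing the Graphviz binaries; adjust for your system
g.output( :path => '/opt/local/bin', :pdf => "philosophy.pdf" )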
#if !current
# puts "We reached a dead end on #{last}"
#elsif current == target
# puts "We arrived from start #{start} to #{target} in #{links.size} steps"
#else
# puts "We entered a loop starting on #{current}"
#end
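
The commented-out summary above refers to `current`, `last`, and `start`, which are local to `map_links` and not available at the top level. One way to recover the same three outcomes (dead end, arrival at the target, loop) is to walk the finished `links` hash afterwards. The following is a minimal sketch under that assumption; `classify_chain` and `sample_links` are hypothetical names that are not part of the gist, and the sketch makes no network requests.

```ruby
# Hypothetical helper (not in the original gist): walks a links hash of
# page => first linked page and reports how the chain from start ends.
# Returns [outcome, page, steps], where steps counts the pages visited.
def classify_chain(start, target, links)
  seen = []
  current = start
  until current.nil? || current == target || seen.include?(current)
    seen << current
    current = links[current]
  end

  if current.nil?
    [:dead_end, seen.last, seen.size]   # the last visited page had no usable link
  elsif current == target
    [:reached, target, seen.size]       # arrived at the target
  else
    [:loop, current, seen.size]         # came back to a page already visited
  end
end

# Made-up data for illustration only; real input would be the hash built by map_links.
sample_links = { 'A' => 'B', 'B' => 'Philosophy', 'X' => 'Y', 'Y' => 'X', 'Q' => nil }
%w[A X Q].each do |page|
  outcome, where, steps = classify_chain(page, 'Philosophy', sample_links)
  puts "#{page}: #{outcome} at #{where.inspect} after #{steps} steps"
end
```

Keeping the classification separate from the crawl means it can be run for each entry in `pages` after all mappings are done, without fetching anything again.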
@federomero

That looks really nice!
