Created
November 25, 2010 02:44
script to build a web page that shows the links contained on other pages
# script to build a web page that shows the links contained on pages at source urls
# an example solution for the question at http://www.ruby-forum.com/topic/521021

# if these change frequently, it might be better to read them from a file
# rather than store them in the source
urls = [
  'http://www.google.com/',
  'http://weatherflash.com/usa/ks/wichita/',
  'http://umbrellatoday.com/',
]

# store data about each page in a struct
# in this case, all I care about is the url of the page
# and the links that page contains
Page = Struct.new(:url, :links)

# extract the links from these pages
require 'open-uri'
require 'nokogiri'
pages = []
urls.each do |url|
  # URI.open is required on Ruby 3.0+, where Kernel#open no longer fetches URLs
  doc   = Nokogiri::HTML(URI.open(url))
  # anchors without an href attribute yield nil, so drop those
  links = doc.css('a').map { |link| link['href'] }.compact
  pages << Page.new(url, links)
end

# in reality, I would use a template with ERB,
# but this works and doesn't require knowledge of ERB :)
output = '
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>links for my favourite pages</title>
</head>
<body>
'

# hypothetical output data: each page is a paragraph stating the url
# followed by a list of the links that page contains
# (the <ul> comes after the </p>, since a paragraph cannot contain a list)
pages.each do |page|
  output << "<p>The page #{page.url} has links</p><ul>"
  page.links.each do |link|
    output << "<li>#{link}</li>"
  end
  output << "</ul>"
end
output << "</body></html>"

# would probably output this to another file
puts output
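
The script's comments mention that a real version would use an ERB template. A minimal sketch of what that might look like follows. It is not the author's code: the `TEMPLATE` constant, the `render_pages` helper, and the sample data are hypothetical names introduced for illustration. It also HTML-escapes the urls and links, which the string-concatenation version does not do.

```ruby
require 'erb'

# same shape as the Page struct in the script; guarded so it can coexist with it
Page = Struct.new(:url, :links) unless defined?(Page)

# hypothetical template; a trailing -%> suppresses the newline after a tag
TEMPLATE = <<~'HTML'
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>links for my favourite pages</title>
  </head>
  <body>
  <% pages.each do |page| -%>
  <p>The page <%= ERB::Util.html_escape(page.url) %> has links</p>
  <ul>
  <% page.links.each do |link| -%>
    <li><%= ERB::Util.html_escape(link) %></li>
  <% end -%>
  </ul>
  <% end -%>
  </body>
  </html>
HTML

# hypothetical helper: renders an array of Page structs to an HTML string
# (trim_mode: needs Ruby 2.6+, result_with_hash needs Ruby 2.5+)
def render_pages(pages)
  ERB.new(TEMPLATE, trim_mode: '-').result_with_hash(pages: pages)
end

# made-up sample data, so the sketch runs without any network access
sample = [Page.new('http://example.com/?a=1&b=2', ['/about', '/search?q=x&page=2'])]
puts render_pages(sample)
```

To write the result to a file instead of printing it, as the last comment in the script suggests, something like `File.write('links.html', render_pages(pages))` should work, where `links.html` is an assumed file name.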