@JoshCheek
Created November 25, 2010 02:44
script to build a web page that shows the links contained on other pages
# script to build a web page that shows the links contained on pages at source urls
# an example solution for the question at http://www.ruby-forum.com/topic/521021
# if these change frequently, it might be better to read them from a file
# rather than store them in the source
urls = [
  'http://www.google.com/',
  'http://weatherflash.com/usa/ks/wichita/',
  'http://umbrellatoday.com/',
]
# store data about each page in a struct
# in this case, all I care about is the url of the page
# and the links that page contains
Page = Struct.new(:url, :links)
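# extract the links from these pages
require 'open-uri'
require 'nokogiri'
pages = urls.map do |url|
  # open-uri fetches urls through URI.open; plain Kernel#open no longer does this on Ruby 3.0+
  doc   = Nokogiri::HTML(URI.open(url))
  # an <a> without an href attribute yields nil, so drop those
  links = doc.css('a').map { |link| link['href'] }.compact
  Page.new(url, links)
end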
# in reality, I would use a template with ERB,
# but this works and doesn't require knowledge of ERB :)
output = '
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>links for my favourite pages</title>
</head>
<body>
'
# output: for each page, a paragraph naming the url, followed by a list of
# the links that page contains (a <ul> may not be nested inside a <p>)
require 'cgi'
pages.each do |page|
  output << "<p>The page #{CGI.escapeHTML(page.url)} has links:</p>\n<ul>\n"
  page.links.each do |link|
    # escape so that hrefs containing & or < don't break the markup
    output << "<li>#{CGI.escapeHTML(link.to_s)}</li>\n"
  end
  output << "</ul>\n"
end
output << "</body></html>"
# would probably output this to another file
puts output
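The comments near the top suggest reading the urls from a file when they change often. Below is a minimal sketch of that idea. The file format is an assumption: one url per line, with blank lines and lines starting with # skipped. The read_urls name is also hypothetical and not part of the original gist.

```ruby
# Hypothetical helper: load source urls from a plain-text file.
# Assumed format: one url per line; blank lines and '#' comment lines are ignored.
def read_urls(path)
  File.readlines(path, chomp: true)
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?('#') }
end
```

The hard-coded array could then become urls = read_urls('urls.txt'). The file name urls.txt is only an example.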
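The script notes that a real version would use an ERB template instead of building the string by hand. Below is a sketch of that approach. It assumes Ruby 2.6 or later, which provides the trim_mode: keyword and result_with_hash. The render_pages name and the template are illustrative. The template produces a similar page, with each url and link HTML-escaped through ERB::Util.html_escape.

```ruby
require 'erb'

# Mirrors the Page struct from the script: a url and the links found on it.
Page = Struct.new(:url, :links)

# Illustrative template: <% %> runs Ruby code, <%= %> inserts a value.
TEMPLATE = <<~'HTML'
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>links for my favourite pages</title>
  </head>
  <body>
  <% pages.each do |page| %>
  <p>The page <%= ERB::Util.html_escape(page.url) %> has links:</p>
  <ul>
  <% page.links.each do |link| %>
  <li><%= ERB::Util.html_escape(link) %></li>
  <% end %>
  </ul>
  <% end %>
  </body></html>
HTML

# Renders the template for an array of Page structs.
# trim_mode '<>' drops the newline after lines that are only a <% %> tag.
def render_pages(pages)
  ERB.new(TEMPLATE, trim_mode: '<>').result_with_hash(pages: pages)
end
```

In the script, this would replace the string building with output = render_pages(pages). The markup then lives in one place and stays separate from the scraping logic.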
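The last comment says the result would probably go to another file rather than to standard output. Below is a minimal sketch using File.write. The save_page helper and its default path links.html are hypothetical.

```ruby
# Hypothetical helper: write the generated page to disk instead of printing it.
# Returns the path that was written.
def save_page(html, path = 'links.html')
  File.write(path, html)
  path
end
```

The final puts output would then become save_page(output).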