@abinoam
Created September 5, 2012 01:26
Sybren Kooistra's question - http://www.ruby-forum.com/topic/4405257
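require 'nokogiri'
require 'open-uri'

# Download every document linked from one search-results page.
def get_search_result_links(n_page)
  links = n_page.css('.linker-kolom li a')
  puts "** There were #{links.length} links found"
  links.each do |link|
    href = link['href']
    inner_url = 'https://zoek.officielebekendmakingen.nl' + href
    # Last path segment of the URL, without the query string.
    basename = File.basename(inner_url).split('?')[0]
    puts "\n\n\nFetching page at #{basename}"
    # URI.open (open-uri, Ruby 2.5+) replaces Kernel#open for URLs, which Ruby 3.0 removed.
    datalezer = URI.open(inner_url).read
    # The raw href may contain slashes and a query string, so it is not used as the filename.
    lokalenieuwefilenaam = basename + ".html"
    File.open(lokalenieuwefilenaam, "w") do |lokalenieuwefile|
      lokalenieuwefile.write(datalezer)
    end
  end
end

INITIAL_URL = 'https://zoek.officielebekendmakingen.nl/zoeken/resultaat/?zkt=Uitgebreid&pst=ParlementaireDocumenten'
initial_page = Nokogiri::HTML(URI.open(INITIAL_URL))

# The second-to-last pagination link is assumed to hold the last page number
# (the final link presumably being a "next" control).
pagination_links = initial_page.css('.paginering.beneden a')
last_page_link = pagination_links[-2]
last_page_number = last_page_link.text.to_i

# Starts at page 5, as in the original; adjust the range start to fetch earlier pages.
(5..last_page_number).each do |page_num|
  puts "\n\n\n***** Getting page #{page_num}"
  results_page_url = "#{INITIAL_URL}&_page=#{page_num}"
  results_page = Nokogiri::HTML(URI.open(results_page_url))
  get_search_result_links(results_page)
end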
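
The script above writes each fetched page to a local file named after its link. A minimal sketch of one way to derive a flat, filesystem-safe filename from an href follows. The helper name `local_filename_for` and the sample href are hypothetical and not part of the original gist, and the character whitelist is an assumption rather than anything the site requires.

```ruby
# Hypothetical helper (not in the original gist): map a link's href to a
# flat local filename that is safe to pass to File.open.
def local_filename_for(href)
  path = href.split('?').first.to_s         # drop the query string, if any
  name = File.basename(path, '.html')       # last path segment, minus a trailing .html
  name = name.gsub(/[^A-Za-z0-9._-]/, '_')  # replace characters outside a conservative whitelist
  name = 'index' if name.empty?             # fall back when nothing usable remains
  "#{name}.html"
end

# Made-up href, for illustration only.
puts local_filename_for('/kst-33000-5.html?zoekcriteria=abc')
```

Stripping the existing `.html` extension before appending a new one avoids names such as `kst-33000-5.html.html`. Keeping only the last path segment means the script does not need to create nested directories before writing.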