@why-el
Created June 14, 2013 21:27
Get all Moroccan cities from French Wikipedia.
# This will output all Moroccan cities (including some that are not
# recognized internationally) in a text file called, well,
# moroccan_cities.txt.
require 'nokogiri'
require 'open-uri'
# This should not be relied on. Get the list once and be done with it.
url = "https://fr.wikipedia.org/wiki/Villes_du_Maroc"
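# Kernel#open no longer accepts URLs on Ruby 3.0+; open-uri provides URI.open.
document = Nokogiri::HTML(URI.open(url))
# City links are the anchors inside bold text within list items of the content div.
data = document.css("div#content ul li b a")
# The first match is some unrelated text rather than a city, so skip it.
# The block form of File.open closes the file even if an error is raised.
File.open("moroccan_cities.txt", "w") do |output_file|
  data.drop(1).each { |link| output_file.puts link.text.strip }
end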
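The comment in the script advises getting the list once rather than scraping on every run. A minimal sketch of that idea follows, assuming a hypothetical helper named `cached_city_list` (not part of the original gist). It returns the saved file's lines if the file already exists, and otherwise runs the supplied block, where the scraping would go, and saves the result.

```ruby
# Hypothetical helper, not part of the original gist: reuse the saved list
# if it exists, otherwise run the block once and save whatever it returns.
def cached_city_list(path)
  # A previous run already wrote the file, so skip the network entirely.
  return File.readlines(path, chomp: true) if File.exist?(path)

  cities = yield # the block is assumed to return an array of city names
  File.open(path, "w") { |file| cities.each { |city| file.puts city } }
  cities
end
```

Called as `cached_city_list("moroccan_cities.txt") { ... }` with the scraping code inside the block, this would only fetch the Wikipedia page on the first run. Deleting the file forces a fresh fetch.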