Create a scraper on morph.io that, for each NSW fire area, collects:
- the name of the area
- the fire danger level and total fire ban status for today and tomorrow
- the list of councils affected
Set the scraper to run every day.
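Before writing the scraper it helps to sketch the record you'll save for each area. A minimal sketch using the ScraperWiki helper covered later in this guide; the field names, sample values, and the choice of the area name as the unique key are assumptions, not part of the challenge:
require 'scraperwiki'

# Hypothetical field names and sample values - one record per fire area
record = {
  name: 'Greater Sydney Region',
  danger_today: 'High',
  danger_tomorrow: 'Severe',
  total_fire_ban_today: false,
  total_fire_ban_tomorrow: true,
  councils: 'Blacktown, Blue Mountains, Penrith'
}
# Keying on :name means the daily run updates each area's row rather
# than adding duplicates
ScraperWiki.save_sqlite([:name], record)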
Create a scraper that collects bills introduced into NSW Parliament. Collect every bill introduced since 1997. For each bill, collect:
- the bill's name
- the URL for the bill on parliament.nsw.gov.au
- the house the bill originated in
Set the scraper to run every day so that it stays up to date.
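As with the fire danger challenge, deciding the record shape up front helps. A minimal sketch, assuming hypothetical field names and placeholder values, keyed on the bill's URL so the daily run updates existing rows rather than duplicating them:
require 'scraperwiki'

# Hypothetical field names and placeholder values - one record per bill
record = {
  name: 'Example Amendment Bill 2015',
  url: 'https://www.parliament.nsw.gov.au/bills/example',
  house: 'Legislative Assembly'
}
ScraperWiki.save_sqlite([:url], record)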
- Mechanize - http://mechanize.rubyforge.org/Mechanize.html
- Nokogiri - http://www.rubydoc.info/github/sparklemotion/nokogiri/
- morph.io docs - https://morph.io/documentation
- Useful Ruby bits:
  - Ruby Regular Expressions - http://rubular.com/
- Get help with scraping at the morph.io help forum - https://help.morph.io/
Clone your scraper to your local machine:
git clone https://github.com/morph-test-scrapers/australian_federal_members_of_parliament_tutorial.git
Check and install any missing dependencies:
bundle
Start an IRB session:
bundle exec irb
Run your scraper on your local machine:
bundle exec ruby scraper.rb
Make the Mechanize and ScraperWiki libraries available:
require 'scraperwiki'
require 'mechanize'
Get the page to scrape using Mechanize:
agent = Mechanize.new
page = agent.get('https://www.yourpageurl.org.au/')
Return the first matching element from the page using .at():
page.at(:h1)
Return all matching elements (an Array-like NodeSet) using .search():
page.search(:h2)
Get the text from an element:
page.at(:h1).text
Get the value of an attribute on an element:
page.at(:img).attr('src')
Collect your data into a Hash:
record = {
  name: page.at(:h1).text,
  url: page.at(:h1).at(:a)[:href]
}
Save the record you've collected (the first argument names the fields that uniquely identify a record, so re-running the scraper updates existing rows rather than duplicating them):
ScraperWiki.save_sqlite([:url], record)
Loop through a series of elements:
page.search(:h2).each do |item|
  # get the text of the second paragraph in this element
  item.search(:p)[1].text
end
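Putting the snippets above together, a whole scraper is just a fetch, a loop that builds a record per element, and a save. A sketch only - the :li and :a selectors and the field names assume a page where each list item contains a link, so swap in whatever the real page uses:
require 'scraperwiki'
require 'mechanize'

agent = Mechanize.new
page = agent.get('https://www.yourpageurl.org.au/')

page.search(:li).each do |item|
  link = item.at(:a)
  next unless link # skip list items that don't contain a link

  record = {
    name: link.text.strip,
    url: link[:href]
  }
  # :url is the unique key, so re-running the scraper updates
  # existing rows instead of adding duplicates
  ScraperWiki.save_sqlite([:url], record)
end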
- Civic tech monthly newsletter
- OpenAustralia Foundation Monthly Sydney Pub Meet & Lightning Talks