@joao
Created September 8, 2018 09:22
Scraper for tempo.pt
# Instructions:
# 1) run 'gem install httparty' in the macOS terminal
# 2) update station_id with the ID of the desired station from the tempo.pt website
# 3) choose the desired time span in the variables further below: start_date, end_date
# 4) run 'ruby temperature_scraper.rb'
#
# A file named temp_media_<station_name>.txt is generated, with values separated by ','
require 'httparty'
require 'date'
require 'json'
# Seconds to wait between requests
sleep_time = 0.25
# Station
station_id = "571e0811c76c49177837e8c3"
station_name = "lisboa"
# Date range to scrape (inclusive)
start_date = Date.parse('20170101')
end_date = Date.parse('20171231')
# A Range of Date objects iterates one day at a time
date_range = (start_date..end_date)
date_range.each do |date|
  # Date#day and Date#month return integers, so there are no leading zeros to strip
  url = "https://www.tempo.pt/peticiones/historico.php?id_estacion=#{station_id}" \
        "&accion=DIA&dia=#{date.day}&mes=#{date.month}&anno=#{date.year}"
  response = HTTParty.get(url)
  # Parse the raw body: parsed_response may already be a Hash if the server sends a JSON content type
  response_json = JSON.parse(response.body)
  temp_media = response_json['diario']['temp_media']
  puts temp_media
  # Append the value to the output file; the block form closes the file automatically
  File.open("temp_media_#{station_name}.txt", "a") do |file|
    file.print("#{temp_media},")
  end
  # Pause between requests to avoid overloading the server
  sleep sleep_time
end
puts "Average temperatures are in the file temp_media_#{station_name}.txt."
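
The script above stops with an exception on the first day whose response is not valid JSON or has no 'diario' entry. A minimal sketch of two helpers that could make the loop more tolerant is shown below. The names build_url and extract_temp_media are hypothetical and not part of the original script, and the response shape (a 'diario' object holding 'temp_media') is assumed from the script's own lookup, not from any tempo.pt documentation.

```ruby
require 'date'
require 'json'
require 'uri'

# Endpoint taken from the original script.
BASE_URL = 'https://www.tempo.pt/peticiones/historico.php'

# Hypothetical helper: builds the daily-history URL for one station and date.
# URI.encode_www_form escapes the values and joins them with '&'.
def build_url(station_id, date)
  query = URI.encode_www_form(
    id_estacion: station_id,
    accion: 'DIA',
    dia: date.day,    # integers, so no leading zeros
    mes: date.month,
    anno: date.year
  )
  "#{BASE_URL}?#{query}"
end

# Hypothetical helper: returns the 'temp_media' value from a raw response body,
# or nil when the body is not valid JSON or lacks the assumed structure.
def extract_temp_media(body)
  data = JSON.parse(body)
  return nil unless data.is_a?(Hash)

  data.dig('diario', 'temp_media')
rescue JSON::ParserError, TypeError
  nil
end
```

Inside the loop, these could be used as `temp_media = extract_temp_media(HTTParty.get(build_url(station_id, date)).body)`, skipping the day with `next if temp_media.nil?` so that one bad response does not abort the whole run.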