@robmckinnon
Created July 29, 2016 15:16
Download discovery registers as TSV
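require 'fileutils'

# Registers on the discovery environment to download.
REGISTERS = %w[uk
  street
  school
  school-type
  school-tag
  school-phase
  school-gender
  school-federation
  school-admissions-policy
  register
  place
  local-authority
  local-authority-type
  field
  diocese
  denomination
  datatype
  address]

PAGE_SIZE = 5000

# mkdir_p does not fail if the directory already exists (safe to re-run).
FileUtils.mkdir_p('data')

REGISTERS.each do |r|
  puts ''
  puts '===='
  puts r
  FileUtils.mkdir_p("data/#{r}")

  # Scrape the total record count from the register's home page.
  count = `curl -s http://#{r}.discovery.openregister.org/ | grep -A 1 'Total records:' | grep dd | sed 's/.*records">//' | sed 's/<span.*//'`
  # Strip whitespace and any thousands separators before parsing.
  total = Integer(count.strip.delete(','))
  # Round up so a final partial page is fetched without requesting an extra
  # empty page; always fetch at least one page so the header row is written.
  pages = [(total + PAGE_SIZE - 1) / PAGE_SIZE, 1].max
  output = "data/#{r}/#{r}.tsv"

  1.upto(pages) do |i|
    url = "http://#{r}.discovery.openregister.org/records.tsv?page-index=#{i}&page-size=#{PAGE_SIZE}"
    file = "data/#{r}/#{r}-#{'%05d' % i}.tsv"
    cmd = "curl --output '#{file}' '#{url}'"
    puts cmd
    `#{cmd}`
    # Keep the header row from the first page only; strip it from later pages.
    cmd = if i == 1
            "cat #{file} > #{output}"
          else
            "sed 1d #{file} >> #{output}"
          end
    puts cmd
    `#{cmd}`
    # Remove the per-page file; rm_f does not raise if it is missing.
    FileUtils.rm_f(file)
  end

  puts "expected: #{total}"
  puts "got: #{`sed 1d #{output} | wc -l`.strip}"
end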
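The shell pipelines in the script can also be expressed in plain Ruby, which makes the scraping, paging and merging steps easier to test without network access. This is a minimal sketch, not part of the original gist: the helper names (`parse_total_records`, `page_count`, `merge_tsv_pages`, `data_row_count`) are hypothetical, and the HTML parsing assumes the register home page uses the same `Total records:` / `records">` markup that the `grep`/`sed` commands rely on.

```ruby
# Hypothetical helpers mirroring the shell steps of the download script.
# Assumption: the home page markup matches what the grep/sed pipeline expects.

PAGE_SIZE = 5000

# Port of: grep -A 1 'Total records:' | grep dd | sed 's/.*records">//' | sed 's/<span.*//'
# Uses the first "Total records:" label and the <dd> line at or just after it.
def parse_total_records(html)
  lines = html.lines
  idx = lines.index { |l| l.include?('Total records:') }
  raise ArgumentError, 'no "Total records:" label found' if idx.nil?

  dd = lines[idx, 2].find { |l| l.include?('dd') }
  raise ArgumentError, 'no <dd> line after the label' if dd.nil?

  # Same greedy substitutions as the sed commands, then drop separators.
  Integer(dd.sub(/.*records">/, '').sub(/<span.*/, '').strip.delete(','))
end

# Ceiling division, with at least one page so the header row is always fetched.
def page_count(total, page_size = PAGE_SIZE)
  [(total + page_size - 1) / page_size, 1].max
end

# Concatenate TSV page bodies, keeping the header row from the first page only
# (what the `cat` / `sed 1d` commands do). Assumes each page ends with a newline.
def merge_tsv_pages(pages)
  pages.each_with_index.map do |body, i|
    i.zero? ? body : body.lines.drop(1).join
  end.join
end

# Number of data rows: all lines after the header (like `sed 1d | wc -l`).
def data_row_count(tsv)
  [tsv.lines.size - 1, 0].max
end
```

For example, `merge_tsv_pages([page1, page2])` returns the first page unchanged followed by the second page without its header row, and `data_row_count` on the result can be compared with `parse_total_records` in the same way the script compares "expected" and "got".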