@MiguelBel
Last active March 12, 2016 15:26
Script to get US visa refusal data by country and by year (2006 to 2015 so far).
require 'pdf-reader'
require 'open-uri'
require 'json'

# Download the refusal-rate PDF for a two-digit fiscal year (e.g. "06").
def read_file(year_two_digits)
  url_pattern = "https://travel.state.gov/content/dam/visas/Statistics/Non-Immigrant-Statistics/RefusalRates/FY%s.pdf"
  # URI.open is the open-uri entry point; Kernel#open no longer accepts URLs in Ruby 3.0+.
  file = URI.open(url_pattern % year_two_digits)
  PDF::Reader.new(file)
end

# Zero-pad the year to two digits: 6 => "06", 15 => "15".
def format_year(year)
  year.to_s.rjust(2, '0')
end

# Keep only the lines of a page that end in "%" and split each one
# before every number, e.g. "Some Country 12.3%" => ["Some Country", "12.3%"].
def get_data(page)
  page
    .split("\n")
    .map { |line| line.strip.gsub(/\s+/, ' ') }
    .select { |line| line.end_with?('%') }
    .map { |line| line.split(/ (?=\d)/) }
end

import_scope = (6..15).map(&method(:format_year))

# Maps each two-digit year to the rows collected from every page of its PDF.
data = {}
import_scope.each do |year|
  pdf = read_file(year)
  data[year] = pdf.pages.flat_map { |page| get_data(page.text) }
end

puts data.to_json
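To make the parsing step easier to follow without downloading any PDF, here is a minimal sketch that applies the same steps as get_data to a hand-written string. The helper name parse_rows, the sample text, and the conversion of percentages to floats are assumptions added for illustration; they are not part of the original script, and the sample figures are made up rather than taken from the State Department files.

```ruby
# Hypothetical helper that follows the same steps as get_data in the script:
# normalise whitespace, keep lines ending in "%", split before each number.
def parse_rows(page_text)
  page_text
    .split("\n")
    .map { |line| line.strip.gsub(/\s+/, ' ') }
    .select { |line| line.end_with?('%') }
    .map { |line| line.split(/ (?=\d)/) }
end

# Made-up sample text, NOT real figures from the published PDFs.
sample = <<~TEXT
  Country        Adjusted Refusal Rate
  Exampleland          12.5%
  Republic of Sample    3.0%
TEXT

rows = parse_rows(sample)

# Hypothetical post-processing step: turn "12.5%" into the Float 12.5.
rates = rows.to_h { |country, rate| [country, rate.delete('%').to_f] }

p rows
p rates
```

The header line is dropped because it does not end in "%", and multi-word country names stay intact because the split only happens before a digit. A line containing several numbers would be split into more than two pieces, so the to_h step here assumes one country and one rate per line.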