Last active
March 12, 2016 15:26
Script to get US visa refusal data by country and by year (2006 to 2015 so far).
require 'pdf-reader'
require 'open-uri'
require 'json'

# Downloads the refusal-rate PDF for a two-digit fiscal year (e.g. "06").
def read_file(year_two_digits)
  url_pattern = "https://travel.state.gov/content/dam/visas/Statistics/Non-Immigrant-Statistics/RefusalRates/FY%s.pdf"
  # URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs.
  file = URI.open(url_pattern % year_two_digits)
  PDF::Reader.new(file)
end

# Zero-pads the year to two digits: 6 becomes "06".
def format_year(year)
  year.to_s.rjust(2, '0')
end

# Extracts [country, rate] rows from the text of one page.
def get_data(page)
  page.split("\n")
      .map { |line| line.strip.gsub(/\s+/, ' ') } # collapse runs of whitespace
      .select { |line| line.end_with?('%') }      # keep only rows ending in a percentage
      .map { |line| line.split(/ (?=\d+)/) }      # split before each number
end

import_scope = (6..15).map(&method(:format_year))
data = {}

import_scope.each do |year|
  pdf = read_file(year)
  # Collect the rows of every page into one flat list for the year.
  data[year] = pdf.pages.flat_map { |page| get_data(page.text) }
end

puts data.to_json
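
The parsing step can be tried without downloading anything. The following is a minimal sketch that applies the same line filtering and splitting to a made-up page. `SAMPLE_PAGE` and `parse_rows` are hypothetical names introduced here for illustration, and the sample text is an assumption about the layout, not content copied from the real PDFs.

```ruby
# Hypothetical sample text; the layout of the real PDFs may differ.
SAMPLE_PAGE = <<~TEXT
  Nationality          Adjusted Refusal Rate
  Afghanistan              51.0%
  Antigua and Barbuda   19.2%
  Albania 40.2%

  Page 1
TEXT

# Same steps as get_data in the script: normalize whitespace,
# keep lines that end in '%', then split before each number.
def parse_rows(page_text)
  page_text.split("\n")
           .map { |line| line.strip.gsub(/\s+/, ' ') }
           .select { |line| line.end_with?('%') }
           .map { |line| line.split(/ (?=\d+)/) }
end

rows = parse_rows(SAMPLE_PAGE)
rows.each { |country, rate| puts "#{country}: #{rate}" }
```

Because the split only happens at a space that is followed by a digit, multi-word country names stay in one piece. Header and footer lines are dropped because they do not end in a percent sign.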