@mortenjohs
Created September 30, 2012 12:20
A simple script to grab tabular data from a PDF
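require 'pdf-reader'
require 'csv'

pdf_reader = PDF::Reader.new("input.pdf")
area = ""
# Block form closes output.tsv when done; col_sep is passed as a keyword argument.
CSV.open("output.tsv", "wb", col_sep: "\t") do |csv|
  # pages[42..69] is a zero-based slice, i.e. pages 43 to 70 of the PDF.
  pdf_reader.pages[42..69].each do |page|
    page.text.each_line do |line|
      # Skip blank lines so they do not reset the current area.
      next if line.strip.empty?
      if /^[a-z\s]*$/i =~ line
        # A letters-only line is an area heading for the rows that follow.
        area = line.strip
      else
        # The country name is everything before the first digit.
        country = line.split(/[0-9]/).first
        # Split the remainder of the row on parentheses.
        csv_line = line.sub(country, '').strip.split(/[()]/)
        csv_line.unshift(country.strip).unshift(area)
        csv << csv_line
      end
    end
  end
end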
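
The per-line logic can be tried without a PDF. The following is a minimal sketch under stated assumptions: `parse_line` is a hypothetical helper (not part of the gist or of pdf-reader), and the sample lines are made up to resemble "heading, then rows of name, number, (number)". Unlike the script above, it also strips whitespace from each field.

```ruby
# Hypothetical helper: classifies one line of extracted text.
# Returns [:area, name], [:row, fields], or nil for a blank line.
def parse_line(line, area)
  text = line.strip
  return nil if text.empty?
  # Letters and whitespace only: treat as an area heading.
  return [:area, text] if text.match?(/\A[a-z\s]+\z/i)

  # Everything before the first digit is taken as the country name.
  country = text.split(/[0-9]/).first.to_s
  # The rest of the row is split on parentheses and trimmed.
  rest = text.sub(country, '').strip.split(/[()]/).map(&:strip)
  [:row, [area, country.strip, *rest]]
end

# Made-up sample lines, not taken from any real PDF.
sample = ["Europe\n", "\n", "Norway 12 (34)\n", "Sweden 5 (6)\n"]

area = ""
rows = []
sample.each do |line|
  kind, value = parse_line(line, area)
  case kind
  when :area then area = value
  when :row  then rows << value
  end
end

rows.each { |r| puts r.join("\t") }
```

Keeping the parsing in a small function makes it easy to adjust the regular expressions for a different table layout before running the script against the whole page range.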