@jazzido
Created April 27, 2014 22:57
require 'tabula'

pdf_file_path = "mineriafinal.pdf"
outfilename = "mineriafinalv3.csv"

# Bounds of the table area on each page: top, left, bottom, right
top, left, bottom, right = [104.46, 13, 580.54, 820.82]
# The area is passed as top, left, width, height
area = Tabula::ZoneEntity.new(top, left, right - left, bottom - top)

extractor = Tabula::Extraction::ObjectExtractor.new(pdf_file_path, :all)

# The block form closes the output file even if extraction raises
File.open(outfilename, 'w') do |out|
  extractor.extract.each do |pdf_page|
    STDERR.puts "Extracting page: #{pdf_page.number_one_indexed}"
    pdf_page.get_area(area).spreadsheets.each do |spreadsheet|
      # remove carriage returns that appear in the PDF
      out << spreadsheet.to_csv.gsub(/\r/, '')
    end
  end
end
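
The script above does two small pieces of plain-Ruby work around the tabula calls: it converts top/left/bottom/right bounds into the top/left/width/height order it passes to `Tabula::ZoneEntity.new`, and it strips carriage returns from the CSV text. A minimal sketch of both steps as standalone helpers follows. The names `bounds_to_zone_args` and `strip_carriage_returns` are hypothetical and not part of tabula; the argument order is assumed from the script itself.

```ruby
# Hypothetical helper (not a tabula API): convert [top, left, bottom, right]
# bounds into the [top, left, width, height] order the script passes on.
def bounds_to_zone_args(top, left, bottom, right)
  [top, left, right - left, bottom - top]
end

# Hypothetical helper: remove every carriage return from a CSV string,
# equivalent to the gsub(/\r/, '') call in the script.
def strip_carriage_returns(csv_text)
  csv_text.delete("\r")
end

# Same bounds as in the script
zone_args = bounds_to_zone_args(104.46, 13, 580.54, 820.82)
cleaned = strip_carriage_returns("a,b\r\nc\r,d\n")
```

With these in place, the area could be built as `Tabula::ZoneEntity.new(*zone_args)`, assuming the constructor takes the four values in that order as the script suggests.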