@johnjohndoe
Created July 4, 2014 09:39
Digitize tables in PDF to CSV
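require 'tabula' # provided by the tabula-extractor gem (JRuby)

pdf_file_path = "Abschiebungen.pdf"
outfilename = "Abschiebungen.csv"

# The block form of File.open closes the output file even if extraction raises.
File.open(outfilename, 'w') do |out|
  # :all selects every page of the PDF.
  extractor = Tabula::Extraction::ObjectExtractor.new(pdf_file_path, :all)
  extractor.extract.each do |pdf_page|
    # Each table detected on the page is appended as its own CSV chunk.
    pdf_page.spreadsheets.each do |spreadsheet|
      out << spreadsheet.to_csv
      out << "\n\n" # blank lines separate consecutive tables
    end
  end
end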
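
The script above writes every table into one file, with blank lines between consecutive tables. If the tables are needed individually afterwards, something like the following sketch could split the combined file back apart. `split_tables` is a hypothetical helper, not part of tabula; it relies only on Ruby's standard `csv` library and assumes that no cell contains blank lines of its own.

```ruby
require 'csv'

# Hypothetical helper: split the combined output into one array of rows
# per table. Assumes tables are separated by at least one blank line and
# that no quoted cell contains a blank line itself.
def split_tables(text)
  text.split(/(?:\r?\n){2,}/)              # cut at runs of two or more newlines
      .reject { |chunk| chunk.strip.empty? } # drop empty leftovers
      .map { |chunk| CSV.parse(chunk) }      # parse each chunk as its own CSV
end

# Usage with a small inline sample in place of the real output file:
sample = "a,b\n1,2\n\n\nc,d\n3,4\n\n\n"
split_tables(sample).each_with_index do |rows, index|
  puts "Table #{index + 1}: #{rows.length} rows"
end
```

For the real output, `split_tables(File.read("Abschiebungen.csv"))` would be the corresponding call.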