@LoranKloeze
Created July 12, 2020 14:40
UWV NOW PDF to Excel
# Gemfile
source 'https://rubygems.org'
ruby '2.7.1'
gem 'pdf-reader'
gem 'caxlsx'

# Conversion script: reads the UWV NOW PDF and writes its rows to an Excel file.
require 'stringio'
require 'pdf/reader'
require 'axlsx'

filename_in = './now-pdf-bestand-van-uwv-deel-1.pdf'
filename_out = './now-overzicht.xlsx'

puts 'Reading in pdf...'
# Load the whole PDF into memory so PDF::Reader can parse it from an IO object.
contents = File.binread(filename_in)
contents_io = StringIO.new(contents)
reader = PDF::Reader.new(contents_io)
puts 'Done!'

puts 'Starting conversion...'
package = Axlsx::Package.new
wb = package.workbook
wb.add_worksheet do |sheet|
  reader.pages.each_with_index do |page, page_nr|
    # Only the first 10 pages are converted; remove this line to process the whole file.
    break if page_nr == 10

    print "#{page_nr} "
    $stdout.flush
    rows = page.text.split(/\n+/)
    rows.each_with_index do |row, i|
      # Skip the headers (first two lines) and the page number (last line).
      next if i <= 1 || i == rows.count - 1

      # Columns are separated by runs of five or more whitespace characters.
      matched = /(.*?)\s{5,}(.*?)\s{5,}(.*)/.match(row)
      next if matched.nil?

      # Strip the dots from the third column (presumably thousands separators).
      sheet.add_row [matched[1], matched[2], matched[3].gsub('.', '')]
    end
  end
end
package.serialize filename_out
puts 'Done!'
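
The column splitting in the script depends entirely on one regular expression, so it can help to try that step on its own without a PDF. The following is a minimal sketch, not part of the original gist: `split_row` and `ROW_PATTERN` are hypothetical names, and the sample line is made up, so real UWV rows may be laid out differently.

```ruby
# Hypothetical helper (not in the original script): isolates the
# column-splitting step so the regex can be tried without a PDF.
ROW_PATTERN = /(.*?)\s{5,}(.*?)\s{5,}(.*)/

def split_row(row)
  matched = ROW_PATTERN.match(row)
  # Rows without two runs of 5+ whitespace characters do not match.
  return nil if matched.nil?

  # Same post-processing as the script: drop the dots from the third column.
  [matched[1], matched[2], matched[3].gsub('.', '')]
end

# Made-up sample line; the actual PDF text may differ.
sample = 'Example B.V.          Amsterdam          12.345'
p split_row(sample)            # => ["Example B.V.", "Amsterdam", "12345"]
p split_row('only one column') # => nil
```

Because the pattern is not anchored, a row that starts with five or more spaces would yield an empty first column. Whether that occurs depends on how `page.text` lays out the PDF in question.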