@bergpb
Forked from emad-elsaid/pdf2txt.rb
Created November 17, 2017 17:57
PDF to text converter using Ruby
#!/usr/bin/env ruby
require 'pdf/reader' # gem install pdf-reader
# Credits:
# https://github.com/yob/pdf-reader/blob/master/examples/text.rb
# Usage:
# ruby pdf2txt.rb /path-to-file/file1.pdf [/path-to-file/file2.pdf ...]
ARGV.each do |filename|
  PDF::Reader.open(filename) do |reader|
    puts "Converting : #{filename}"
    pageno = 0
    txt = reader.pages.map do |page|
      pageno += 1
      begin
        print "Converting Page #{pageno}/#{reader.page_count}\r"
        page.text
      rescue
        puts "Page #{pageno}/#{reader.page_count} Failed to convert"
        ''
      end
    end # pages map
    puts "\nWriting text to disk"
    File.write filename+'.txt', txt.join("\n")
  end # reader
end # each
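
One possible refactoring, offered only as a sketch: the per-page loop in the script could be pulled into a helper that accepts any object exposing `pages` whose elements respond to `text`. That would make the "skip pages that fail" behaviour checkable without a real PDF. The names `extract_text`, `FakeReader`, `GoodPage`, and `BadPage` are hypothetical and are not part of pdf-reader.

```ruby
# Hypothetical helper (not part of pdf-reader): joins the text of every page
# and substitutes an empty string for any page whose extraction raises.
# Returns the joined text plus the 1-based numbers of the pages that failed.
def extract_text(reader, separator: "\n")
  failed = []
  texts = reader.pages.each_with_index.map do |page, index|
    begin
      page.text
    rescue StandardError
      failed << index + 1 # remember which page could not be converted
      ''
    end
  end
  [texts.join(separator), failed]
end

# Stand-ins for PDF::Reader and its pages, used only to exercise the helper.
FakeReader = Struct.new(:pages)
GoodPage = Struct.new(:text)

class BadPage
  def text
    raise ArgumentError, 'unreadable page'
  end
end

reader = FakeReader.new([GoodPage.new('first'), BadPage.new, GoodPage.new('third')])
text, failed = extract_text(reader)
```

With a real document, the same helper would presumably be called as `extract_text(reader)` inside the `PDF::Reader.open(filename) do |reader| ... end` block, and the returned list of failed pages could be reported once at the end instead of being printed mid-progress.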