@barbolo
Last active January 16, 2023 04:05
Convert PDF to XML in Ruby using poppler-utils
# Install dependencies:
#
# posix-spawn (check all benefits at https://github.com/rtomayko/posix-spawn)
# gem install posix-spawn
#
# Poppler utils (http://poppler.freedesktop.org/)
# apt-get install poppler-utils
#
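require 'posix/spawn'
require 'shellwords'

# Runs pdftohtml on the given PDF and returns its output (XML, with stderr merged in) as a String.
def pdf_to_xml(pdf_path)
  # Escape the path so spaces, quotes, or shell metacharacters cannot break the command.
  cmd = "pdftohtml -stdout -xml -i -fontfullname #{Shellwords.escape(pdf_path)} 2>&1"
  POSIX::Spawn::Child.new(cmd).out
end

pdf_to_xml('sample.pdf')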
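
The string returned by pdf_to_xml still has to be parsed before it is useful. Below is a minimal sketch of one way to do that with REXML. It is not part of the original gist: the helper names (plain_text, extract_text_runs) are hypothetical, and SAMPLE_XML is a hand-written stand-in that assumes the usual pdf2xml/page/text layout of pdftohtml's XML output, not real output from a PDF.

```ruby
require 'rexml/document'

# Hand-written sample, assumed to resemble pdftohtml -xml output.
SAMPLE_XML = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <pdf2xml>
    <page number="1" height="1188" width="918">
      <text top="100" left="50" width="200" height="16" font="0">Hello</text>
      <text top="120" left="50" width="200" height="16" font="0"><b>World</b></text>
    </page>
  </pdf2xml>
XML

# Hypothetical helper: concatenates all text nodes under a node,
# flattening inline markup such as <b> or <i>.
def plain_text(node)
  node.children.map do |child|
    case child
    when REXML::Text    then child.value
    when REXML::Element then plain_text(child)
    else ''
    end
  end.join
end

# Hypothetical helper: returns one hash per <text> element with its
# page number, position, and plain-text content.
def extract_text_runs(xml)
  doc = REXML::Document.new(xml)
  runs = []
  doc.elements.each('pdf2xml/page') do |page|
    page_number = page.attributes['number'].to_i
    page.elements.each('text') do |text|
      runs << {
        page: page_number,
        top:  text.attributes['top'].to_i,
        left: text.attributes['left'].to_i,
        text: plain_text(text)
      }
    end
  end
  runs
end

extract_text_runs(SAMPLE_XML).each { |run| puts run.inspect }
```

In practice the argument would be pdf_to_xml('sample.pdf') instead of SAMPLE_XML. Because pdf_to_xml merges stderr into its output, any warnings printed by pdftohtml may need to be stripped before parsing.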