@jraines
Created June 9, 2011 22:08
Docsplit notes

Dependencies

  • GraphicsMagick
  • poppler-utils
  • OpenOffice -- open question: will this work with OpenOffice installed on a server?
gem 'docsplit'
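
Since the dependencies above are external programs rather than gems, it may help to check for them before attempting an extraction. The following is a minimal sketch, not part of Docsplit: `which`, `missing_tools`, and `REQUIRED_TOOLS` are hypothetical names, and the binary names (`gm`, `pdftotext`, `soffice`) are assumptions about how the packages are installed on a given machine.

```ruby
# Hypothetical helper: return the full path of an executable found on PATH, or nil.
def which(cmd, path = ENV['PATH'])
  path.to_s.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, cmd)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Assumed binary names: gm (GraphicsMagick), pdftotext (poppler-utils),
# soffice (OpenOffice). Adjust if your packages install them differently.
REQUIRED_TOOLS = %w[gm pdftotext soffice].freeze

# Return the subset of tools that could not be found on PATH.
def missing_tools(tools = REQUIRED_TOOLS, path = ENV['PATH'])
  tools.reject { |tool| which(tool, path) }
end

# Example: warn at boot instead of failing mid-extraction.
# missing = missing_tools
# warn "Missing Docsplit dependencies: #{missing.join(', ')}" unless missing.empty?
```

An office install that is not on PATH would be reported as missing by this check, so treat the result as a hint rather than proof.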

Extraction

require 'shellwords'
a = Docsplit.extract_text '/path/to/file'        # creates a PDF at an obscure (temp?) file location
content = `pdftotext #{a.first.shellescape} -`   # shellescape guards against spaces in the path; the dash sends results to stdout
  • Possibly need a script to remove the PDFs generated by this process
  • Seems like a lot of overhead just to grab text from a .ppt, but it seems to work
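
The cleanup script mentioned above could look something like this. It is only a sketch: `remove_generated_pdfs` is a hypothetical helper, and it assumes the paths returned by `Docsplit.extract_text` are the intermediate PDFs, as observed in these notes. It skips the source document itself so that an input which was already a PDF is never deleted.

```ruby
require 'fileutils'

# Hypothetical helper: delete intermediate PDFs, but never the original
# source document. Returns the list of paths that were removed.
def remove_generated_pdfs(pdf_paths, source_path)
  source = File.expand_path(source_path)
  removed = []
  Array(pdf_paths).each do |pdf|
    next if File.expand_path(pdf) == source            # input was already a PDF
    next unless File.extname(pdf).downcase == '.pdf'   # only touch PDFs
    next unless File.file?(pdf)                        # already gone, nothing to do
    FileUtils.rm_f(pdf)
    removed << pdf
  end
  removed
end

# Intended usage, assuming the return value described in these notes:
# source = '/path/to/file'
# pdfs = Docsplit.extract_text(source)
# begin
#   content = `pdftotext #{pdfs.first.shellescape} -`   # needs require 'shellwords'
# ensure
#   remove_generated_pdfs(pdfs, source)
# end
```

Putting the cleanup in an `ensure` block means the temporary PDF is removed even if `pdftotext` fails.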