Documentation: http://www.foolabs.com/xpdf/download.html
brew install poppler
pdftotext document.pdf text_file.txt
http://www.unixuser.org/~euske/python/pdfminer/ http://euske.github.io/pdfminer/index.html
pip install pdfminer
pdf2txt.py -o text_file.txt document.pdf
http://calibre-ebook.com/ (http://manual.calibre-ebook.com/cli/ebook-convert.html)
OSX download: http://calibre-ebook.com/download_osx
/Applications/calibre.app/Contents/MacOS/ebook-convert document.pdf document_calibre.txt
Download the jar file
java -jar tika-app-1.7.jar --text document.pdf > text_file.txt
brew install ghostscript
gs -dBATCH -dNOPAUSE -sDEVICE=txtwrite -sOutputFile=text_file.txt document.pdf
It is based on "pdftotext" from the Xpdf suite, but with a different layout algorithm that preserves relative column position and line spacing.
brew install ghostscript
brew install tesseract
A script converts the PDF to TIFF with Ghostscript, then the TIFF to text with Tesseract. For more info: http://benschmidt.org/dighist13/?page_id=129
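The two-step conversion described above could look roughly like the shell function below. This is only a sketch, not the script from the linked page: the function name `pdf_ocr_to_text`, the `tiffg4` output device, and the 300 dpi resolution are assumptions chosen for illustration.

```shell
# Sketch: OCR a PDF by rendering it to TIFF with Ghostscript,
# then reading the TIFF with Tesseract.
# Assumptions: function name, tiffg4 device, and 300 dpi are illustrative choices.
pdf_ocr_to_text() {
    ocr_pdf="$1"
    ocr_base="${ocr_pdf%.pdf}"   # document.pdf -> document

    # Step 1: render all pages into one black-and-white multi-page TIFF.
    gs -q -dBATCH -dNOPAUSE -sDEVICE=tiffg4 -r300 \
       -sOutputFile="$ocr_base.tiff" "$ocr_pdf" &&

    # Step 2: run OCR; Tesseract appends .txt to the output base name.
    tesseract "$ocr_base.tiff" "$ocr_base"
}

# Usage (writes document.tiff and document.txt next to the PDF):
#   pdf_ocr_to_text document.pdf
```

A higher resolution usually helps OCR accuracy on small print, at the cost of larger intermediate files; a grayscale device such as `tiffgray` may suit scans with uneven backgrounds better.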
https://pdfbox.apache.org/1.8/commandline.html
Download the jar file
java -jar pdfbox-app-1.8.8.jar ExtractText document.pdf textfile.txt
http://jocr.sourceforge.net/ http://www.gnu.org/software/ocrad/ocrad.html https://wiki.gnome.org/action/show/Apps/OCRFeeder?action=show&redirect=OCRFeeder
I guess the proper name should be pdftotexttools.md (or something similar).