@kbaribeau
Created January 19, 2014 18:20
Convert PDF files to text for further parsing.

;; src/pdf_text_extractor/core.clj
(ns pdf-text-extractor.core
  (:import [com.snowtide.pdf OutputTarget PDFTextStream]))
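
(defn pdf-to-text
  "Extracts all text from the PDF at `file` and returns it as a string."
  [file]
  (let [pdfts  (PDFTextStream. file)
        output (StringBuilder. 1024)]
    (try
      (.pipe pdfts (OutputTarget. output))
      (finally
        ;; Close the stream even if extraction throws.
        (.close pdfts)))
    (str output)))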

(defn -main
  "Prints the text of each PDF named in `args`, each followed by a separator line."
  [& args]
  (println (apply str
                  (interleave (map pdf-to-text args)
                              (repeat "\n------------------------------------\n")))))

;; project.clj
(defproject pdf-text-extractor "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [com.snowtide/pdftextstream "2.6.4"]]
  :repositories [["snowtide-releases" "http://maven.snowtide.com/releases"]]
  :main pdf-text-extractor.core)