Created July 8, 2016 18:17
require "pdf/inspector" # gem install pdf-inspector

# Read the PDF as raw bytes; PDF::Inspector parses the binary data directly.
pdf_data = File.binread("2016WayneCountyTaxLiens.pdf")
text_analysis = PDF::Inspector::Text.analyze(pdf_data)

# Concatenate every extracted string into a single plain-text dump.
File.write("dump.txt", text_analysis.strings.join)

# For more, see https://github.com/prawnpdf/pdf-inspector
# and also https://github.com/yob/pdf-reader
#
# PDF::Reader can be used to build a streaming parser that tracks state as it
# walks the document, which may produce a better dump
# (e.g. you could look for the natural breaks in the document by analyzing what's being drawn).
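
The streaming idea in the closing comment could be sketched roughly as follows. This is a minimal illustration, not a tested recipe for this particular PDF: the class name `BreakTrackingReceiver` and the helper `dump_with_breaks` are hypothetical, and the choice to treat text-object boundaries and line moves as "natural breaks" is an assumption. It relies on `PDF::Reader::Page#walk`, which calls methods named after the content-stream operators on any receiver that responds to them. Strings arrive as drawn, without font-aware decoding, so documents with custom encodings may need more work (the gem's own `PDF::Reader::PageTextReceiver` handles that).

```ruby
# Hypothetical receiver: collects text as it is drawn and starts a new chunk
# whenever the content stream opens or closes a text object or moves to a new line.
class BreakTrackingReceiver
  attr_reader :chunks

  def initialize
    @chunks = []
    @current = +""
  end

  # BT / ET operators: boundaries of a text object.
  def begin_text_object
    flush
  end

  def end_text_object
    flush
  end

  # Td, TD and T* operators: the text position moves, assumed here to mean a break.
  def move_text_position(_x, _y)
    flush
  end

  def move_text_position_and_set_leading(_x, _y)
    flush
  end

  def move_to_start_of_next_line
    flush
  end

  # Tj operator: a single string is drawn.
  def show_text(string)
    @current << string
  end

  # TJ operator: an array mixing strings with numeric kerning adjustments.
  def show_text_with_positioning(params)
    params.each { |part| @current << part if part.is_a?(String) }
  end

  private

  # Close the chunk being built, skipping empty ones.
  def flush
    @chunks << @current unless @current.empty?
    @current = +""
  end
end

# Hypothetical helper: walks every page and writes one chunk per line,
# with a blank line between pages.
def dump_with_breaks(pdf_path, out_path)
  require "pdf/reader" # gem install pdf-reader

  reader = PDF::Reader.new(pdf_path)
  File.open(out_path, "w") do |out|
    reader.pages.each do |page|
      receiver = BreakTrackingReceiver.new
      page.walk(receiver)
      out.puts receiver.chunks
      out.puts
    end
  end
end
```

Calling `dump_with_breaks("2016WayneCountyTaxLiens.pdf", "dump.txt")` would then write the dump with one chunk per line instead of a single joined string. Whether those chunks line up with the rows of the source document depends on how the PDF was generated.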