@vodafon
Created April 1, 2015 11:05
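# gem install pdf-reader
require 'pdf-reader'

# Extracts the first column of text lines 21-25 from every fourth page of a PDF.
class ParsePdf
  def initialize(filename)
    @reader = PDF::Reader.new(filename)
  end

  # Returns one array of first-column values per processed page.
  def process
    result = []
    @reader.pages.each_with_index do |page, index|
      # Only every fourth page (4, 8, 12, ...) is parsed.
      result << parse_lines(page.text.split("\n")) if (index + 1) % 4 == 0
    end
    result
  end

  def parse_lines(lines)
    # Take text lines 21-25; the slice is nil when the page is too short, so fall back to [].
    rows = (lines[20..24] || []).map { |line| line.strip.split(/\s{5,}/) }
    # Keep the first cell of each row that split into more than one column.
    rows.map { |row| row.first if row.length > 1 }.compact
  end
end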
# Example:
# parse = ParsePdf.new("feeds.pdf")
# p parse.process
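
The column handling in parse_lines can be tried without a PDF. The sketch below is a minimal illustration, not part of the original gist: `GAP`, `SAMPLE_LINES`, and `first_columns` are hypothetical names, and the sample text is made up. It assumes, as parse_lines does, that columns are separated by runs of five or more whitespace characters.

```ruby
# Hypothetical column separator: six spaces, enough to match /\s{5,}/.
GAP = ' ' * 6

# Made-up lines standing in for page.text.split("\n").
SAMPLE_LINES = [
  "  Widget A#{GAP}12.50",
  'Section heading',
  "Widget B#{GAP}7.00#{GAP}in stock",
  ''
].freeze

# Same idea as parse_lines, minus the fixed 20..24 row window:
# split each line into columns and keep the first cell of multi-column rows.
def first_columns(lines)
  lines
    .map { |line| line.strip.split(/\s{5,}/) }
    .select { |cells| cells.length > 1 }
    .map(&:first)
end

p first_columns(SAMPLE_LINES)
```

Lines with a single column (headings, blank lines) are dropped, which is why parse_lines can return fewer than five values per page.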