@hading
Created June 8, 2012 22:12
ETD department extraction
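#!/usr/bin/env ruby
# For every PDF in the current directory, print the line that precedes each
# "University of Illinois" line. That preceding line is assumed to hold the
# department name.
require 'pdf-reader'

class Extracter
  def main
    Dir['*.pdf'].each do |pdf|
      puts "Analyzing #{pdf}"
      reader = PDF::Reader.new(pdf)
      # Flatten all pages into one list of whitespace-stripped lines.
      lines = reader.pages.flat_map { |page| page.text.lines.map(&:strip) }
      lines.each_with_index do |line, i|
        # Skip i == 0: lines[-1] would wrap around to the last line of the PDF.
        puts lines[i - 1] if i > 0 && line.match(/^University of Illinois/i)
      end
      puts "\n"
    end
  end
end

Extracter.new.main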
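
The matching step can be tried without any PDF on hand. The following is a minimal sketch, not part of the original gist: `lines_before_matches` is a hypothetical helper name, and the sample lines are invented for illustration. It pairs each line with its predecessor, so the first line of the input is never reported as having a preceding line.

```ruby
# Hypothetical helper (not in the original gist): returns the line that
# immediately precedes each line matching +pattern+.
def lines_before_matches(lines, pattern = /^University of Illinois/i)
  # each_cons(2) yields [previous, current] pairs, so the first line is
  # never treated as a "current" line with a bogus predecessor.
  lines.each_cons(2)
       .select { |_previous, current| current.match?(pattern) }
       .map(&:first)
end

# Invented sample input standing in for the stripped text of a title page.
sample = [
  "Department of Computer Science",
  "University of Illinois at Urbana-Champaign",
  "Urbana, Illinois"
]
puts lines_before_matches(sample)
```

Keeping the matching logic in a function that takes a plain array of strings means it can be checked separately from the PDF text extraction.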