@arton
Created March 19, 2019 09:47
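require 'origami'

# Read the PDF named on the command line, suppressing parser messages.
pdf = Origami::PDF.read(ARGV[0], verbosity: Origami::Parser::VERBOSE_QUIET)

# Give up unless the first page has a font resource named C2_0.
unless pdf.pages.first.Resources.Font.C2_0
  exit 1
end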
# Map each font name to its glyph-code => Unicode table (keys are 4-digit hex strings).
font_table = {}
pdf.pages.first.Resources.each_font do |name, font|
  puts name if $DEBUG
  utbl = font_table[name.to_s] = {}
  font.ToUnicode.data.each_line do |line|
    if line =~ /\A<([^>]+)>\s*<([^>]+)>(?:\s*<([^>]+)>)?/
      if $3
        # Three fields: <first> <last> <base> maps a run of consecutive codes.
        puts line if $DEBUG
        base = $3.hex
        start = $1.hex
        start.upto($2.hex) do |i|
          utbl[sprintf('%04X', i)] = sprintf('%04X', base + (i - start))
          puts "#{sprintf('%04X', i)} => #{sprintf('%04X', base + (i - start))}" if $DEBUG
        end
      else
        # Two fields: <code> <unicode> maps a single code.
        utbl[$1.upcase] = $2
        puts "#{$1} => #{$2}" if $DEBUG
      end
    end
  end
end
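puts font_table.keys.inspect if $DEBUG

# Use the first font's table until the content stream selects another font.
utbl = font_table.values[0]
sentence = ''
line = pdf.pages.first.Contents.data
# Scan for hex strings, each optionally preceded by a /C2_n font switch.
while /(?:\/(C2_\d+)\s[^<]+)?<([0-9A-F]+)>/m.match(line)
  rest = $'
  if $1
    puts $1 if $DEBUG
    utbl = font_table[$1]
  end
  # Split the hex string into 4-digit glyph codes and map each through the table.
  sentence << $2.split('').each_slice(4).map(&:join).map do |w|
    x = utbl[w]
    unless x
      puts "#{w} not found" if $DEBUG
      x = '0000'
    end
    x
  end.map(&:hex).pack('U*')
  line = rest
end
puts sentence.strip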
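
The table-building and decoding steps in the script can be tried without Origami or a PDF file. The following is a minimal sketch, not the gist author's code: `SAMPLE_CMAP` is a hand-written, hypothetical ToUnicode fragment, and `parse_to_unicode` and `decode_hex` are illustrative helper names. It assumes 2-byte glyph codes as the script does, but keys the table by integer rather than by hex string, and substitutes U+FFFD for unmapped codes instead of `0000`.

```ruby
# Hypothetical ToUnicode CMap fragment, written by hand for illustration.
SAMPLE_CMAP = <<~CMAP
  2 beginbfchar
  <0003> <0020>
  <0024> <0041>
  endbfchar
  1 beginbfrange
  <0044> <0046> <0061>
  endbfrange
CMAP

# Build a glyph-code => code-point table (integers) from the CMap text.
def parse_to_unicode(cmap_text)
  table = {}
  cmap_text.each_line do |line|
    next unless line =~ /\A\s*<(\h+)>\s*<(\h+)>(?:\s*<(\h+)>)?/
    if $3
      # Range entry: <first> <last> <base> covers consecutive codes.
      first, last, base = $1.hex, $2.hex, $3.hex
      (first..last).each { |code| table[code] = base + (code - first) }
    else
      # Single entry: <code> <unicode>.
      table[$1.hex] = $2.hex
    end
  end
  table
end

# Decode a hex string operand by reading it as 4-digit glyph codes.
# Unmapped codes become U+FFFD (an assumption of this sketch).
def decode_hex(hex, table)
  hex.scan(/\h{4}/).map { |code| table.fetch(code.hex, 0xFFFD) }.pack('U*')
end

table = parse_to_unicode(SAMPLE_CMAP)
puts decode_hex('0024000300440045', table)
```

With the sample data, codes `0024`, `0003`, `0044`, and `0045` map to `A`, a space, `a`, and `b`, so the script prints `A ab`.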