@tily
Created March 22, 2011 10:19
OCR with Google Docs API
# Uploads an image to Google Docs with OCR enabled and prints the recognized text.
require "open-uri"
require "rubygems"
require "mechanize"

BANNER = "Usage: ruby google_ocr.rb http://ec2.images-amazon.com/images/I/21IVvn7zGAL._SL500_AA300_.jpg"
EMAIL = "email here"
PASSWD = "password here"

# Log in, upload the image for OCR, then print the exported text.
def main(args)
  file = args.shift
  abort BANNER unless file

  @mech = Mechanize.new
  # Uncomment to log HTTP traffic:
  #require 'logger'; @mech.log = Logger.new(STDOUT)
  @mech.request_headers = { "GData-Version" => "3.0" }
  auth(EMAIL, PASSWD)
  doc_id = upload(file)
  puts download(doc_id)
end
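
# Upload the image with OCR enabled (lang=ja) and return the new document's ID.
def upload(file)
  # URI.open (from open-uri) fetches URLs; Kernel#open stopped doing so in Ruby 3.0.
  data = URI.open(file, &:read)
  # Content-Type is hard-coded; adjust it if the image is not a PNG.
  @mech.post(
    "http://docs.google.com/feeds/default/private/full?ocr=true&lang=ja", data,
    "Content-Type" => "image/png", "Content-Length" => data.bytesize.to_s, "Slug" => file
  )
  # The Location header ends in "document%3A<id>"; capture the ID.
  regexp = %r{^http://docs\.google\.com/feeds/default/private/full/document%3A(.+)$}
  @mech.page.response["location"][regexp, 1]
end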

# Export the OCR'd document as plain text and return its content.
def download(doc_id)
  @mech.get(
    "http://docs.google.com/feeds/download/documents/Export",
    "docID" => doc_id, "exportFormat" => "txt", "format" => "txt"
  )
  @mech.page.content
end
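
# Log in via ClientLogin and attach the returned token to every later request.
def auth(email, passwd)
  @mech.post(
    "https://www.google.com/accounts/ClientLogin",
    "Email" => email, "Passwd" => passwd, "service" => "writely"
  )
  # The response body is key=value lines; the Auth line carries the token.
  token = @mech.page.content[/Auth=(.+)/, 1]
  @mech.request_headers.update("Authorization" => "GoogleLogin auth=#{token}")
end
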
main(ARGV)
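
The script's two parsing steps (pulling the token out of the login response, and the document ID out of the upload's Location header) can be tried without a network connection. The sketch below is only an illustration: `extract_token`, `extract_doc_id`, and both sample strings are hypothetical names and values, not part of the original script, and the sample strings merely assume the response shapes that the script's own regular expressions expect.

```ruby
# Hypothetical sample values standing in for real server responses.
SAMPLE_LOGIN_BODY = "SID=aaa\nLSID=bbb\nAuth=ccc\n"
SAMPLE_LOCATION = "http://docs.google.com/feeds/default/private/full/document%3Aabc123"

# Return the value of the Auth line from a key=value response body,
# or nil when there is no such line (for example, on a failed login).
def extract_token(body)
  body[/^Auth=(.+)$/, 1]
end

# Return the document ID that follows "document%3A" in a Location header,
# or nil when the header does not have that shape.
def extract_doc_id(location)
  location[%r{/feeds/default/private/full/document%3A(.+)\z}, 1]
end

puts extract_token(SAMPLE_LOGIN_BODY)
puts extract_doc_id(SAMPLE_LOCATION)
```

Both helpers return nil instead of raising when the input does not match, so a caller could check the result and abort with a clear message before sending further requests.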