@tleyden
Created November 13, 2014 17:44
openocr + hocr
$ curl -v -X POST -H "Content-Type: application/json" -d '{"img_url":"http://bit.ly/ocrimage","engine":"tesseract", "engine_args":{"config_vars":{"tessedit_create_hocr": "1", "tessedit_pageseg_mode": "1"}}}' http://${RABBITMQ_HOST}:${HTTP_PORT}/ocr
* About to connect() to 162.222.178.49 port 8080 (#0)
* Trying 162.222.178.49...
* Adding handle: conn: 0x7faf9b00aa00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7faf9b00aa00) send_pipe: 1, recv_pipe: 0
* Connected to 162.222.178.49 (162.222.178.49) port 8080 (#0)
> POST /ocr HTTP/1.1
> User-Agent: curl/7.30.0
> Host: 162.222.178.49:8080
> Accept: */*
> Content-Type: application/json
> Content-Length: 148
>
* upload completely sent off: 148 out of 148 bytes
< HTTP/1.1 200 OK
< Date: Thu, 13 Nov 2014 17:42:00 GMT
< Content-Type: text/xml; charset=utf-8
< Transfer-Encoding: chunked
<
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>
</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name='ocr-system' content='tesseract 3.03' />
<meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par ocr_line ocrx_word'/>
</head>
<body>
<div class='ocr_page' id='page_1' title='image "/tmp/c7d1ba68-4aaa-44d8-7b31-46ca1b1e2be8"; bbox 0 0 1538 270; ppageno 0'>
<div class='ocr_carea' id='block_1_1' title="bbox 40 0 1488 270">
<p class='ocr_par' dir='ltr' id='par_1_1' title="bbox 44 18 1482 238">
<span class='ocr_line' id='line_1_1' title="bbox 44 18 1482 62; baseline 0 -10"><span class='ocrx_word' id='word_1_1' title='bbox 44 20 120 52; x_wconf 88' lang='eng' dir='ltr'>You</span> <span class='ocrx_word' id='word_1_2' title='bbox 142 30 208 52; x_wconf 95' lang='eng' dir='ltr'>can</span> <span class='ocrx_word' id='word_1_3' title='bbox 228 24 342 52; x_wconf 82' lang='eng' dir='ltr'>create</span> <span class='ocrx_word' id='word_1_4' title='bbox 364 18 454 52; x_wconf 84' lang='eng' dir='ltr'><strong>local</strong></span> <span class='ocrx_word' id='word_1_5' title='bbox 474 18 646 52; x_wconf 82' lang='eng' dir='ltr'>variables</span> <span class='ocrx_word' id='word_1_6' title='bbox 666 18 722 52; x_wconf 90' lang='eng' dir='ltr'>for</span> <span class='ocrx_word' id='word_1_7' title='bbox 742 18 802 52; x_wconf 98' lang='eng' dir='ltr'>the</span> <span class='ocrx_word' id='word_1_8' title='bbox 822 18 996 62; x_wconf 88' lang='eng' dir='ltr'>pipelines</span> <span class='ocrx_word' id='word_1_9' title='bbox 1016 18 1140 52; x_wconf 86' lang='eng' dir='ltr'>within</span> <span class='ocrx_word' id='word_1_10' title='bbox 1162 18 1222 52; x_wconf 98' lang='eng' dir='ltr'>the</span> <span class='ocrx_word' id='word_1_11' title='bbox 1242 18 1412 62; x_wconf 84' lang='eng' dir='ltr'>template</span> <span class='ocrx_word' id='word_1_12' title='bbox 1434 18 1482 62; x_wconf 82' lang='eng' dir='ltr'>by</span>
</span>
<span class='ocr_line' id='line_1_2' title="bbox 46 80 1478 124; baseline 0 -10"><span class='ocrx_word' id='word_1_13' title='bbox 46 80 220 124; x_wconf 81' lang='eng' dir='ltr'>prefixing</span> <span class='ocrx_word' id='word_1_14' title='bbox 238 80 298 114; x_wconf 98' lang='eng' dir='ltr'>the</span> <span class='ocrx_word' id='word_1_15' title='bbox 318 80 472 114; x_wconf 82' lang='eng' dir='ltr'>variable</span> <span class='ocrx_word' id='word_1_16' title='bbox 490 92 596 114; x_wconf 89' lang='eng' dir='ltr'>name</span> <span class='ocrx_word' id='word_1_17' title='bbox 616 80 702 114; x_wconf 86' lang='eng' dir='ltr'>with</span> <span class='ocrx_word' id='word_1_18' title='bbox 720 92 740 114; x_wconf 93' lang='eng' dir='ltr'>a</span> <span class='ocrx_word' id='word_1_19' title='bbox 760 80 818 120; x_wconf 87' lang='eng'>“$&quot;</span> <span class='ocrx_word' id='word_1_20' title='bbox 838 80 926 124; x_wconf 86' lang='eng' dir='ltr'>sign.</span> <span class='ocrx_word' id='word_1_21' title='bbox 944 80 1108 114; x_wconf 83' lang='eng' dir='ltr'>Variable</span> <span class='ocrx_word' id='word_1_22' title='bbox 1126 92 1252 114; x_wconf 88' lang='eng' dir='ltr'>names</span> <span class='ocrx_word' id='word_1_23' title='bbox 1272 80 1362 114; x_wconf 89' lang='eng' dir='ltr'>have</span> <span class='ocrx_word' id='word_1_24' title='bbox 1378 86 1416 114; x_wconf 90' lang='eng' dir='ltr'>to</span> <span class='ocrx_word' id='word_1_25' title='bbox 1434 80 1478 114; x_wconf 96' lang='eng' dir='ltr'>be</span>
</span>
<span class='ocr_line' id='line_1_3' title="bbox 48 142 1478 186; baseline 0 -10"><span class='ocrx_word' id='word_1_26' title='bbox 48 142 242 186; x_wconf 87' lang='eng' dir='ltr'>composed</span> <span class='ocrx_word' id='word_1_27' title='bbox 252 142 292 176; x_wconf 84' lang='eng' dir='ltr'>of</span> <span class='ocrx_word' id='word_1_28' title='bbox 302 142 570 186; x_wconf 87' lang='eng' dir='ltr'>alphanumeric</span> <span class='ocrx_word' id='word_1_29' title='bbox 582 142 780 176; x_wconf 85' lang='eng' dir='ltr'>characters</span> <span class='ocrx_word' id='word_1_30' title='bbox 794 142 866 176; x_wconf 90' lang='eng' dir='ltr'>and</span> <span class='ocrx_word' id='word_1_31' title='bbox 876 142 936 176; x_wconf 98' lang='eng' dir='ltr'>the</span> <span class='ocrx_word' id='word_1_32' title='bbox 948 142 1174 176; x_wconf 84' lang='eng' dir='ltr'>underscore.</span> <span class='ocrx_word' id='word_1_33' title='bbox 1190 144 1230 176; x_wconf 98' lang='eng' dir='ltr'>In</span> <span class='ocrx_word' id='word_1_34' title='bbox 1242 142 1302 176; x_wconf 98' lang='eng' dir='ltr'>the</span> <span class='ocrx_word' id='word_1_35' title='bbox 1318 142 1478 186; x_wconf 83' lang='eng' dir='ltr'>example</span>
</span>
<span class='ocr_line' id='line_1_4' title="bbox 46 204 1276 238; baseline 0 0"><span class='ocrx_word' id='word_1_36' title='bbox 46 204 162 238; x_wconf 86' lang='eng' dir='ltr'>below</span> <span class='ocrx_word' id='word_1_37' title='bbox 172 206 186 238; x_wconf 96' lang='eng' dir='ltr'><strong>I</strong></span> <span class='ocrx_word' id='word_1_38' title='bbox 198 204 288 238; x_wconf 89' lang='eng' dir='ltr'>have</span> <span class='ocrx_word' id='word_1_39' title='bbox 298 204 388 238; x_wconf 88' lang='eng' dir='ltr'>used</span> <span class='ocrx_word' id='word_1_40' title='bbox 400 216 420 238; x_wconf 93' lang='eng' dir='ltr'>a</span> <span class='ocrx_word' id='word_1_41' title='bbox 430 204 498 238; x_wconf 85' lang='eng' dir='ltr'>few</span> <span class='ocrx_word' id='word_1_42' title='bbox 508 204 700 238; x_wconf 83' lang='eng' dir='ltr'>variations</span> <span class='ocrx_word' id='word_1_43' title='bbox 712 204 788 238; x_wconf 90' lang='eng' dir='ltr'>that</span> <span class='ocrx_word' id='word_1_44' title='bbox 800 204 898 238; x_wconf 86' lang='eng' dir='ltr'>work</span> <span class='ocrx_word' id='word_1_45' title='bbox 908 204 964 238; x_wconf 90' lang='eng' dir='ltr'>for</span> <span class='ocrx_word' id='word_1_46' title='bbox 974 204 1128 238; x_wconf 90' lang='eng' dir='ltr'>variable</span> <span class='ocrx_word' id='word_1_47' title='bbox 1140 216 1276 238; x_wconf 95' lang='eng' dir='ltr'>names.</span>
</span>
</p>
</div>
</div>
</body>
</html>
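
The hOCR response above carries each recognized word in an `ocrx_word` span, with its bounding box and confidence packed into the `title` attribute (`bbox x0 y0 x1 y1; x_wconf N`). A minimal sketch of how a client might pull those out, assuming the response body is well-formed XHTML as shown. The names `Word`, `parse_title`, `extract_words`, and `SAMPLE` are made up for illustration and are not part of OpenOCR or Tesseract; `SAMPLE` is a two-word excerpt of the output above.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    bbox: tuple      # (x0, y0, x1, y1) in image pixels
    confidence: int  # x_wconf value; -1 if the attribute is missing


def parse_title(title):
    """Split an hOCR title like 'bbox 44 20 120 52; x_wconf 88' into a dict
    mapping each property name to its list of string values."""
    props = {}
    for part in title.split(";"):
        fields = part.split()
        if fields:
            props[fields[0]] = fields[1:]
    return props


def extract_words(hocr):
    """Return all ocrx_word elements of an hOCR document, in document order."""
    root = ET.fromstring(hocr)
    words = []
    # Match on the class attribute rather than the tag name, so the XHTML
    # namespace prefix on every tag does not matter.
    for el in root.iter():
        if el.get("class") != "ocrx_word":
            continue
        props = parse_title(el.get("title", ""))
        bbox = tuple(int(v) for v in props.get("bbox", []))
        conf = int(props["x_wconf"][0]) if "x_wconf" in props else -1
        # itertext() also collects text nested in <strong>, as in word_1_4.
        text = "".join(el.itertext()).strip()
        words.append(Word(text, bbox, conf))
    return words


# Two-word excerpt of the response above, used only as a demo input.
SAMPLE = (
    "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
    "<span class='ocr_line' id='line_1_1' title='bbox 44 18 1482 62'>"
    "<span class='ocrx_word' id='word_1_3' title='bbox 228 24 342 52; x_wconf 82'>create</span> "
    "<span class='ocrx_word' id='word_1_4' title='bbox 364 18 454 52; x_wconf 84'>"
    "<strong>local</strong></span>"
    "</span></body></html>"
)

if __name__ == "__main__":
    for word in extract_words(SAMPLE):
        print(word.confidence, word.bbox, word.text)
```

To use it on a real response, save the body (without the curl `-v` header lines) to a file and pass its contents to `extract_words`. Filtering on `confidence` is one way to drop low-quality words; any threshold would be a choice made by the caller.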