Created
November 13, 2014 17:44
openocr + hocr
$ curl -v -X POST -H "Content-Type: application/json" -d '{"img_url":"http://bit.ly/ocrimage","engine":"tesseract", "engine_args":{"config_vars":{"tessedit_create_hocr": "1", "tessedit_pageseg_mode": "1"}}}' http://${RABBITMQ_HOST}:${HTTP_PORT}/ocr
* About to connect() to 162.222.178.49 port 8080 (#0)
* Trying 162.222.178.49...
* Adding handle: conn: 0x7faf9b00aa00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7faf9b00aa00) send_pipe: 1, recv_pipe: 0
* Connected to 162.222.178.49 (162.222.178.49) port 8080 (#0)
> POST /ocr HTTP/1.1
> User-Agent: curl/7.30.0
> Host: 162.222.178.49:8080
> Accept: */*
> Content-Type: application/json
> Content-Length: 148
>
* upload completely sent off: 148 out of 148 bytes
< HTTP/1.1 200 OK
< Date: Thu, 13 Nov 2014 17:42:00 GMT
< Content-Type: text/xml; charset=utf-8
< Transfer-Encoding: chunked
<
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>
</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name='ocr-system' content='tesseract 3.03' />
<meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par ocr_line ocrx_word'/>
</head>
<body>
<div class='ocr_page' id='page_1' title='image "/tmp/c7d1ba68-4aaa-44d8-7b31-46ca1b1e2be8"; bbox 0 0 1538 270; ppageno 0'>
<div class='ocr_carea' id='block_1_1' title="bbox 40 0 1488 270">
<p class='ocr_par' dir='ltr' id='par_1_1' title="bbox 44 18 1482 238">
<span class='ocr_line' id='line_1_1' title="bbox 44 18 1482 62; baseline 0 -10"><span class='ocrx_word' id='word_1_1' title='bbox 44 20 120 52; x_wconf 88' lang='eng' dir='ltr'>You</span> <span class='ocrx_word' id='word_1_2' title='bbox 142 30 208 52; x_wconf 95' lang='eng' dir='ltr'>can</span> <span class='ocrx_word' id='word_1_3' title='bbox 228 24 342 52; x_wconf 82' lang='eng' dir='ltr'>create</span> <span class='ocrx_word' id='word_1_4' title='bbox 364 18 454 52; x_wconf 84' lang='eng' dir='ltr'><strong>local</strong></span> <span class='ocrx_word' id='word_1_5' title='bbox 474 18 646 52; x_wconf 82' lang='eng' dir='ltr'>variables</span> <span class='ocrx_word' id='word_1_6' title='bbox 666 18 722 52; x_wconf 90' lang='eng' dir='ltr'>for</span> <span class='ocrx_word' id='word_1_7' title='bbox 742 18 802 52; x_wconf 98' lang='eng' dir='ltr'>the</span> <span class='ocrx_word' id='word_1_8' title='bbox 822 18 996 62; x_wconf 88' lang='eng' dir='ltr'>pipelines</span> <span class='ocrx_word' id='word_1_9' title='bbox 1016 18 1140 52; x_wconf 86' lang='eng' dir='ltr'>within</span> <span class='ocrx_word' id='word_1_10' title='bbox 1162 18 1222 52; x_wconf 98' lang='eng' dir='ltr'>the</span> <span class='ocrx_word' id='word_1_11' title='bbox 1242 18 1412 62; x_wconf 84' lang='eng' dir='ltr'>template</span> <span class='ocrx_word' id='word_1_12' title='bbox 1434 18 1482 62; x_wconf 82' lang='eng' dir='ltr'>by</span>
</span>
<span class='ocr_line' id='line_1_2' title="bbox 46 80 1478 124; baseline 0 -10"><span class='ocrx_word' id='word_1_13' title='bbox 46 80 220 124; x_wconf 81' lang='eng' dir='ltr'>prefixing</span> <span class='ocrx_word' id='word_1_14' title='bbox 238 80 298 114; x_wconf 98' lang='eng' dir='ltr'>the</span> <span class='ocrx_word' id='word_1_15' title='bbox 318 80 472 114; x_wconf 82' lang='eng' dir='ltr'>variable</span> <span class='ocrx_word' id='word_1_16' title='bbox 490 92 596 114; x_wconf 89' lang='eng' dir='ltr'>name</span> <span class='ocrx_word' id='word_1_17' title='bbox 616 80 702 114; x_wconf 86' lang='eng' dir='ltr'>with</span> <span class='ocrx_word' id='word_1_18' title='bbox 720 92 740 114; x_wconf 93' lang='eng' dir='ltr'>a</span> <span class='ocrx_word' id='word_1_19' title='bbox 760 80 818 120; x_wconf 87' lang='eng'>“$"</span> <span class='ocrx_word' id='word_1_20' title='bbox 838 80 926 124; x_wconf 86' lang='eng' dir='ltr'>sign.</span> <span class='ocrx_word' id='word_1_21' title='bbox 944 80 1108 114; x_wconf 83' lang='eng' dir='ltr'>Variable</span> <span class='ocrx_word' id='word_1_22' title='bbox 1126 92 1252 114; x_wconf 88' lang='eng' dir='ltr'>names</span> <span class='ocrx_word' id='word_1_23' title='bbox 1272 80 1362 114; x_wconf 89' lang='eng' dir='ltr'>have</span> <span class='ocrx_word' id='word_1_24' title='bbox 1378 86 1416 114; x_wconf 90' lang='eng' dir='ltr'>to</span> <span class='ocrx_word' id='word_1_25' title='bbox 1434 80 1478 114; x_wconf 96' lang='eng' dir='ltr'>be</span>
</span>
<span class='ocr_line' id='line_1_3' title="bbox 48 142 1478 186; baseline 0 -10"><span class='ocrx_word' id='word_1_26' title='bbox 48 142 242 186; x_wconf 87' lang='eng' dir='ltr'>composed</span> <span class='ocrx_word' id='word_1_27' title='bbox 252 142 292 176; x_wconf 84' lang='eng' dir='ltr'>of</span> <span class='ocrx_word' id='word_1_28' title='bbox 302 142 570 186; x_wconf 87' lang='eng' dir='ltr'>alphanumeric</span> <span class='ocrx_word' id='word_1_29' title='bbox 582 142 780 176; x_wconf 85' lang='eng' dir='ltr'>characters</span> <span class='ocrx_word' id='word_1_30' title='bbox 794 142 866 176; x_wconf 90' lang='eng' dir='ltr'>and</span> <span class='ocrx_word' id='word_1_31' title='bbox 876 142 936 176; x_wconf 98' lang='eng' dir='ltr'>the</span> <span class='ocrx_word' id='word_1_32' title='bbox 948 142 1174 176; x_wconf 84' lang='eng' dir='ltr'>underscore.</span> <span class='ocrx_word' id='word_1_33' title='bbox 1190 144 1230 176; x_wconf 98' lang='eng' dir='ltr'>In</span> <span class='ocrx_word' id='word_1_34' title='bbox 1242 142 1302 176; x_wconf 98' lang='eng' dir='ltr'>the</span> <span class='ocrx_word' id='word_1_35' title='bbox 1318 142 1478 186; x_wconf 83' lang='eng' dir='ltr'>example</span>
</span>
<span class='ocr_line' id='line_1_4' title="bbox 46 204 1276 238; baseline 0 0"><span class='ocrx_word' id='word_1_36' title='bbox 46 204 162 238; x_wconf 86' lang='eng' dir='ltr'>below</span> <span class='ocrx_word' id='word_1_37' title='bbox 172 206 186 238; x_wconf 96' lang='eng' dir='ltr'><strong>I</strong></span> <span class='ocrx_word' id='word_1_38' title='bbox 198 204 288 238; x_wconf 89' lang='eng' dir='ltr'>have</span> <span class='ocrx_word' id='word_1_39' title='bbox 298 204 388 238; x_wconf 88' lang='eng' dir='ltr'>used</span> <span class='ocrx_word' id='word_1_40' title='bbox 400 216 420 238; x_wconf 93' lang='eng' dir='ltr'>a</span> <span class='ocrx_word' id='word_1_41' title='bbox 430 204 498 238; x_wconf 85' lang='eng' dir='ltr'>few</span> <span class='ocrx_word' id='word_1_42' title='bbox 508 204 700 238; x_wconf 83' lang='eng' dir='ltr'>variations</span> <span class='ocrx_word' id='word_1_43' title='bbox 712 204 788 238; x_wconf 90' lang='eng' dir='ltr'>that</span> <span class='ocrx_word' id='word_1_44' title='bbox 800 204 898 238; x_wconf 86' lang='eng' dir='ltr'>work</span> <span class='ocrx_word' id='word_1_45' title='bbox 908 204 964 238; x_wconf 90' lang='eng' dir='ltr'>for</span> <span class='ocrx_word' id='word_1_46' title='bbox 974 204 1128 238; x_wconf 90' lang='eng' dir='ltr'>variable</span> <span class='ocrx_word' id='word_1_47' title='bbox 1140 216 1276 238; x_wconf 95' lang='eng' dir='ltr'>names.</span>
</span>
</p>
</div>
</div>
</body>
</html>
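
The hOCR response above puts each recognized word in an `ocrx_word` span whose `title` attribute holds the bounding box (`bbox x0 y0 x1 y1`) and a confidence value (`x_wconf`). Here is a minimal sketch of how a client might pull those out using only the Python standard library. The names `HocrWordParser` and `parse_hocr_words` are hypothetical and not part of OpenOCR or Tesseract, and the sample string is a shortened copy of the first line of the output above.

```python
from html.parser import HTMLParser
import re


class HocrWordParser(HTMLParser):
    """Collect text, bbox and confidence for every ocrx_word span (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word being assembled, or None when outside a word span
        self._depth = 0       # nesting depth of <span> tags inside the current word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "ocrx_word":
            title = attrs.get("title", "")
            bbox = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", title)
            conf = re.search(r"x_wconf (\d+)", title)
            self._current = {
                "text": "",
                "bbox": tuple(int(v) for v in bbox.groups()) if bbox else None,
                "conf": int(conf.group(1)) if conf else None,
            }
            self._depth = 1
        elif tag == "span" and self._current is not None:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self._depth -= 1
            if self._depth == 0:
                self.words.append(self._current)
                self._current = None

    def handle_data(self, data):
        # Text inside nested tags such as <strong> is still part of the word.
        if self._current is not None:
            self._current["text"] += data


def parse_hocr_words(hocr):
    """Return a list of dicts with keys text, bbox and conf."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


# Shortened sample taken from the first ocr_line in the response above.
sample = (
    "<span class='ocr_line' id='line_1_1' title=\"bbox 44 18 1482 62; baseline 0 -10\">"
    "<span class='ocrx_word' id='word_1_1' title='bbox 44 20 120 52; x_wconf 88' "
    "lang='eng' dir='ltr'>You</span> "
    "<span class='ocrx_word' id='word_1_4' title='bbox 364 18 454 52; x_wconf 84' "
    "lang='eng' dir='ltr'><strong>local</strong></span>"
    "</span>"
)

for word in parse_hocr_words(sample):
    print(word["text"], word["bbox"], word["conf"])
```

Words with a low `conf` value could then be filtered out or flagged for review. Any threshold would be a choice made by the caller; nothing in the response defines one.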