Tesseract OCR on AWS Lambda with Python

References:

https://github.com/tesseract-ocr/tesseract/wiki/Compiling

http://stackoverflow.com/questions/33588262/tesseract-ocr-on-aws-lambda-via-virtualenv

https://github.com/sirfz/tesserocr

Instructions for running Tesseract OCR on AWS Lambda with Python

  1. Launch an Amazon Linux AMI instance

  2. Connect to the instance and generate an AWS Lambda Package

# system libs
sudo yum -y update
sudo yum -y upgrade
sudo yum -y groupinstall "Development Tools"

# tesseract / leptonica / pillow dependencies
sudo yum -y install gcc gcc-c++ make autoconf automake libtool \
                    libjpeg-devel libpng-devel libtiff-devel zlib-devel \
                    libzip-devel freetype-devel lcms2-devel libwebp-devel \
                    tcl-devel tk-devel

# install leptonica
cd ~
mkdir leptonica
cd leptonica
# commenters report that 1.74.1 is no longer hosted and that 1.74.4 worked instead
wget http://www.leptonica.org/source/leptonica-1.74.1.tar.gz
tar -zxvf leptonica-*.tar.gz
cd leptonica-*
./configure
make
sudo make install

# install tesseract
cd ~
git clone --branch 4.00.00alpha https://github.com/tesseract-ocr/tesseract.git
cd tesseract
./autogen.sh
./configure --enable-debug
LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make
sudo make install

# create a python virtual env
virtualenv ~/tfenv
source ~/tfenv/bin/activate

# Install pillow
pip install pillow

# Install cython
pip install cython

# Install tesserocr
pip install tesserocr

# prepare the zip package
cd ~
mkdir lambda-tesseract
cd lambda-tesseract
cp /usr/local/bin/tesseract .
mkdir lib
cp /usr/local/lib/libtesseract.so.4 lib/
cp /usr/local/lib/liblept.so.5 lib/
cp /lib64/librt.so.1 lib/
cp /lib64/libz.so.1 lib/
cp /usr/lib64/libpng12.so.0 lib/
cp /usr/lib64/libjpeg.so.62 lib/
cp /usr/lib64/libtiff.so.5 lib/
cp /lib64/libpthread.so.0 lib/
cp /usr/lib64/libstdc++.so.6 lib/
cp /lib64/libm.so.6 lib/
cp /lib64/libgcc_s.so.1 lib/
cp /lib64/libc.so.6 lib/
cp /lib64/ld-linux-x86-64.so.2 lib/
cp /usr/lib64/libjbig.so.2.0 lib/
cp -r ~/tesseract/tessdata/ tessdata
cp -r ~/tfenv/lib/python2.7/site-packages/* .
cp -r ~/tfenv/lib64/python2.7/site-packages/* .

mkdir -p tessdata
wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata -O tessdata/eng.traineddata

# Create lambda_function.py with the following contents
# lambda_function.py
import io
from base64 import b64decode

import PIL.Image
import tesserocr


def lambda_handler(event, context):
    # event['image64'] holds the base64-encoded image bytes
    binary = b64decode(event['image64'])
    image = PIL.Image.open(io.BytesIO(binary))
    # run OCR on the decoded image
    text = tesserocr.image_to_text(image)
    return {'text': text}

# zip the package contents (lambda_function.py must sit at the root of the archive)
cd ~/lambda-tesseract
zip -r ~/lambda-tesseract.zip . --exclude "*.pyc"

  3. You may then copy the zip package to your computer and upload it to S3

scp -i key.pem ec2-user@AWS_EC2_INSTANCE_IP:~/lambda-tesseract.zip .

  4. Use the zip url in S3 to configure AWS Lambda.

  5. Create an environment variable with key "TESSDATA_PREFIX" and leave the value empty.

  6. You can test the function with a test.json file like this:

{
"image64": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBxITEhUTExMVFhUWGB0XGRYY\nGB4YGRYYGR8dIxgeFxgYHSggGxslIiAZITEhJisrLi4xGB8zODMsNygtLisB\nCgoKBQUFDgUFDisZExkrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysr\nKysrKysrKysrKysrKysrKysrK//AABEIAFMCWAMBIgACEQEDEQH/xAAcAAEA\nAgIDAQAAAAAAAAAAAAAABwgFBgECBAP/xABPEAABAwIEAwUFAwgFBw0BAAAB\nAgMRAAQFEiExBgdBEyJRYXEIFDKBkSNCUhUzVGJykpOhFySisdE1U3Sz0+Hw\nGCVjZXOCg5SywcPS4xb/xAAUAQEAAAAAAAAAAAAAAAAAAAAA/8QAFBEBAAAA\nAAAAAAAAAAAAAAAAAP/aAAwDAQACEQMRAD8AnGlKUClKUClfC8vG2klbi0oS\nN1KISPqdK+rawRI1B2Pj6eVB2pSlApSlApXRbgAJJgDUk6ACtfv+O8NZWW3L\nxlKxEjNMSJG0jag2OlY7CMctrlGe3eQ6kzqkzsYOm41rIA0HNKUoFKV0dcCQ\nVKIAG5JgD1JoO9K1nGePsNtkqLt21KdChCg45PTuIkj5wK8eD80cKuFZU3Ib\nVpo8C1M+BXAJG2/Sg3KlcJM1zQKUpQKUpQKV8rm4Q2hS3FJQhIlSlEJSkDck\nnQAeNYhvjHDlEJTfWhUTASH2ySTsAArUmgzlK4Sqa5oFKV1WuKDtStRRzNwg\nqUn31vuxJIUAZ/CcsK+Vd/6ScJ/Tmfqf8KDa6Vq6OYmFGSL5jQTquNPIHc+Q\n1rp/SThP6cz9T/hQbXSvHhOJs3LSXmFhxtU5VjYwSDHoQR8q9lApSlApSsBx\nNxlZWCQbl4IJnKgAqWqPBKdY8zp50GfpUS4vzzskz2LLzpKCQSA2M/3QZk5T\nqSoT00MmOmCc8Gn7hhgWbiS86hvMXQQkrUEzGXUCaCXaUpQKVpWMc0sMtnVs\nuOr7RtRSsBpZykCd4g+Gk712seaWEuJze9pR0KXEqQoH0I89xIoNzpUfX3OP\nCW1lHauOR95tsqSfQmJr4f024T4v/wAL/fQSRSo7Z5z4SoiXHkzOpaVGn7Mn\nyrceG8cavbdFyzm7NzNlzCD3VFJkdNUmgydKUoFKxHEnEtrYtdrcuBCdgN1r\nPghA1UfTbc6VFmI8/mwVBqyWqFEArdCQU66wEmDtprQTVSoFZ5+OZhmskZZ1\nh0zl6xKYnaPTz0lvhHi62xBrtbdSjBhSVCFIVAMEdd9xI86DP0pSgUpXxvLt\nDSC44pKEJEqWohKUjxJOgFB9qVDfE/PNltSkWbBeIMdqs5Wz5pSO8obfhrCW\nPP18K+2s2lJ0/NrUgjx+LMD6aetBP9K03gnmNZ4iAlBLb0E9iv4iBuW1bLHp\nqI1ArcQaDmlKUClKjnmNzTt7BPZsFD9zI7gMoQNCe0KToY2A+dBI1KrY5zzx\nMkwi2AnQdmowPCc+tbnwBzmTcOBm+ShpSiAh1OjZMbOZichJ2MxrBiNQmCld\nUKBEjau1ApSlApSlBFvG3OFuxuXbZNst1bUAqKwhMqAOkAkiDv49OtR1inOj\nFHSS2pthPQIbCiJGgKnM0nroBUs4xypsbq9cu3y6vtCCpvPlRISEj4QF9PxC\ns/gnBeH2sdhaMoVEZsudcftrlX8/DwoKmYpidxcr7R95x1XRTi1Kj0mYE9Bp\nVteBFTh1mcoT/V2u6JAHcToJ1qDfaH/ymj/RUf6xyp14JtS1h9o2dSlhsEyT\nJyiYmgzVKUoFa/xpxYxh1uX3pOuVCB8Ti40A8PM9K2CoP9pRlZFkoIVkSXgV\nxKQpXZwD4GEk+cHwNBofGnHV5iy0tkZWysdnbo1lRgCTErUTt57CsjgfJnFH\ntXA0
wJiHFyrbcJbnrpqQa37kDwwlFsq7dYKXlLUltapB7GEQUpOglQX3okx4\nVLgSPCgqnxNy8xLDVh4oKkNkLD7JKggpMgnTMgiAZIjzrb+XfNu594QxfOdq\n24oIS4QlKm1E6E5QMySSkGdRU9uIBBBAMiIPUeB8qrRzm4MZw+4aUwCll9Kj\nlJkJcSrvBJjRMKRA1jWgsyDXNalyqxY3OF2zijKwjs1EnMZaJRJPiQkK1/FW\n20GN4jxFVvbPPpbLhabUsIBgqygmJ/49DVb+IONsUxgptkoMHXsWEqGbYSvv\nEkDzgCfSrPvNhQKSAQdCCJBB3BB3FYnhvhe0sUqTaspbCzmVBKiT5lRJjwGw\n1oK/YPyYxRxaA6lDCD8SlLSspHXuIMk76aDTenFnKC+s2S8lSLhKZzBuQpCY\n+LKdx0IEnUb1ZmKGgrhyr5lvWrzdtcuKct3FpTmWpRLEgJTlKjo3tKdgASI1\nmx6TVX+d2BsWuJZWEBtDjSXSkfDnUpQUR4DQGPOrA8AXrj2HWjrhlamUFRiJ\nMbx57+HXSg2ClKUCuCa5rTuaHGH5OsytBSX1nI0lXj95UDcJGvhMDrQaFz+4\ntVphrUQpKVvEg/iSWgkztoSdD0qGrBlSLlpKwQoOIkHQ6kEaHyqR+U3CS8Su\nVXt4S402vv8AaDP7w4Qe6So/CnukyDuBWsY2rtsdWMpGa+yQnvHR3L3ZgTpo\nNulBbOKV1Qa7UCvLijHaNOI/GhSfGMwI2677V6q8WNOZbd5UTlbWcp2MJOho\nKicH8POX90i2bUlKlBRzK+FISCST/d863t7kViOmVy1MiT31iD4fmzp5+dYH\nks4oYxbQYB7QK8x2a9PPWPpVqxQVwa5F4nIBdtgJ1PaLMDxjs9fSvtdciMQE\nZH7ZfjJWmP7Bn/dViaUGtcucCdscPZtXiguN55KCSnvuLUIJAOyh0rZaUoFc\nE1zUS8/eKHLe3btWyAbnOHNJPZpyiBrpJO8dDQYLmxzRuE3CrSxcLaWiQ48k\npJWqNUoOuUJ1BO8jpGul8JctsQxAhxKezZWCrtnSQFeYHxKk9YjzrcORnAiH\nx7/coCkpVDCFDulSfiWR1AOgB6gnpU8oQAAAIjYeHpQQ5g/IRgJ/rVy4tcjR\noBCPMHMFE+oy1mLDkpYNPofQ9cgtuJcQnMiAUkKAJyZiJG8gx9ak2lBwBXNK\nUFTeYrZXjN0gGM1xl9Jgf8elSDccgl65L4EaQFsnymSF+saV7OIOUFzcYg7d\ni4ZSlbwcAIUVASPKJ0qZBQQpY8gU5Ptr05j0bbGUbdVGT6wPSvo7yBZ0y3rm\n+stpMiDtroZj5TU0V5cVvEssuurMJbQpZMgaJBO506UFSeNcATYXSrVL4eKE\npKlBOQJUROUiTqBBn9boRVk+VGGuW+E2rTohYSpRHh2i1LAPgQFCR0M1XDhb\nC1X+JtNwtxLr2dwq1UW82ZZWR1ImT4mrctpgRAHkKDvXixjEm7Zlb7pyttpK\n1HfQeXU9APE17a1bmeB+SryQCOxVuY16GfEGCB1260Fb7t65xfEVFOda33IS\nDJDTalQkEDZCAdfTzqY8I5F2CEDt3HnlwJIUG0g9coSJj1JrRfZ4cP5SdSIh\nVusnQdFojXeNasenYUEXYryQw9TSgwXWnI7qyvOmemZJHw+Ma1CuEXr2FYkl\nxaDnYcIWjVOZOoVG0gjUE6HQ1bs1X/2jGmk3dspKYcU0orVA7wCoR6kd/fyo\nJ8tXwtCVp1SoAg+IIkfyr61qXKjN+SbPMtSj2WhMaCTlSIGwEAeQrbaBVdOe\n3F5ubgWTRUG7ZR7SdAt308EiQD5nyqxDjgSCSYAEn0G9VS4PKbzGmVOiQ7dF\nxQ6HvFYGvSQBB6UEh8A8mG1sh7EQvMoApYSrLkHTtCNcx10B0B8dtqv+TOEr\nQUoZW0rotDq1EfJxSkn6
VIQrmgqZxvwHdYa6e0BWxplfSkhBnYH8KpB0nzFT\nnyi44GIW/ZuSLhgJSsk/nAZAWIAAkgyOn0raeLMGRd2b9uoA9oggT0Xug/JQ\nSflVbuUV+7b4swhMAuKLLiT+FWqhp1BSk+ooLUUpSg0fnDxD7nhzmUqDr32L\nZSNirVRJOwyhVQjyp4G/Kb6g4oi3ZAUvLupSicqASIE6knXQecjOe0Ti613r\nVtshlrOB4rcJkn5BIHz8alXlDw+mzw1kCCp9IuFqGklxIKQPJKco9QT1oMl/\n/C4Zlye4WsRH5lGb9+M0+c1CPNDleqwCrm3VmtJAIUe+1mMAE/eRMAHfUTO9\nWRrG8R4ULq1ft1bOtqRPgVDQ6g7GD8qCO+RvGouGRYuk9swjuaGFMJyga7Zk\nyBHgRHlK1VT5ZYkbLF2A4Snvm3cA1+KUwY0Iz5T8pq1SDpQdqUpQKUpQKUpQ\nVw9oj/KaP9FR/rHKsBw+f6rb/wDYt/8ApFQp7SVqlL9o6IzLbcQdNw2pJTJ6\n/GalHle2pOE2QUZPYpO891Wqf7JA8tqDaaUpQK4iuaUHEVzSlANQP7SGIJLt\noyFAlCVuEfeGcpCZ9cp+lStxrxYzh9up50gq1Dbcwp1X4U7/ADMafSa/cNYS\n9juKqceScpUHHyJAQ0CAlCSB8UQgdSEz0JoJn5H2iW8HtyE5VOFxaz1Ue0UE\nqP8A3EoE+AFb5XxtLZDaEttpCUJASlKRASkbAAbAV9qBSlKDia+N7dttNrcc\nWEIQkqUomAkDck1FvPLjC9sFWotHg32gcK+4hc5ckfnEmNzt41CfEHF9/fQL\nm4W6AZCICUTtORACZ84mgy3MjHVYjia1tQpGYMMxstKSQkgkD4iSdds0dKsx\nwjh6reytmF/G0y2hWs95KQFa+tQLyPfw1t9S7xaUPpILBdMNjfMUkmO02ifl\nrNWPRHSg7UpXBNB8MQu0MtrdcUEobSVqUdglIkn6VWbHb57HsWSlrMELORoK\nkhttO61JExOqj5wJrceePHx72HW6o/SFDwI/NajzBJHkPGti5NcAKsWzc3AI\nuHkZcsn7JskHKRMFSoSTO2UAdZDfMCwlu0t2mGkgJbSEiBEkbk+ZMk+pqq/E\nDQRjLqUBaQm8MZiSr85vJ1M7gnWCKtyaqNiCe0xpY1VnviBESQXiBE6ek6UF\nuQK5rgGuaBXyuUJKFBQBSQQQdiCNZ8q+teTFlqDDpQJUG1lIiZUEmBHX0oKp\ncrGyrFrMJmQ6FaGNEglX8gdOu1WKxXmRhdsoocu2ypJylKJcKTrIPZgxER5V\nWbhvhW8vXMlvbrX4mMqE/tLVoP7/AFqUML5CLISbm7CTrKGkZo00hxRE/u0G\n3v8AOvCUmAt1Q/Elox/ag/yrb+HuKLO9TmtrhtzrlBhY/aQYUPpURX/IBWvY\n3o6aONEdNe8lRnXyqN8Vwu9wm8CVZmnWzmbdTspIPxIPVJ6g+JBFBb6lapy4\n4wRiVoHdA8iEPIGwXG6RJ7itx8x0ra6Dgmqt8ycS/KGMrQFdwOJtmzGgCVZV\nGBqe+VHzqz167kQpcTlSVR6Amqm8u7xhGIJurtwZWQu4ObvKdcQCUJTm0LhV\nBEkajeYoLPcL4UixtGrfOCllEFZASCd1KPQak9frXgHMbCc2X35mZjcx+9ER\n57VA2PYliONXa1MtPKaUrs0NJns0pTKkhxU5M0d4kn00ArIK5H4pGabaYnL2\nipmNtURPzjzoLFWGIMvoDjLqHEHZSFBST80mK9NVCxPDMRwp4JWHrZaphbay\nkLSN8q2zBGvj97UCp65R8di/t+ycM3LISFkkS6CSAtIGvhm03I8aCQ6UpQKU\npQKjbnvjqrfDi02sBdwsNkdS3BLkeEwBP6xqSFGqs818fVfYo6EAlLR92bSB\nJVkJzRB7xUsqiNxloNy9nDCjmurpSdgllCuh
JlTgHpDf1qdawHAmC+52Fvbk\nALQjvwSQXFSpZk6/ETWfoFebELVLra21CUrSpJ66KEHQ16a6rVFBVLA7+4wL\nE1ZwkqaJadTEhbaoJyEwQSAlQOnSdJqx2Cca4fcpBZumiYBKVKCVpkD4kqgj\ncD1qAubvGLGIPBNu0gobIAuMpDq95GuvZaiARMjSBWJseW+LPNpcRZuFChmS\nSpCZB65VqBE6dKCwvFvMSwsUAuOhxZ1S21C1kTqd8oG+pI2quuI3dzi+IjVS\n1POBKUgFQZbJ0CU9EoGp22JO816neV2MJSSbJcAToptR+QC5J8hWb5V8wkYa\nfd7hlIaUs53QmHkE6HON1pEARuI67UFhcBwxu2t27doHI0kITJkwOpPid/nX\nvr42j6VoStBlKgFJI2KSJBHqK+1BwRVSb95eGYs4toZV29wvKCSQUEmEkq1I\nUhUE+B+dW3qKecHLp2/U0/aISXhKHAVBGZESgydyDI9FDwoJHwXFmbllt9lY\nW2tOYEfzB8CDII6EV7pqpWC8TYlhLy2kKU0Qr7RlxIUgqTpqk9YAGZJBIA1i\ntjxDnfii0kITbtSIlLZUQfEdopQ+oigmDmXxu3htsSFf1hxKgyiCRIgFStNE\npkHXfb0r7y0cKsYtFEyS9JPiTJNZTgvha8xl4uvuLLCVS684TqJlSGp0Cjrt\non6CsbwYlKMbYFt9o2LvK2SfibzEAk6a5NfOgtnSuEmuaCpvMFarvGblKNVL\nuOwSDCe8jK2PlIAmrUYZZpZZaaSIS22lsCSYCAABJ1O25qq3Gea2xm4W33lN\n3ZeTI0KirOAQNwCY+VWutXCpKVKSUkgEpO6SRqPlQfWuq9q7V1XtQVHDQTje\nUAgJv4AO4Ae2PnVuRVS+ZTCWMXuU26SjK6FJAJJCyEqJT1HeJIHSQOlW0FBz\nSlKBSlKBSlKCu/tE3KziDLZJKE26VJT0ClrWFEeZCU/u1MXLNKxhVlnMnsEE\nfskdz+zlqHfaHZjEGlZYzsAFX4oUrcdI018/Kpl5dXaXcMtFoKI7FAIR8KVJ\nSAU7nUEQR4ig2OlKUClKUCtI5kcwWcNbKdF3Kky215HZS42RofMxHnXk5o8x\n28PR2LWRy6XsiZDQ/E4Bv5J0n03hThLhq5xm8UVuEyQt95WpSDP9oxCRsI6A\nCg7YVhOI47dlZKlSe+6qeyZBkhIHTyQPn1NWU4Y4fZsrdDDKYCQAVdVqAEqU\nepJk/M124d4ft7JlLNu2EIG/4ln8S1feV5n5QNK64/xJa2eU3L6GgucuYmVE\nbwACYGk+ooMvSsfgmMsXbfa27qHW5jMgzBG4I3SdRofEVkKBSlKDw4lg1tcQ\nH2GngDIDqEuAE7kBYMH/AArzN8L2KUKQm0t0oXopKWUJCh+sAnWsvSgrlzl5\ndt2Kk3VqlQt1HKpEyGV/dykmcqtd9iInvADe+QfESrixWwtSSq2WEpEQQ0oS\nifGDnE+QrYebmT8k3naRHZiJE9/Mns9uubL84qH/AGeE/wDOLhlWjCtAND3k\nRmPTy86CyNYPjTHhY2bt1kz9llhMxJWpKEyY0EqE+U1nKxvEWEIu7Z22WSlL\nqCgkAEidiJBEigqdgWPhq/TevtB+HVOqQTAWpU6zESFHMNPu1LA9oBv9AX/G\nH+zrLo5E4flAU/dkjchTYn5FowP8dzS65FYerLkeuURAV3kKzRudUaKOm2gj\nagw7nP8ATBiwVPSXhE+f2dRI3jsYgL5Tcn3g3HZ5oGbOVhOaJgGOnSpvPIew\nlJFzdR94S3r+yez7o9QahP8AIyDinuQUrJ737vm0Ksva5J2jNGu0UEmp5+u/\noCNf+mP/ANKmXhzGEXds1ct/C6gKAmcp+8knxSoFJ9K0C75G4av4HLluI0C0\nqEaT8SCddeu5+Vb/AMOYI1ZW7ds1mKGxAKjKjJJJJAAkkk6ADXQC
gyVcEVzS\ng4SmK5pSgVpfNjhn36wcSmA419s2T1KAZTJ2zAkesVula/x7iYt8Pu3ZAIZW\nEk/jUkhAjrKiKCE/Z3uVDEHWkqIQtgqUPxKQpOU/LMr61Yyq3+zw2TiTigNE\n26wfKVNx9dashQfC8bzoUiYzJKZ8JEVTrA8BdfvEWYAS6tzszMQggnMT4hIB\nOm8VcsiqtcYtqwzHFuIA+zfTcISklIKVELyabCCUn1NBY3hfhu3sWEsW6SEC\nSZMlSjupR6nb6VmAK+GH3SXWm3U/CtCVjWdFAEajevRQYLjHhhnELZdu7pIl\nKwAVNrGykz8wY3BI61Wvgp52yxlhpChmTdC2WqNFJU4EL0PQiSOo0q17iwkE\nnYCT6CqnYG4H8cZdbBIcv0upEa5C8FSR5J1PhB8KC2dKUoFKUoMZxJi6bS1e\nuVCQ0gryzGYjZM+JMD51WHgTB/fsWaSrMEqdLy8p1SEysgHfcZZ31qWfaC4h\n7GzTap+O5V3vJpsgn5lWUegVXj9nzhpKWVX64K3CptvrkQkws7aKUfPYUExI\nrtSlArQ+dGOuWmGqU0vI44tLSVA5VAGSooI1BgHUbTW+VFvtDMFWHNKCSQh9\nJJBEJBStMmR1JA6akUGiciuDk3Nwq7dBLduoZExKVuwTqf1O6qB1KddKsYna\noj9nC5R7nctT30v5ynqErQgJPzKVfSpdoOCKgj2gOFENKRiDSSFOLCHojLmj\nuK01BMEE9dOu88VD3tF4okWrNsFDMt3OpIUM2RCVRmTvGYgz+r9AynIPGHHs\nN7NQ0t3C2lZM5ge8BqZGUKA8IjwqTaiz2erNSMNWo7OXClJ9AEJ+eoO1SnQK\nUpQYzGOH7W6CRcsNuhJkZ0hUT4E6isJh/LLCGVZkWTZJ6OFTo+SXVKA+lbdX\nnxC9bZbU66tKEJ1UpaglKR5k0Gnc2MdFhhjvZgJU5DDYA0SXAZ0G0ICyPMCo\no9n3CQ7iCnlJBTbtkgmdFr7qCBsdO039a8nNjjEYndttW4UppolDZEntlry9\n4IyzOmUDUmPOKmvlZwz7jYNtrbCH1St3YkqJOUKUN4TAiTGupoNwpSlBVbnR\narbxa4Kk5Q5kcTEwpJQkFQnxUFA+YNWV4YvC9aWzpyy4w2s5dpUkExqdJqHv\naOwpRVa3QR3cqmVr8DOZsHy1XHzrauQ2O9vh/YKUCu1UW95JbV3myfAfEj/w\nxQSXXVe1dq0vm5jy7TDHnGyA4spaSSJ+MwrTb4c0E9aCBMAR+U8bbLvfS/cF\nas+mZtMqynLt3UxpVrUVXX2ebILxBx0x9kwco81qCZBnoMw+dWMoFKUoFKUo\nPNc37Tcdo4hE7ZlhM+kmvgcdtf0lj+Kj/GoI5l8EYre4ncOt2qlo7oQoLQEl\nAAAgqKddCSNwSdxBOFRyZxcpCi02Jjul1MifGNNPWgzvtDYghy5tQhaVBLKj\nCSDGdWhkeIH8qk/ltiNs1hdmhVwyCGUkguJBBV3iCJ3Ex8qhf+hXF/wM/wAU\nV2VyUxWAQGCTuA7qn1JSAfkTQWL/AC7a/pLH8VH+Neq1ukODMhSVJ/EkhQPj\nqKrR/Qpi/wCBn+KKmzlRgLtlh6GHkhLoWtSgCFAyo5TI/Vy0G41F/N7mUbAe\n7W4/rKkhXaaFLKTMGCCFLMHQ6Aa1KFQpzX5b39/fm4tw2Wy2lOq8pBTMyCPO\ngjrhLDkYjeLfv7xtluQt5x1SUlxSj8Ce8nVQSvvD4QPMTP2BY7gto0lm3vLN\nDaegeRJPUqOaVKPiahBHJvFy5kLTaRE9oXU5PTSVT8q9p5GYn0ctTp0cX9NW\n/wDdQT1bcXYe4YbvLZZGpCXkGB+9UW+0Didq8wwlu5Qt5twns0FK5QpOqlKG\nqQIGkwZ1BgRrR5FYn/nbT99z/ZV8LPkliiwSTbohRELcPeAiFAoSrQ67wdNq\nCRPZz/
yc/wD6Wv8A1bNSpWm8q+FHcNs1MvKQpxbqnTkJKRISkAFQBOiQdutb\nlQYPjXiAWFm7dFBX2cQkECVKISmSdhJEnwnetT4a5yYe+lIfUbd3KMwWCW83\nglYH94H8q3rGcKZumVMPoDja/iSSQD4aggggwQRqCKh/inkVOZdlcR1DT2vy\nS4kaddx4SdzQTLb37SxKHELExKVBQ+oNYrG+MbC0JFxctIUBOTNmcg7Q2mVa\n+lQAOSuLfgZ/iivpb8k8VKgFBlIO6i5IHySCaDnmrzLViCgxb50Wo3CtC8ZB\nBUOgBAgT5nwEockOGPdbHtiTnuiHCCnKUpTIQnXWdyfXymvhwdyatLVQduF+\n9OJIKQU5G0kfqSc5/a08qk4Cg5pSlArz4hetstqdcUEoQCpSjskDcmvRWG4w\nwT32yftc+QuogK10UCCmY3EgSOokUGCc5q4Rkze9p2nKErzekZfi8qr69irB\nxkXSCQx74l6SDOTtApRgknxMVIFtyCezjtL1vJ1ytkmPIEgT6/z2r3XfIFBV\n9nfKSn9dkLM6zqlxIjbpQbb/AEvYOFZTcneJ7JyPCZy7edbrh96h5tLragpt\naQpKhspJEgioWPs/q6YgPnbkf/N/xFTDw/hYtbZm3CswabS3miJygCYkxO/z\noMhUa84eNbnDBaqtw2e1U5nC0kyEZIAggj4j/KpKrCcScKWl8gIumQ4EzlMl\nKkFUTlUkgjZPrFBHWD8+bZUC4tnWtu8hQcTtqSDlUBPhNbza8wcLcSFC+twD\n+NwII9Urg1p+L8i7Jclh51nSADDiZHXWFeomteueQD0wi+bI/WaKDPoFq/vo\nJIxrmbhdu2Vm6bdOwQyoOqUYmISdB0kwKgjjfje6xZ9LaULS1mCWrdBkqUTu\nsAd9ZPyHTqTtbXIC4kTetATqQ2omPISJ+oqTuDOXtnhyUltAceG77gBX55Oi\nE+Q6bk70GP5S8D/k63Up2DcuwXI1CAPhQD1gySRoT5AVv1KUCoq55cELu2UX\nNugrfa0UlI7y2jrp4lJkgeClVKtcKFBWnlhzRVhw93eQp22kkFJ77U75QdFJ\nJ1y6RJM9DMqeaOEZZ98RtMQufSMu9fDiflZh14CezDDsz2jICSZ3zJjKqfSf\nA7zpH/J9/wCsB/5b/wDagxnM3m6m7ZNrZJWhtwQ6tYyrUPwJAJhJ6nczGms7\nDyK4JWwFX1wkpW4nKygiClswVLI3BVsNoE/ir1cN8jrVhxK7l43ISZDeTs2z\ntGcBSiqNdJg6SImZWQmBQdqUpQKUrq4mRG1BWPnhivvGKrQlJHYISx4lSgVK\nkAea4HpViuFsP93s7dnKEltpCSBtmCRm/nJnzrVMK5S2DT3vDinrh3tO1CnV\nzrMjMEgZtdZPhW/AUHNKUoFYziPBW7y2ctnfgcTBI3Sd0qTP3kmCPSsnSgqr\ncMX2AX6VGdDKSCQ2+jYyOu57p1SqD4TJ+G8+LJSR21u+2cuuXKtObwBkEjzI\nFSbjOEMXTSmX20uNqEFJ/wDYjVJ21BB0qP7rkbhilZkrum9B3UOJI0699tRn\n50GK4i57sJbIs7dxbhGingEoSfNKVFSvTTfeonssPxDGbtbiQp55ZzLWYCED\nQCSdEpGgCRrA0Bqb7HkhhaB3zcO7fG4ABHgG0p385rfcIwhi2bDVu0hpsfdQ\nIE+J6k+Zk0HHD+HJtrdphAAS2gJESRoNYnXUyfnWQpSg+F9dBptbivhQlSz6\nJEmone5+Wkdy0uFHTRRQkROuoJ6T0+m9S2+0FApOxBH1qLv6BcN/z95++1/s\naDDv+0AiSBYKIkxL4E+EgNmPST86j3iTjDEMXWGlSUlct27Q7oJgCQO8s+aj\nAkxE1NFjyVwpEZ0POwI+0dIB13PZBGvTTTStxwLhy1swRbMNtTAJQIKo2zHc\n/M9aCLuVHKp22fTdXwSFIE
tNBQVCiFZi5pEpEQATrr0FTOKUoFKUoMZxHhDd\n3buW7k5HElJgwR1BBIOoIB2O2x2qsWNYNiGB3IUlakKP5t5HwLHXeQfNCh1G\nmoq2FeLFMLZuEhDzTbiQoKCVpChmGxhQImggy35+3ASkLs2ioDvFK1JBPkkg\nx9TWoY3jt/jl0huCok/ZMI+BsGJJ+WpWo/QaVPdzyswhasyrJAMAd1biE6fq\noWEn6VnsH4btLUk29uy0TuUICVEeGYCY0GlBgeV/BAwy2KVKSt5w5nVpGkjQ\nISTqUJ1ieqlHSYrdKUoFKUoFKUoOMormlKBSlKBSKUoFKUoOIpFKUCKRSlBz\nSlKBSlKBFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUo\nFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUo\nFKUoP//Z\n"
}

It should print 1234567890.
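
To build a test event from your own image, the payload only needs the image bytes base64-encoded under the image64 key that lambda_handler reads. A minimal sketch; the helper names build_event and write_test_json are made up for illustration and are not part of the package above:

```python
import base64
import json


def build_event(image_bytes):
    """Wrap raw image bytes in the event shape that lambda_handler expects."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"image64": encoded}


def write_test_json(image_path, json_path="test.json"):
    """Read an image file and write the matching test event as JSON."""
    with open(image_path, "rb") as f:
        event = build_event(f.read())
    with open(json_path, "w") as f:
        json.dump(event, f)
    return event
```

Calling write_test_json("digits.jpg") (a hypothetical file name) would produce a test.json with the same shape as the one shown above.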

@jiminygreen jiminygreen commented Nov 18, 2017

The version http://www.leptonica.org/source/leptonica-1.74.1.tar.gz is no longer available. Is there another version you could suggest?

I've tried http://www.leptonica.org/source/leptonica-1.74.2.tar.gz but I still get compilation failures on 'pip install tesserocr'.

@amitm02 amitm02 commented Nov 22, 2017

It seems that autoconf-archive is needed now.
But yum cannot find it: "No package autoconf-archive available." I haven't managed to compile it manually yet.

@jiminygreen jiminygreen commented Dec 2, 2017

I'm new to Python. What is autoconf-archive, and what is it doing?

@Ango

@sanudatta11 sanudatta11 commented Dec 7, 2017

Thanks Ango, you saved my day. I was constantly getting errors about a missing lib file. Can anyone shed any light on that?

@eyaler eyaler commented Dec 13, 2017

checking for leptonica... configure: error: leptonica not found

@ChiragG ChiragG commented Feb 21, 2018

Has anyone been able to compile the latest tesseract version on the aws lambda linux environment?

@akhilkatpally akhilkatpally commented Apr 4, 2018

These are the only two things that need to be changed; everything else is fine.
Install autoconf-archive from:
http://rpm.pbone.net/index.php3/stat/4/idpl/23652016/dir/centos_6/com/autoconf-archive-2012.04.07-7.3.noarch.rpm.html
Download it manually and copy it to the EC2 instance.
sudo rpm -ivh autoconf-archive-2012.04.07-7.3.noarch.rpm
Update the leptonica version:
wget http://www.leptonica.org/source/leptonica-1.74.4.tar.gz

@bushang bushang commented May 14, 2018

"pip install tesserocr" step failed. Any thoughts?

And I tried the lambda-tesseract.zip, there is no tesserocr.

@JeremieThomasBernard JeremieThomasBernard commented Aug 10, 2018

Hi all,
I got Ango's zip: https://s3.amazonaws.com/ca-lambda-tesseract/lambda-tesseract.zip
I tried to run it but got:
Unable to import module 'lambda_function': dynamic module does not define module export function (PyInit__imaging)
Any ideas?

@mapavia mapavia commented Aug 16, 2018

You might be using python3 instead of 2

@JeremieThomasBernard JeremieThomasBernard commented Aug 16, 2018

It seems it was just that. Thanks, Mapavia!

@rheft rheft commented Sep 17, 2018

Any ideas on how to include poppler in this installation? I need to convert incoming PDFs to images using the pdf2image library, which relies on poppler.

@Suryaphaneeth Suryaphaneeth commented Sep 19, 2018

@rheft, I am having the same issue. Please post an update if you find a solution. I will update the post if I make any progress.

@mpetryszyn mpetryszyn commented Nov 11, 2018

Struggling to get tesseract 4.0.0 to run on python 3.6 in Lambda. Anyone have any luck?

@luizgustavogp luizgustavogp commented Nov 20, 2018

Hi,

I configured the code entry with this S3 link URL https://s3.amazonaws.com/ca-lambda-tesseract/lambda-tesseract.zip, provided by @Ango, but I received this error:

{ "stackTrace": [ [ "/var/task/lambda_function.py", 15, "lambda_handler", "text = pytesseract.image_to_string(image)" ], [ "/var/task/pytesseract/pytesseract.py", 125, "image_to_string", "raise TesseractError(status, errors)" ] ], "errorType": "TesseractError", "errorMessage": "(1, u'Error opening data file /usr/local/share/tessdata/eng.traineddata')" }

Any idea?

@johnykifle johnykifle commented Nov 28, 2018

Hello guys,

Has anyone experienced this error message?

{
"errorMessage": "Unable to import module 'lambda_function'"
}

I had this issue previously when I zipped the 'lambda-tesseract' folder, and resolved it by zipping the contents instead of the folder.
But now it doesn't work at all.
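
For reference, zipping the contents rather than the folder can also be scripted, so that lambda_function.py ends up at the root of the archive. A minimal Python sketch; zip_directory_contents is a made-up helper name, and the zip file is assumed to be written outside the source folder:

```python
import os
import zipfile


def zip_directory_contents(src_dir, zip_path):
    """Zip the files inside src_dir so they sit at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                if name.endswith(".pyc"):
                    continue  # same exclusion as the zip command in the gist
                full_path = os.path.join(root, name)
                # Store paths relative to src_dir, not to its parent folder.
                zf.write(full_path, os.path.relpath(full_path, src_dir))
    return zip_path
```

Listing the archive afterwards (for example with unzip -l) should show lambda_function.py without a leading lambda-tesseract/ prefix.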

@gwittchen gwittchen commented Nov 28, 2018

Did you download the correct traineddata? Each version has its own dataset.
Older versions (< 4) of tesseract require additional data in the tessdata folder such as "".cube.ln files.
Check that TESSDATA_PREFIX is correctly set.
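
One way to check that setting is to look for the traineddata file under the prefix from inside the function. The sketch below is only a diagnostic aid under two assumptions: the language is eng, and, depending on the Tesseract version, the prefix may be either the tessdata folder itself or its parent. Both function names are hypothetical.

```python
import os


def candidate_traineddata_paths(prefix, lang="eng"):
    """Return the places a traineddata file might be expected under prefix."""
    filename = lang + ".traineddata"
    return [
        os.path.join(prefix, filename),              # prefix is the tessdata dir
        os.path.join(prefix, "tessdata", filename),  # prefix is its parent
    ]


def find_traineddata(prefix, lang="eng"):
    """Return the first existing candidate path, or None if none exists."""
    for path in candidate_traineddata_paths(prefix, lang):
        if os.path.isfile(path):
            return path
    return None
```

Logging find_traineddata(os.environ.get("TESSDATA_PREFIX", "")) at the top of the handler would show whether the file is reachable from the configured prefix.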

@johnykifle johnykifle commented Nov 28, 2018

I am actually using leptonica-1.73.tar.gz and tesseract-3.04.01.

@gwittchen gwittchen commented Nov 29, 2018

I created a newer version using a docker container to build the binaries for the AWS lambda environment.
It's using leptonica-1.76 and tesseract-4.0.0 and the same simple lambda from this tutorial.

https://github.com/gwittchen/lambda-ocr

@pog8tor pog8tor commented Jan 4, 2019

I created a newer version using a docker container to build the binaries for the AWS lambda environment.
It's using leptonica-1.76 and tesseract-4.0.0 and the same simple lambda from this tutorial.

https://github.com/gwittchen/lambda-ocr

This is pure gold. I came back just to say thanks.
Python3.6 + tesseract 4 over AWS lambda

@seth10 seth10 commented May 20, 2020

Ready to use https://s3.amazonaws.com/ca-lambda-tesseract/lambda-tesseract.zip

Thank you very much @Ango for preparing this! All I had to do to get it running was add os.environ["TESSDATA_PREFIX"] = os.path.join(LAMBDA_TASK_ROOT, 'tessdata'). Also I had to make sure I selected Python 2.7 as the runtime instead of Python 3.8.

@juang97 juang97 commented Jun 4, 2020

@seth10 where exactly did you add this line? Was it in the .py file ? Thanks

@seth10 seth10 commented Jun 4, 2020

@juang97 Exactly, in lambda_function.py I added it on line 10, right below the line that appends to os.environ["PATH"].

To be clear, this is what I did and worked for me in Python 2.7. The link gwittchen shared seems promising and updated. Being able to use Python 3.6 would be great, but I only noticed this after I started using Ango's zip. I haven't tried it myself.
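
In isolation, that change could look roughly like the sketch below. It assumes the LAMBDA_TASK_ROOT name in the line above holds the value of the Lambda environment variable of the same name; configure_tessdata is a made-up helper, and /var/task is only a fallback for running outside Lambda.

```python
import os


def configure_tessdata(env=None):
    """Point TESSDATA_PREFIX at the tessdata folder inside the package."""
    env = os.environ if env is None else env
    # LAMBDA_TASK_ROOT is set by the Lambda runtime; /var/task is a fallback.
    task_root = env.get("LAMBDA_TASK_ROOT", "/var/task")
    env["TESSDATA_PREFIX"] = os.path.join(task_root, "tessdata")
    return env["TESSDATA_PREFIX"]
```

The call would need to run at import time, before the first OCR call, so that tesseract sees the variable.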

@Gautami007 Gautami007 commented Aug 6, 2020

Hello @seth10, @juang97, @Ango @johnykifle

I have followed all the steps mentioned above, created the "lambda-tesseract.zip" file and uploaded it to Lambda through S3. But I am getting the error below while testing the function:

{
"errorMessage": "Unable to import module 'lambda_function'"
}

I am using the following configuration:
tesseract 3.05.00
leptonica-1.74.4
libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7 : libwebp 0.3.0
python 3.7

Also, in the Lambda "Environment variables" settings the value field is mandatory; it cannot be left empty.

Could any of you help me out with my error?

@seth10 seth10 commented Aug 6, 2020

Hi @Gautami007, to me this sounds like a general AWS Lambda issue, nothing related to tesseract. Please make sure that you haven't changed the name of the primary function you intend Lambda to call. If it's no longer called lambda_function, either change the function name or tell Lambda which function you want it to call using the Handler field.

@Gautami007 Gautami007 commented Aug 6, 2020

Hello @seth10,
Thank you for response.

I changed neither the Lambda function name nor the handler name. I also uploaded the zip that you mentioned in your "Ready to use" comment, but that is not working either.

I am getting the error below:

START RequestId: 37a526a1-af54-4e2c-bacf-82df1ab01ac8 Version: $LATEST
(1, u'Error opening data file /usr/local/share/tessdata/eng.traineddata'): TesseractError
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 15, in lambda_handler
    text = pytesseract.image_to_string(image)
  File "/var/task/pytesseract/pytesseract.py", line 125, in image_to_string
    raise TesseractError(status, errors)
TesseractError: (1, u'Error opening data file /usr/local/share/tessdata/eng.traineddata')

END RequestId: 37a526a1-af54-4e2c-bacf-82df1ab01ac8
REPORT RequestId: 37a526a1-af54-4e2c-bacf-82df1ab01ac8 Duration: 123.13 ms Billed Duration: 200 ms Memory Size: 128 MB Max Memory Used: 52 MB

@seth10 seth10 commented Aug 6, 2020

Oh I see, so leaving the handler as lambda_function.lambda_handler should be fine then, as that zip contains that file and that function. Given that you're now seeing that error ("Error opening data file /usr/local/share/tessdata/eng.traineddata"), I assume that means your Lambda is at least starting up.

I feel like I remember seeing this error and that's why I added the os.environ["TESSDATA_PREFIX"] = os.path.join(LAMBDA_TASK_ROOT, 'tessdata') line, so it would know where to look for that file. Could you try modifying the lambda_function.py file and adding that on line 10?

@adriaanbd adriaanbd commented Aug 10, 2020

Hey, I was very tempted to try this one out but opted for an EC2 + Docker setup that generates zip files using shell scripts to compile the libraries and dependencies. It's very straightforward and reproducible.

@kollols kollols commented Aug 19, 2020

I am getting the error below while using seth's solution:

START RequestId: 81a1f718-71a3-4597-8cf4-ea74a3304f02 Version: $LATEST
[Errno 13] Permission denied: OSError
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 16, in lambda_handler
    text = pytesseract.image_to_string(image)
  File "/var/task/pytesseract/pytesseract.py", line 122, in image_to_string
    config=config)
  File "/var/task/pytesseract/pytesseract.py", line 46, in run_tesseract
    proc = subprocess.Popen(command, stderr=subprocess.PIPE)
  File "/usr/lib64/python2.7/subprocess.py", line 394, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1047, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied
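
Since the traceback fails inside subprocess.Popen, one possible cause (an assumption; nothing in this thread confirms it) is that the tesseract binary in the package lost its execute bit. A small diagnostic sketch with made-up helper names:

```python
import os
import stat


def is_executable(path):
    """True if path is a regular file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def add_exec_bits(path):
    """Add the execute bits (like chmod +x). Run this before zipping."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Printing is_executable() for the bundled binary from inside the handler would show whether this is the problem; add_exec_bits would then be applied on the build machine and the package zipped again.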
