@barbolo
Created January 27, 2017 17:31
Tesseract OCR on AWS Lambda with Python

References:

https://github.com/tesseract-ocr/tesseract/wiki/Compiling

http://stackoverflow.com/questions/33588262/tesseract-ocr-on-aws-lambda-via-virtualenv

https://github.com/sirfz/tesserocr

Instructions for running Tesseract OCR on AWS Lambda with Python

  1. Launch an Amazon Linux AMI instance

  2. Connect to the instance and generate an AWS Lambda Package

# system libs
sudo yum -y update
sudo yum -y upgrade
sudo yum -y groupinstall "Development Tools"

# tesseract / leptonica / pillow dependencies
sudo yum -y install gcc gcc-c++ make autoconf automake libtool \
                    libjpeg-devel libpng-devel libtiff-devel zlib-devel \
                    libzip-devel freetype-devel lcms2-devel libwebp-devel \
                    tcl-devel tk-devel

# install leptonica
cd ~
mkdir leptonica
cd leptonica
wget http://www.leptonica.org/source/leptonica-1.74.1.tar.gz
tar -zxvf leptonica-*.tar.gz
cd leptonica-*
./configure
make
sudo make install

# install tesseract
cd ~
git clone --branch 4.00.00alpha https://github.com/tesseract-ocr/tesseract.git
cd tesseract
./autogen.sh
./configure --enable-debug
LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make
sudo make install

# create a python virtual env
virtualenv ~/tfenv
source ~/tfenv/bin/activate

# Install pillow
pip install pillow

# Install cython
pip install cython

# Install tesserocr
pip install tesserocr

# prepare the zip package
cd ~
mkdir lambda-tesseract
cd lambda-tesseract
cp /usr/local/bin/tesseract .
mkdir lib
cp /usr/local/lib/libtesseract.so.4 lib/
cp /usr/local/lib/liblept.so.5 lib/
cp /lib64/librt.so.1 lib/
cp /lib64/libz.so.1 lib/
cp /usr/lib64/libpng12.so.0 lib/
cp /usr/lib64/libjpeg.so.62 lib/
cp /usr/lib64/libtiff.so.5 lib/
cp /lib64/libpthread.so.0 lib/
cp /usr/lib64/libstdc++.so.6 lib/
cp /lib64/libm.so.6 lib/
cp /lib64/libgcc_s.so.1 lib/
cp /lib64/libc.so.6 lib/
cp /lib64/ld-linux-x86-64.so.2 lib/
cp /usr/lib64/libjbig.so.2.0 lib/
cp -r ~/tesseract/tessdata/ tessdata
cp -r ~/tfenv/lib/python2.7/site-packages/* .
cp -r ~/tfenv/lib64/python2.7/site-packages/* .

# tessdata/ already exists from the cp above; -p avoids a "File exists" error
mkdir -p tessdata
wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata -O tessdata/eng.traineddata

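# Create a lambda_function.py file in ~/lambda-tesseract (see example below)

# lambda_function.py
import io
from base64 import b64decode

import PIL.Image
import tesserocr


def lambda_handler(event, context):
    # The event carries the image as a base64 string under the "image64" key
    binary = b64decode(event['image64'])
    image = PIL.Image.open(io.BytesIO(binary))
    # tesserocr.image_to_text runs Tesseract on a PIL image and returns the text
    text = tesserocr.image_to_text(image)
    return {'text': text}

# zip the package contents (not the folder itself) so that
# lambda_function.py ends up at the root of the archive
cd ~/lambda-tesseract
zip -r ../lambda-tesseract.zip . -x '*.pyc'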
  3. You may then copy the zip package to your computer and upload it to S3:
scp -i key.pem ec2-user@AWS_EC2_INSTANCE_IP:~/lambda-tesseract.zip .
  4. Use the zip's S3 URL to configure AWS Lambda.

  5. Create an environment variable with key "TESSDATA_PREFIX" and leave the value empty.

  6. You can test the function with a test.json file like this:

{
"image64": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBxITEhUTExMVFhUWGB0XGRYY\nGB4YGRYYGR8dIxgeFxgYHSggGxslIiAZITEhJisrLi4xGB8zODMsNygtLisB\nCgoKBQUFDgUFDisZExkrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysr\nKysrKysrKysrKysrKysrKysrK//AABEIAFMCWAMBIgACEQEDEQH/xAAcAAEA\nAgIDAQAAAAAAAAAAAAAABwgFBgECBAP/xABPEAABAwIEAwUFAwgFBw0BAAAB\nAgMRAAQFEiExBgdBEyJRYXEIFDKBkSNCUhUzVGJykpOhFySisdE1U3Sz0+Hw\nGCVjZXOCg5SywcPS4xb/xAAUAQEAAAAAAAAAAAAAAAAAAAAA/8QAFBEBAAAA\nAAAAAAAAAAAAAAAAAP/aAAwDAQACEQMRAD8AnGlKUClKUClfC8vG2klbi0oS\nN1KISPqdK+rawRI1B2Pj6eVB2pSlApSlApXRbgAJJgDUk6ACtfv+O8NZWW3L\nxlKxEjNMSJG0jag2OlY7CMctrlGe3eQ6kzqkzsYOm41rIA0HNKUoFKV0dcCQ\nVKIAG5JgD1JoO9K1nGePsNtkqLt21KdChCg45PTuIkj5wK8eD80cKuFZU3Ib\nVpo8C1M+BXAJG2/Sg3KlcJM1zQKUpQKUpQKV8rm4Q2hS3FJQhIlSlEJSkDck\nnQAeNYhvjHDlEJTfWhUTASH2ySTsAArUmgzlK4Sqa5oFKV1WuKDtStRRzNwg\nqUn31vuxJIUAZ/CcsK+Vd/6ScJ/Tmfqf8KDa6Vq6OYmFGSL5jQTquNPIHc+Q\n1rp/SThP6cz9T/hQbXSvHhOJs3LSXmFhxtU5VjYwSDHoQR8q9lApSlApSsBx\nNxlZWCQbl4IJnKgAqWqPBKdY8zp50GfpUS4vzzskz2LLzpKCQSA2M/3QZk5T\nqSoT00MmOmCc8Gn7hhgWbiS86hvMXQQkrUEzGXUCaCXaUpQKVpWMc0sMtnVs\nuOr7RtRSsBpZykCd4g+Gk712seaWEuJze9pR0KXEqQoH0I89xIoNzpUfX3OP\nCW1lHauOR95tsqSfQmJr4f024T4v/wAL/fQSRSo7Z5z4SoiXHkzOpaVGn7Mn\nyrceG8cavbdFyzm7NzNlzCD3VFJkdNUmgydKUoFKxHEnEtrYtdrcuBCdgN1r\nPghA1UfTbc6VFmI8/mwVBqyWqFEArdCQU66wEmDtprQTVSoFZ5+OZhmskZZ1\nh0zl6xKYnaPTz0lvhHi62xBrtbdSjBhSVCFIVAMEdd9xI86DP0pSgUpXxvLt\nDSC44pKEJEqWohKUjxJOgFB9qVDfE/PNltSkWbBeIMdqs5Wz5pSO8obfhrCW\nPP18K+2s2lJ0/NrUgjx+LMD6aetBP9K03gnmNZ4iAlBLb0E9iv4iBuW1bLHp\nqI1ArcQaDmlKUClKjnmNzTt7BPZsFD9zI7gMoQNCe0KToY2A+dBI1KrY5zzx\nMkwi2AnQdmowPCc+tbnwBzmTcOBm+ShpSiAh1OjZMbOZichJ2MxrBiNQmCld\nUKBEjau1ApSlApSlBFvG3OFuxuXbZNst1bUAqKwhMqAOkAkiDv49OtR1inOj\nFHSS2pthPQIbCiJGgKnM0nroBUs4xypsbq9cu3y6vtCCpvPlRISEj4QF9PxC\ns/gnBeH2sdhaMoVEZsudcftrlX8/DwoKmYpidxcr7R95x1XRTi1Kj0mYE9Bp\nVteBFTh1mcoT/V2u6JAHcToJ1qDfaH/ymj/RUf6xyp14JtS1h9o2dSlhsEyT\nJyiYmgzVKUoFa/xpxYxh1uX3pOuVCB8Ti40A8PM9K2CoP9pRlZFkoIVkSXgV\nxKQpXZwD4GEk+cHwNBofGnHV5iy0tkZWysdnbo1lRgCTErUTt57CsjgfJnFH\ntXA0
wJiHFyrbcJbnrpqQa37kDwwlFsq7dYKXlLUltapB7GEQUpOglQX3okx4\nVLgSPCgqnxNy8xLDVh4oKkNkLD7JKggpMgnTMgiAZIjzrb+XfNu594QxfOdq\n24oIS4QlKm1E6E5QMySSkGdRU9uIBBBAMiIPUeB8qrRzm4MZw+4aUwCll9Kj\nlJkJcSrvBJjRMKRA1jWgsyDXNalyqxY3OF2zijKwjs1EnMZaJRJPiQkK1/FW\n20GN4jxFVvbPPpbLhabUsIBgqygmJ/49DVb+IONsUxgptkoMHXsWEqGbYSvv\nEkDzgCfSrPvNhQKSAQdCCJBB3BB3FYnhvhe0sUqTaspbCzmVBKiT5lRJjwGw\n1oK/YPyYxRxaA6lDCD8SlLSspHXuIMk76aDTenFnKC+s2S8lSLhKZzBuQpCY\n+LKdx0IEnUb1ZmKGgrhyr5lvWrzdtcuKct3FpTmWpRLEgJTlKjo3tKdgASI1\nmx6TVX+d2BsWuJZWEBtDjSXSkfDnUpQUR4DQGPOrA8AXrj2HWjrhlamUFRiJ\nMbx57+HXSg2ClKUCuCa5rTuaHGH5OsytBSX1nI0lXj95UDcJGvhMDrQaFz+4\ntVphrUQpKVvEg/iSWgkztoSdD0qGrBlSLlpKwQoOIkHQ6kEaHyqR+U3CS8Su\nVXt4S402vv8AaDP7w4Qe6So/CnukyDuBWsY2rtsdWMpGa+yQnvHR3L3ZgTpo\nNulBbOKV1Qa7UCvLijHaNOI/GhSfGMwI2677V6q8WNOZbd5UTlbWcp2MJOho\nKicH8POX90i2bUlKlBRzK+FISCST/d863t7kViOmVy1MiT31iD4fmzp5+dYH\nks4oYxbQYB7QK8x2a9PPWPpVqxQVwa5F4nIBdtgJ1PaLMDxjs9fSvtdciMQE\nZH7ZfjJWmP7Bn/dViaUGtcucCdscPZtXiguN55KCSnvuLUIJAOyh0rZaUoFc\nE1zUS8/eKHLe3btWyAbnOHNJPZpyiBrpJO8dDQYLmxzRuE3CrSxcLaWiQ48k\npJWqNUoOuUJ1BO8jpGul8JctsQxAhxKezZWCrtnSQFeYHxKk9YjzrcORnAiH\nx7/coCkpVDCFDulSfiWR1AOgB6gnpU8oQAAAIjYeHpQQ5g/IRgJ/rVy4tcjR\noBCPMHMFE+oy1mLDkpYNPofQ9cgtuJcQnMiAUkKAJyZiJG8gx9ak2lBwBXNK\nUFTeYrZXjN0gGM1xl9Jgf8elSDccgl65L4EaQFsnymSF+saV7OIOUFzcYg7d\ni4ZSlbwcAIUVASPKJ0qZBQQpY8gU5Ptr05j0bbGUbdVGT6wPSvo7yBZ0y3rm\n+stpMiDtroZj5TU0V5cVvEssuurMJbQpZMgaJBO506UFSeNcATYXSrVL4eKE\npKlBOQJUROUiTqBBn9boRVk+VGGuW+E2rTohYSpRHh2i1LAPgQFCR0M1XDhb\nC1X+JtNwtxLr2dwq1UW82ZZWR1ImT4mrctpgRAHkKDvXixjEm7Zlb7pyttpK\n1HfQeXU9APE17a1bmeB+SryQCOxVuY16GfEGCB1260Fb7t65xfEVFOda33IS\nDJDTalQkEDZCAdfTzqY8I5F2CEDt3HnlwJIUG0g9coSJj1JrRfZ4cP5SdSIh\nVusnQdFojXeNasenYUEXYryQw9TSgwXWnI7qyvOmemZJHw+Ma1CuEXr2FYkl\nxaDnYcIWjVOZOoVG0gjUE6HQ1bs1X/2jGmk3dspKYcU0orVA7wCoR6kd/fyo\nJ8tXwtCVp1SoAg+IIkfyr61qXKjN+SbPMtSj2WhMaCTlSIGwEAeQrbaBVdOe\n3F5ubgWTRUG7ZR7SdAt308EiQD5nyqxDjgSCSYAEn0G9VS4PKbzGmVOiQ7dF\nxQ6HvFYGvSQBB6UEh8A8mG1sh7EQvMoApYSrLkHTtCNcx10B0B8dtqv+TOEr\nQUoZW0rotDq1EfJxSkn6
VIQrmgqZxvwHdYa6e0BWxplfSkhBnYH8KpB0nzFT\nnyi44GIW/ZuSLhgJSsk/nAZAWIAAkgyOn0raeLMGRd2b9uoA9oggT0Xug/JQ\nSflVbuUV+7b4swhMAuKLLiT+FWqhp1BSk+ooLUUpSg0fnDxD7nhzmUqDr32L\nZSNirVRJOwyhVQjyp4G/Kb6g4oi3ZAUvLupSicqASIE6knXQecjOe0Ti613r\nVtshlrOB4rcJkn5BIHz8alXlDw+mzw1kCCp9IuFqGklxIKQPJKco9QT1oMl/\n/C4Zlye4WsRH5lGb9+M0+c1CPNDleqwCrm3VmtJAIUe+1mMAE/eRMAHfUTO9\nWRrG8R4ULq1ft1bOtqRPgVDQ6g7GD8qCO+RvGouGRYuk9swjuaGFMJyga7Zk\nyBHgRHlK1VT5ZYkbLF2A4Snvm3cA1+KUwY0Iz5T8pq1SDpQdqUpQKUpQKUpQ\nVw9oj/KaP9FR/rHKsBw+f6rb/wDYt/8ApFQp7SVqlL9o6IzLbcQdNw2pJTJ6\n/GalHle2pOE2QUZPYpO891Wqf7JA8tqDaaUpQK4iuaUHEVzSlANQP7SGIJLt\noyFAlCVuEfeGcpCZ9cp+lStxrxYzh9up50gq1Dbcwp1X4U7/ADMafSa/cNYS\n9juKqceScpUHHyJAQ0CAlCSB8UQgdSEz0JoJn5H2iW8HtyE5VOFxaz1Ue0UE\nqP8A3EoE+AFb5XxtLZDaEttpCUJASlKRASkbAAbAV9qBSlKDia+N7dttNrcc\nWEIQkqUomAkDck1FvPLjC9sFWotHg32gcK+4hc5ckfnEmNzt41CfEHF9/fQL\nm4W6AZCICUTtORACZ84mgy3MjHVYjia1tQpGYMMxstKSQkgkD4iSdds0dKsx\nwjh6reytmF/G0y2hWs95KQFa+tQLyPfw1t9S7xaUPpILBdMNjfMUkmO02ifl\nrNWPRHSg7UpXBNB8MQu0MtrdcUEobSVqUdglIkn6VWbHb57HsWSlrMELORoK\nkhttO61JExOqj5wJrceePHx72HW6o/SFDwI/NajzBJHkPGti5NcAKsWzc3AI\nuHkZcsn7JskHKRMFSoSTO2UAdZDfMCwlu0t2mGkgJbSEiBEkbk+ZMk+pqq/E\nDQRjLqUBaQm8MZiSr85vJ1M7gnWCKtyaqNiCe0xpY1VnviBESQXiBE6ek6UF\nuQK5rgGuaBXyuUJKFBQBSQQQdiCNZ8q+teTFlqDDpQJUG1lIiZUEmBHX0oKp\ncrGyrFrMJmQ6FaGNEglX8gdOu1WKxXmRhdsoocu2ypJylKJcKTrIPZgxER5V\nWbhvhW8vXMlvbrX4mMqE/tLVoP7/AFqUML5CLISbm7CTrKGkZo00hxRE/u0G\n3v8AOvCUmAt1Q/Elox/ag/yrb+HuKLO9TmtrhtzrlBhY/aQYUPpURX/IBWvY\n3o6aONEdNe8lRnXyqN8Vwu9wm8CVZmnWzmbdTspIPxIPVJ6g+JBFBb6lapy4\n4wRiVoHdA8iEPIGwXG6RJ7itx8x0ra6Dgmqt8ycS/KGMrQFdwOJtmzGgCVZV\nGBqe+VHzqz167kQpcTlSVR6Amqm8u7xhGIJurtwZWQu4ObvKdcQCUJTm0LhV\nBEkajeYoLPcL4UixtGrfOCllEFZASCd1KPQak9frXgHMbCc2X35mZjcx+9ER\n57VA2PYliONXa1MtPKaUrs0NJns0pTKkhxU5M0d4kn00ArIK5H4pGabaYnL2\nipmNtURPzjzoLFWGIMvoDjLqHEHZSFBST80mK9NVCxPDMRwp4JWHrZaphbay\nkLSN8q2zBGvj97UCp65R8di/t+ycM3LISFkkS6CSAtIGvhm03I8aCQ6UpQKU\npQKjbnvjqrfDi02sBdwsNkdS3BLkeEwBP6xqSFGqs818fVfYo6EAlLR92bSB\nJVkJzRB7xUsqiNxloNy9nDCjmurpSdgllCuh
JlTgHpDf1qdawHAmC+52Fvbk\nALQjvwSQXFSpZk6/ETWfoFebELVLra21CUrSpJ66KEHQ16a6rVFBVLA7+4wL\nE1ZwkqaJadTEhbaoJyEwQSAlQOnSdJqx2Cca4fcpBZumiYBKVKCVpkD4kqgj\ncD1qAubvGLGIPBNu0gobIAuMpDq95GuvZaiARMjSBWJseW+LPNpcRZuFChmS\nSpCZB65VqBE6dKCwvFvMSwsUAuOhxZ1S21C1kTqd8oG+pI2quuI3dzi+IjVS\n1POBKUgFQZbJ0CU9EoGp22JO816neV2MJSSbJcAToptR+QC5J8hWb5V8wkYa\nfd7hlIaUs53QmHkE6HON1pEARuI67UFhcBwxu2t27doHI0kITJkwOpPid/nX\nvr42j6VoStBlKgFJI2KSJBHqK+1BwRVSb95eGYs4toZV29wvKCSQUEmEkq1I\nUhUE+B+dW3qKecHLp2/U0/aISXhKHAVBGZESgydyDI9FDwoJHwXFmbllt9lY\nW2tOYEfzB8CDII6EV7pqpWC8TYlhLy2kKU0Qr7RlxIUgqTpqk9YAGZJBIA1i\ntjxDnfii0kITbtSIlLZUQfEdopQ+oigmDmXxu3htsSFf1hxKgyiCRIgFStNE\npkHXfb0r7y0cKsYtFEyS9JPiTJNZTgvha8xl4uvuLLCVS684TqJlSGp0Cjrt\non6CsbwYlKMbYFt9o2LvK2SfibzEAk6a5NfOgtnSuEmuaCpvMFarvGblKNVL\nuOwSDCe8jK2PlIAmrUYZZpZZaaSIS22lsCSYCAABJ1O25qq3Gea2xm4W33lN\n3ZeTI0KirOAQNwCY+VWutXCpKVKSUkgEpO6SRqPlQfWuq9q7V1XtQVHDQTje\nUAgJv4AO4Ae2PnVuRVS+ZTCWMXuU26SjK6FJAJJCyEqJT1HeJIHSQOlW0FBz\nSlKBSlKBSlKCu/tE3KziDLZJKE26VJT0ClrWFEeZCU/u1MXLNKxhVlnMnsEE\nfskdz+zlqHfaHZjEGlZYzsAFX4oUrcdI018/Kpl5dXaXcMtFoKI7FAIR8KVJ\nSAU7nUEQR4ig2OlKUClKUCtI5kcwWcNbKdF3Kky215HZS42RofMxHnXk5o8x\n28PR2LWRy6XsiZDQ/E4Bv5J0n03hThLhq5xm8UVuEyQt95WpSDP9oxCRsI6A\nCg7YVhOI47dlZKlSe+6qeyZBkhIHTyQPn1NWU4Y4fZsrdDDKYCQAVdVqAEqU\nepJk/M124d4ft7JlLNu2EIG/4ln8S1feV5n5QNK64/xJa2eU3L6GgucuYmVE\nbwACYGk+ooMvSsfgmMsXbfa27qHW5jMgzBG4I3SdRofEVkKBSlKDw4lg1tcQ\nH2GngDIDqEuAE7kBYMH/AArzN8L2KUKQm0t0oXopKWUJCh+sAnWsvSgrlzl5\ndt2Kk3VqlQt1HKpEyGV/dykmcqtd9iInvADe+QfESrixWwtSSq2WEpEQQ0oS\nifGDnE+QrYebmT8k3naRHZiJE9/Mns9uubL84qH/AGeE/wDOLhlWjCtAND3k\nRmPTy86CyNYPjTHhY2bt1kz9llhMxJWpKEyY0EqE+U1nKxvEWEIu7Z22WSlL\nqCgkAEidiJBEigqdgWPhq/TevtB+HVOqQTAWpU6zESFHMNPu1LA9oBv9AX/G\nH+zrLo5E4flAU/dkjchTYn5FowP8dzS65FYerLkeuURAV3kKzRudUaKOm2gj\nagw7nP8ATBiwVPSXhE+f2dRI3jsYgL5Tcn3g3HZ5oGbOVhOaJgGOnSpvPIew\nlJFzdR94S3r+yez7o9QahP8AIyDinuQUrJ737vm0Ksva5J2jNGu0UEmp5+u/\noCNf+mP/ANKmXhzGEXds1ct/C6gKAmcp+8knxSoFJ9K0C75G4av4HLluI0C0\nqEaT8SCddeu5+Vb/AMOYI1ZW7ds1mKGxAKjKjJJJJAAkkk6ADXQC
gyVcEVzS\ng4SmK5pSgVpfNjhn36wcSmA419s2T1KAZTJ2zAkesVula/x7iYt8Pu3ZAIZW\nEk/jUkhAjrKiKCE/Z3uVDEHWkqIQtgqUPxKQpOU/LMr61Yyq3+zw2TiTigNE\n26wfKVNx9dashQfC8bzoUiYzJKZ8JEVTrA8BdfvEWYAS6tzszMQggnMT4hIB\nOm8VcsiqtcYtqwzHFuIA+zfTcISklIKVELyabCCUn1NBY3hfhu3sWEsW6SEC\nSZMlSjupR6nb6VmAK+GH3SXWm3U/CtCVjWdFAEajevRQYLjHhhnELZdu7pIl\nKwAVNrGykz8wY3BI61Wvgp52yxlhpChmTdC2WqNFJU4EL0PQiSOo0q17iwkE\nnYCT6CqnYG4H8cZdbBIcv0upEa5C8FSR5J1PhB8KC2dKUoFKUoMZxJi6bS1e\nuVCQ0gryzGYjZM+JMD51WHgTB/fsWaSrMEqdLy8p1SEysgHfcZZ31qWfaC4h\n7GzTap+O5V3vJpsgn5lWUegVXj9nzhpKWVX64K3CptvrkQkws7aKUfPYUExI\nrtSlArQ+dGOuWmGqU0vI44tLSVA5VAGSooI1BgHUbTW+VFvtDMFWHNKCSQh9\nJJBEJBStMmR1JA6akUGiciuDk3Nwq7dBLduoZExKVuwTqf1O6qB1KddKsYna\noj9nC5R7nctT30v5ynqErQgJPzKVfSpdoOCKgj2gOFENKRiDSSFOLCHojLmj\nuK01BMEE9dOu88VD3tF4okWrNsFDMt3OpIUM2RCVRmTvGYgz+r9AynIPGHHs\nN7NQ0t3C2lZM5ge8BqZGUKA8IjwqTaiz2erNSMNWo7OXClJ9AEJ+eoO1SnQK\nUpQYzGOH7W6CRcsNuhJkZ0hUT4E6isJh/LLCGVZkWTZJ6OFTo+SXVKA+lbdX\nnxC9bZbU66tKEJ1UpaglKR5k0Gnc2MdFhhjvZgJU5DDYA0SXAZ0G0ICyPMCo\no9n3CQ7iCnlJBTbtkgmdFr7qCBsdO039a8nNjjEYndttW4UppolDZEntlry9\n4IyzOmUDUmPOKmvlZwz7jYNtrbCH1St3YkqJOUKUN4TAiTGupoNwpSlBVbnR\narbxa4Kk5Q5kcTEwpJQkFQnxUFA+YNWV4YvC9aWzpyy4w2s5dpUkExqdJqHv\naOwpRVa3QR3cqmVr8DOZsHy1XHzrauQ2O9vh/YKUCu1UW95JbV3myfAfEj/w\nxQSXXVe1dq0vm5jy7TDHnGyA4spaSSJ+MwrTb4c0E9aCBMAR+U8bbLvfS/cF\nas+mZtMqynLt3UxpVrUVXX2ebILxBx0x9kwco81qCZBnoMw+dWMoFKUoFKUo\nPNc37Tcdo4hE7ZlhM+kmvgcdtf0lj+Kj/GoI5l8EYre4ncOt2qlo7oQoLQEl\nAAAgqKddCSNwSdxBOFRyZxcpCi02Jjul1MifGNNPWgzvtDYghy5tQhaVBLKj\nCSDGdWhkeIH8qk/ltiNs1hdmhVwyCGUkguJBBV3iCJ3Ex8qhf+hXF/wM/wAU\nV2VyUxWAQGCTuA7qn1JSAfkTQWL/AC7a/pLH8VH+Neq1ukODMhSVJ/EkhQPj\nqKrR/Qpi/wCBn+KKmzlRgLtlh6GHkhLoWtSgCFAyo5TI/Vy0G41F/N7mUbAe\n7W4/rKkhXaaFLKTMGCCFLMHQ6Aa1KFQpzX5b39/fm4tw2Wy2lOq8pBTMyCPO\ngjrhLDkYjeLfv7xtluQt5x1SUlxSj8Ce8nVQSvvD4QPMTP2BY7gto0lm3vLN\nDaegeRJPUqOaVKPiahBHJvFy5kLTaRE9oXU5PTSVT8q9p5GYn0ctTp0cX9NW\n/wDdQT1bcXYe4YbvLZZGpCXkGB+9UW+0Didq8wwlu5Qt5twns0FK5QpOqlKG\nqQIGkwZ1BgRrR5FYn/nbT99z/ZV8LPkliiwSTbohRELcPeAiFAoSrQ67wdNq\nCRPZz/
yc/wD6Wv8A1bNSpWm8q+FHcNs1MvKQpxbqnTkJKRISkAFQBOiQdutb\nlQYPjXiAWFm7dFBX2cQkECVKISmSdhJEnwnetT4a5yYe+lIfUbd3KMwWCW83\nglYH94H8q3rGcKZumVMPoDja/iSSQD4aggggwQRqCKh/inkVOZdlcR1DT2vy\nS4kaddx4SdzQTLb37SxKHELExKVBQ+oNYrG+MbC0JFxctIUBOTNmcg7Q2mVa\n+lQAOSuLfgZ/iivpb8k8VKgFBlIO6i5IHySCaDnmrzLViCgxb50Wo3CtC8ZB\nBUOgBAgT5nwEockOGPdbHtiTnuiHCCnKUpTIQnXWdyfXymvhwdyatLVQduF+\n9OJIKQU5G0kfqSc5/a08qk4Cg5pSlArz4hetstqdcUEoQCpSjskDcmvRWG4w\nwT32yftc+QuogK10UCCmY3EgSOokUGCc5q4Rkze9p2nKErzekZfi8qr69irB\nxkXSCQx74l6SDOTtApRgknxMVIFtyCezjtL1vJ1ytkmPIEgT6/z2r3XfIFBV\n9nfKSn9dkLM6zqlxIjbpQbb/AEvYOFZTcneJ7JyPCZy7edbrh96h5tLragpt\naQpKhspJEgioWPs/q6YgPnbkf/N/xFTDw/hYtbZm3CswabS3miJygCYkxO/z\noMhUa84eNbnDBaqtw2e1U5nC0kyEZIAggj4j/KpKrCcScKWl8gIumQ4EzlMl\nKkFUTlUkgjZPrFBHWD8+bZUC4tnWtu8hQcTtqSDlUBPhNbza8wcLcSFC+twD\n+NwII9Urg1p+L8i7Jclh51nSADDiZHXWFeomteueQD0wi+bI/WaKDPoFq/vo\nJIxrmbhdu2Vm6bdOwQyoOqUYmISdB0kwKgjjfje6xZ9LaULS1mCWrdBkqUTu\nsAd9ZPyHTqTtbXIC4kTetATqQ2omPISJ+oqTuDOXtnhyUltAceG77gBX55Oi\nE+Q6bk70GP5S8D/k63Up2DcuwXI1CAPhQD1gySRoT5AVv1KUCoq55cELu2UX\nNugrfa0UlI7y2jrp4lJkgeClVKtcKFBWnlhzRVhw93eQp22kkFJ77U75QdFJ\nJ1y6RJM9DMqeaOEZZ98RtMQufSMu9fDiflZh14CezDDsz2jICSZ3zJjKqfSf\nA7zpH/J9/wCsB/5b/wDagxnM3m6m7ZNrZJWhtwQ6tYyrUPwJAJhJ6nczGms7\nDyK4JWwFX1wkpW4nKygiClswVLI3BVsNoE/ir1cN8jrVhxK7l43ISZDeTs2z\ntGcBSiqNdJg6SImZWQmBQdqUpQKUrq4mRG1BWPnhivvGKrQlJHYISx4lSgVK\nkAea4HpViuFsP93s7dnKEltpCSBtmCRm/nJnzrVMK5S2DT3vDinrh3tO1CnV\nzrMjMEgZtdZPhW/AUHNKUoFYziPBW7y2ctnfgcTBI3Sd0qTP3kmCPSsnSgqr\ncMX2AX6VGdDKSCQ2+jYyOu57p1SqD4TJ+G8+LJSR21u+2cuuXKtObwBkEjzI\nFSbjOEMXTSmX20uNqEFJ/wDYjVJ21BB0qP7rkbhilZkrum9B3UOJI0699tRn\n50GK4i57sJbIs7dxbhGingEoSfNKVFSvTTfeonssPxDGbtbiQp55ZzLWYCED\nQCSdEpGgCRrA0Bqb7HkhhaB3zcO7fG4ABHgG0p385rfcIwhi2bDVu0hpsfdQ\nIE+J6k+Zk0HHD+HJtrdphAAS2gJESRoNYnXUyfnWQpSg+F9dBptbivhQlSz6\nJEmone5+Wkdy0uFHTRRQkROuoJ6T0+m9S2+0FApOxBH1qLv6BcN/z95++1/s\naDDv+0AiSBYKIkxL4E+EgNmPST86j3iTjDEMXWGlSUlct27Q7oJgCQO8s+aj\nAkxE1NFjyVwpEZ0POwI+0dIB13PZBGvTTTStxwLhy1swRbMNtTAJQIKo2zHc\n/M9aCLuVHKp22fTdXwSFIE
tNBQVCiFZi5pEpEQATrr0FTOKUoFKUoMZxHhDd\n3buW7k5HElJgwR1BBIOoIB2O2x2qsWNYNiGB3IUlakKP5t5HwLHXeQfNCh1G\nmoq2FeLFMLZuEhDzTbiQoKCVpChmGxhQImggy35+3ASkLs2ioDvFK1JBPkkg\nx9TWoY3jt/jl0huCok/ZMI+BsGJJ+WpWo/QaVPdzyswhasyrJAMAd1biE6fq\noWEn6VnsH4btLUk29uy0TuUICVEeGYCY0GlBgeV/BAwy2KVKSt5w5nVpGkjQ\nISTqUJ1ieqlHSYrdKUoFKUoFKUoOMormlKBSlKBSKUoFKUoOIpFKUCKRSlBz\nSlKBSlKBFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUo\nFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUoFKUo\nFKUoP//Z\n"
}

The result should contain the recognized text, 1234567890.
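Instead of pasting a long base64 string by hand, a test event of the same shape can be generated from any image file. This is a minimal sketch, assuming the handler reads the key `image64` as in the example above; the helper names and the file names `digits.jpg` and `test.json` are placeholders.

```python
import base64
import json


def make_test_event(image_bytes):
    """Build a Lambda test event with the image base64-encoded under "image64"."""
    # b64encode returns bytes; decode to ASCII so the value is JSON-serializable
    return {"image64": base64.b64encode(image_bytes).decode("ascii")}


def write_test_event(image_path, out_path="test.json"):
    """Read an image file and write the matching test event as JSON."""
    with open(image_path, "rb") as f:
        event = make_test_event(f.read())
    with open(out_path, "w") as f:
        json.dump(event, f)
    return out_path
```

For example, `write_test_event("digits.jpg")` writes a `test.json` whose contents can be pasted into the Lambda console as a test event.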

@rheft

rheft commented Sep 17, 2018

Any ideas on how to include poppler in this installation? I need to convert incoming PDFs to images using the pdf2image library, which relies on poppler.

@Suryaphaneeth

@rheft, I am having the same issue. Please post an update if you find a solution; I will update this post if I make any progress.

@mpetryszyn

Struggling to get tesseract 4.0.0 to run on python 3.6 in Lambda. Anyone have any luck?

@luizgustavogp

luizgustavogp commented Nov 20, 2018

Hi,

I configured the code entry with this S3 URL, https://s3.amazonaws.com/ca-lambda-tesseract/lambda-tesseract.zip, provided by @Ango, but I received this error:

{ "stackTrace": [ [ "/var/task/lambda_function.py", 15, "lambda_handler", "text = pytesseract.image_to_string(image)" ], [ "/var/task/pytesseract/pytesseract.py", 125, "image_to_string", "raise TesseractError(status, errors)" ] ], "errorType": "TesseractError", "errorMessage": "(1, u'Error opening data file /usr/local/share/tessdata/eng.traineddata')" }

Any idea?

@johnykifle

johnykifle commented Nov 28, 2018

Hello guys,

Has anyone experienced this error message?

{
"errorMessage": "Unable to import module 'lambda_function'"
}

I had this issue previously when I zipped the 'lambda-tesseract' folder, and resolved it by zipping the folder's contents instead of the folder itself.
But now it doesn't work at all.
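The folder-versus-contents problem described here can be caught before uploading. If the zip wraps everything in a top-level `lambda-tesseract/` folder, Lambda cannot find `lambda_function.py` at the archive root and reports "Unable to import module 'lambda_function'". A minimal sketch, assuming the file list produced by the packaging steps in the gist; `check_package` and `EXPECTED` are hypothetical names, and the list should be adjusted to whatever your build actually copies.

```python
import zipfile

# Entries expected at the archive root, taken from the gist's packaging steps
# (an assumption: adjust if your build copies different files).
EXPECTED = [
    "lambda_function.py",
    "tesseract",
    "lib/libtesseract.so.4",
    "tessdata/eng.traineddata",
]


def check_package(zip_path, expected=EXPECTED):
    """Return a list of problems found in a Lambda deployment zip (empty if none)."""
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
    problems = []
    for entry in expected:
        if entry in names:
            continue
        # The same file one or more folders down means the folder itself was zipped
        nested = sorted(n for n in names if n.endswith("/" + entry))
        if nested:
            problems.append("%s is nested at %s; zip the folder contents instead"
                            % (entry, nested[0]))
        else:
            problems.append("missing %s" % entry)
    return problems
```

Running `print(check_package("lambda-tesseract.zip"))` and getting an empty list means the expected files sit at the root of the archive.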

@gwittchen

gwittchen commented Nov 28, 2018

Did you download the correct traineddata? Each version has its own dataset.
Tesseract versions older than 4 require additional data in the tessdata folder, such as the .cube.* files.
Check that TESSDATA_PREFIX is set correctly.
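The last check can be scripted. One caveat: as far as I know, the meaning of `TESSDATA_PREFIX` changed between releases. Tesseract 3.x treats it as the parent of the `tessdata` directory, while 4.x expects it to point at the `tessdata` directory itself. The sketch below therefore checks both layouts; `find_traineddata` is a hypothetical helper, not part of Tesseract or any wrapper.

```python
import os


def find_traineddata(lang="eng", prefix=None):
    """Return (path, exists) for both places <lang>.traineddata may be looked up.

    Uses TESSDATA_PREFIX when prefix is None. An empty prefix yields paths
    relative to the current working directory.
    """
    if prefix is None:
        prefix = os.environ.get("TESSDATA_PREFIX", "")
    filename = lang + ".traineddata"
    candidates = [
        os.path.join(prefix, filename),              # 4.x style: prefix is tessdata/
        os.path.join(prefix, "tessdata", filename),  # 3.x style: prefix is its parent
    ]
    return [(path, os.path.isfile(path)) for path in candidates]
```

Logging `find_traineddata()` from inside the handler shows which path the current setting resolves to and whether the file is actually there.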

@johnykifle

I am actually using leptonica-1.73.tar.gz and tesseract-3.04.01.

@gwittchen

gwittchen commented Nov 29, 2018

I created a newer version using a docker container to build the binaries for the AWS lambda environment.
It's using leptonica-1.76 and tesseract-4.0.0 and the same simple lambda from this tutorial.

https://github.com/gwittchen/lambda-ocr

@pog8tor

pog8tor commented Jan 4, 2019

This is pure gold... I came back just to say thanks.
Python 3.6 + Tesseract 4 on AWS Lambda.

@seth10

seth10 commented May 20, 2020

Ready to use https://s3.amazonaws.com/ca-lambda-tesseract/lambda-tesseract.zip

Thank you very much @Ango for preparing this! All I had to do to get it running was add os.environ["TESSDATA_PREFIX"] = os.path.join(LAMBDA_TASK_ROOT, 'tessdata'). Also I had to make sure I selected Python 2.7 as the runtime instead of Python 3.8.

@juang97

juang97 commented Jun 4, 2020

@seth10 where exactly did you add this line? Was it in the .py file? Thanks

@seth10

seth10 commented Jun 4, 2020

@juang97 Exactly, in lambda_function.py I added it on line 10, right below the line that appends to os.environ["PATH"].

To be clear, this is what I did and worked for me in Python 2.7. The link gwittchen shared seems promising and updated. Being able to use Python 3.6 would be great, but I only noticed this after I started using Ango's zip. I haven't tried it myself.
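A minimal sketch of what the top of such a `lambda_function.py` might look like with seth10's line in place. This is a reconstruction, not Ango's actual file: it assumes the `PATH` change simply appends the task root, and it reads `LAMBDA_TASK_ROOT` from the environment variable Lambda sets to the function's code directory (usually `/var/task`). The helper name `configure_tesseract_env` is hypothetical.

```python
import os

# Lambda sets LAMBDA_TASK_ROOT to the directory the zip was extracted to;
# fall back to /var/task (its usual value) when running elsewhere.
LAMBDA_TASK_ROOT = os.environ.get("LAMBDA_TASK_ROOT", "/var/task")


def configure_tesseract_env(task_root=LAMBDA_TASK_ROOT, environ=os.environ):
    """Make the bundled tesseract binary and tessdata/ discoverable."""
    # Let subprocess-based wrappers such as pytesseract find ./tesseract
    environ["PATH"] = environ.get("PATH", "") + os.pathsep + task_root
    # seth10's fix: point Tesseract at the bundled language data
    environ["TESSDATA_PREFIX"] = os.path.join(task_root, "tessdata")
    return environ


# Run once at import time, before any OCR call
configure_tesseract_env()
```

Because this runs at module import, every invocation that reuses the same container already has both variables set.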

@Gautami007

Hello @seth10, @juang97, @Ango, @johnykifle

I have followed all the steps mentioned above, created the "lambda-tesseract.zip" file and uploaded it to Lambda through S3. But I am getting the error below while testing the function:

{
"errorMessage": "Unable to import module 'lambda_function'"
}

I am using the following configuration:
tesseract 3.05.00
leptonica-1.74.4
libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7 : libwebp 0.3.0
python 3.7

Also, in Lambda the "Environment variables" value field is mandatory; we cannot leave it empty.

Could any of you help me with this error?

@seth10

seth10 commented Aug 6, 2020

Hi @Gautami007, to me this sounds like a general AWS Lambda issue, nothing related to tesseract. Please make sure that you haven't changed the name of the primary function you intend Lambda to call. If it's no longer called lambda_function, either change the function name or tell Lambda which function you want it to call using the Handler field, as seen here. Documentation here.
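The Handler setting has the form `module.function`, so `lambda_function.lambda_handler` means "import `lambda_function.py` from the package root and call its `lambda_handler`". The sketch below imitates that lookup locally so that an import failure (for example a missing shared library behind `tesserocr`) shows its real traceback instead of the generic "Unable to import module" message. It is an approximation for debugging, not Lambda's own bootstrap code, and `resolve_handler` is a hypothetical name.

```python
import importlib
import sys


def resolve_handler(handler, package_root):
    """Import 'module.function' from package_root, roughly the way Lambda does."""
    module_name, _, func_name = handler.rpartition(".")
    if not module_name:
        raise ValueError("handler must look like 'module.function': %r" % handler)
    # Lambda puts the unzipped package root on the import path
    sys.path.insert(0, package_root)
    try:
        module = importlib.import_module(module_name)  # ImportError surfaces here
    finally:
        sys.path.remove(package_root)
    # AttributeError here means the function inside the file was renamed
    return getattr(module, func_name)
```

For example, `resolve_handler("lambda_function.lambda_handler", "lambda-tesseract")` run from the directory containing the unzipped package either returns the function or raises the underlying error.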

@Gautami007

Hello @seth10,
Thank you for the response.

I changed neither the Lambda function name nor the handler name. I also uploaded the zip you mentioned in the "Ready to use" comment, but that isn't working either.

I am getting the error below:

START RequestId: 37a526a1-af54-4e2c-bacf-82df1ab01ac8 Version: $LATEST
(1, u'Error opening data file /usr/local/share/tessdata/eng.traineddata'): TesseractError
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 15, in lambda_handler
    text = pytesseract.image_to_string(image)
  File "/var/task/pytesseract/pytesseract.py", line 125, in image_to_string
    raise TesseractError(status, errors)
TesseractError: (1, u'Error opening data file /usr/local/share/tessdata/eng.traineddata')

END RequestId: 37a526a1-af54-4e2c-bacf-82df1ab01ac8
REPORT RequestId: 37a526a1-af54-4e2c-bacf-82df1ab01ac8 Duration: 123.13 ms Billed Duration: 200 ms Memory Size: 128 MB Max Memory Used: 52 MB

@seth10

seth10 commented Aug 6, 2020

Oh I see, so leaving the handler as lambda_function.lambda_handler should be fine, since that zip contains that file and that function. Given that you're now seeing that error ("Error opening data file /usr/local/share/tessdata/eng.traineddata"), I assume your Lambda is at least starting up.

I feel like I remember seeing this error and that's why I added the os.environ["TESSDATA_PREFIX"] = os.path.join(LAMBDA_TASK_ROOT, 'tessdata') line, so it would know where to look for that file. Could you try modifying the lambda_function.py file and adding that on line 10?

@adriaanbd

Hey, I was very tempted to try this one out, but opted instead for this EC2 + Docker approach, which generates zip files using shell scripts that compile the libraries and dependencies, over here. It's very straightforward and reproducible.

@kollols

kollols commented Aug 19, 2020

I am getting the error below while using seth10's solution:

START RequestId: 81a1f718-71a3-4597-8cf4-ea74a3304f02 Version: $LATEST
[Errno 13] Permission denied: OSError
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 16, in lambda_handler
    text = pytesseract.image_to_string(image)
  File "/var/task/pytesseract/pytesseract.py", line 122, in image_to_string
    config=config)
  File "/var/task/pytesseract/pytesseract.py", line 46, in run_tesseract
    proc = subprocess.Popen(command, stderr=subprocess.PIPE)
  File "/usr/lib64/python2.7/subprocess.py", line 394, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1047, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied
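The traceback shows `subprocess.Popen` failing with `[Errno 13] Permission denied`, meaning the operating system refused to execute the `tesseract` command. One plausible cause (an assumption; the thread does not confirm it) is that the `tesseract` binary lost its executable bit when the zip was built, for example by a tool that does not preserve Unix permissions. Zip tools on Unix record each file's mode in the upper 16 bits of the entry's `external_attr`, so the mode can be read back; `executable_entries` is a hypothetical helper.

```python
import stat
import zipfile

# Any of user/group/other execute permission
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def executable_entries(zip_path, names=("tesseract",)):
    """Report, for each named entry in the zip, whether any execute bit is set.

    Entries written without Unix metadata read back with mode 0 (not executable).
    Names that are absent from the archive are simply left out of the result.
    """
    result = {}
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.filename in names:
                mode = info.external_attr >> 16  # Unix st_mode, if recorded
                result[info.filename] = bool(mode & EXEC_BITS)
    return result
```

If `executable_entries("lambda-tesseract.zip")` returns `{'tesseract': False}`, running `chmod 755 tesseract` in the package folder and zipping again would be a reasonable first thing to try.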
