ReadToMe

Project submission for AWS DeepLens Challenge

Update

This project has been updated to use MXNet. I have also added some performance improvements to the text cleanup process. Please see the repo for the latest changes.

Solution

For this project, I wanted to build an application that could read books to children. To achieve this, I designed a workflow that performs the following steps (a rough sketch of the loop follows the list).

  • Determine when a page with text is in the camera frame
  • Clean up the image using OpenCV
  • Perform OCR (Optical Character Recognition)
  • Transform the text into audio using Amazon Polly
  • Play back the audio through speakers plugged into the DeepLens
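
Here is a minimal sketch of that loop in Python. It is not the exact code from the repo: the helper names are illustrative, `detect_text_block()` stands in for the TensorFlow model (sketched under Model Training below), and `clean_for_ocr()` stands in for the OpenCV filter chain (sketched at the end of this doc).

```python
# Minimal sketch of the ReadToMe loop. Helper names are illustrative,
# not the exact code from the repo.
import boto3
import pytesseract
from pydub import AudioSegment
from pydub.playback import play

polly = boto3.client("polly")

def speak(text):
    # Turn the text into speech with Amazon Polly, then play it
    # through the speakers with pydub.
    response = polly.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId="Joanna"
    )
    with open("/tmp/speech.mp3", "wb") as f:
        f.write(response["AudioStream"].read())
    play(AudioSegment.from_mp3("/tmp/speech.mp3"))

def read_page(frame):
    # detect_text_block() wraps the object detection model; it returns
    # the cropped text region, or None if no page is in the frame.
    region = detect_text_block(frame)
    if region is None:
        return
    text = pytesseract.image_to_string(clean_for_ocr(region))
    if text.strip():
        speak(text)
```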

Model Training

I used TensorFlow to create an object detection model. At the time of this writing, the onboard Intel model optimization library does not support TensorFlow. Once that support lands, I will be able to optimize this model to run on the GPU on the DeepLens device.

I followed this tutorial, which uses this repo, to learn how to build my model.

My dataset was made from a few hundred photos of my kids' books taken in various lighting conditions, orientations, and distances. Following the tutorial, I used labelImg to annotate my dataset with bounding boxes so I could train the model to identify Text Blocks on a page.

Here is the model that I trained.
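
For reference, this is roughly how a frozen TensorFlow (1.x) object detection graph is loaded and run on a frame. The tensor names are the standard ones exported by the TensorFlow Object Detection API; the model path is illustrative, and this is a sketch rather than the exact code from the repo.

```python
# Sketch: loading a frozen TF 1.x object detection graph and running it
# on a frame. Tensor names are the standard TF Object Detection API
# exports; the model path is illustrative.
import numpy as np
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

def detect_text_block(frame, min_score=0.5):
    # Returns bounding boxes (normalized coordinates) for detections
    # that score above the threshold.
    with tf.Session(graph=graph) as sess:
        boxes, scores = sess.run(
            ["detection_boxes:0", "detection_scores:0"],
            feed_dict={"image_tensor:0": np.expand_dims(frame, axis=0)},
        )
    return [b for b, s in zip(boxes[0], scores[0]) if s >= min_score]
```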

Architecture

This project is built using AWS Greengrass, Python 3.6, TensorFlow, OpenCV, Tesseract, and Amazon Polly.

Instructions for testing

Packaged Lambda

There is a test Python script that you can use to try the application on your development machine before deploying to the DeepLens. You will need to install a few dependencies before you can run the application. I recommend creating a virtual environment and pip installing the following:

  • opencv-python
  • pillow
  • pytesseract
  • tensorflow
  • boto3
  • pydub

To run this project on the DeepLens itself, you will need to install Tesseract and TensorFlow on the device.

In order to get sound to play on the DeepLens, you will need to grant Greengrass permission to use the audio card.

Greengrass requires you to explicitly authorize all the hardware your code has access to. One way to configure this is through the Group Resources section of the AWS IoT console. Once configured, you deploy these settings to the DeepLens, which results in a JSON file being written to the greengrass directory on the device.

To enable audio playback from your Lambda, you need to add two resources. The sound card on the DeepLens is located under the path “/dev/snd/”. You need to add both “/dev/snd/pcmC0D0p” and “/dev/snd/controlC0” in order to play sound.
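
If you prefer scripting this instead of clicking through the console, something like the following should create the two device resources via the boto3 Greengrass API. This is a sketch: the definition and resource names are illustrative, and you still need to attach the resulting definition version to your Greengrass group and redeploy.

```python
# Sketch: defining the DeepLens sound devices as Greengrass local
# resources with boto3. Names/IDs are illustrative; the resulting
# definition version must still be attached to the group and deployed.
import boto3

greengrass = boto3.client("greengrass")

def sound_resource(resource_id, device_path):
    return {
        "Id": resource_id,
        "Name": resource_id,
        "ResourceDataContainer": {
            "LocalDeviceResourceData": {
                "SourcePath": device_path,
                "GroupOwnerSetting": {"AutoAddGroupOwner": True},
            }
        },
    }

response = greengrass.create_resource_definition(
    Name="ReadToMeAudio",
    InitialVersion={
        "Resources": [
            sound_resource("soundPcm", "/dev/snd/pcmC0D0p"),
            sound_resource("soundControl", "/dev/snd/controlC0"),
        ]
    },
)
```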

[Screenshot: Group Resources configuration in the AWS IoT console]

Before OCR can be performed, the detected text area needs to be cleaned up through a number of filters. This graphic shows the steps that ReadToMe applies to each image before trying to turn the image into text.

[Diagram: image cleanup steps applied to each frame before OCR]
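
A typical OpenCV cleanup chain of this kind looks something like the sketch below. The specific filters and parameters here are assumptions, common choices for OCR preprocessing rather than the exact chain from the repo.

```python
# Sketch of a typical OpenCV cleanup chain before OCR. The exact
# filters and parameters ReadToMe uses may differ; these are common
# choices for document images.
import cv2

def clean_for_ocr(image):
    # Grayscale: OCR does not need color information.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Light blur to suppress sensor noise before thresholding.
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    # Adaptive thresholding handles uneven lighting across the page.
    binary = cv2.adaptiveThreshold(
        blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, 31, 10,
    )
    return binary
```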

@Mdegiraudd

hi,
which version of tensorflow did you use?
which version of python did you install on your deeplens, and with which did you train your model?

Thank you.

Regards.
