alexschultz/ReadToMe.md

## ReadToMe.md

      
Read To Me

Project submission for AWS DeepLens Challenge


Update

This project has been updated to use MXNet. I have also added some performance improvements to the text cleanup process. Please see the repo for the latest changes.

Solution

For this project, I wanted to build an application that could read books to children. In order to achieve this, I designed a workflow which performs the following steps.

Determine when a page with text is in the camera frame
Clean up the image using OpenCV
Perform OCR (Optical Character Recognition)
Transform text into audio using AWS Polly
Play back the audio through speakers plugged into DeepLens
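
The five steps above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the project's actual code: the function names `read_page` and `synthesize_speech`, the stage callables, and the voice `Joanna` are all hypothetical choices made for illustration. The Polly call uses boto3's `synthesize_speech` and needs AWS credentials at run time.

```python
from typing import Callable, Optional


def synthesize_speech(text: str, voice_id: str = "Joanna") -> bytes:
    """Turn text into MP3 bytes with Polly. The voice is an assumed default."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice_id
    )
    return response["AudioStream"].read()


def read_page(
    frame,
    detect: Callable,
    clean: Callable,
    ocr: Callable,
    speak: Callable,
    play: Callable,
) -> Optional[str]:
    """Run one camera frame through the five steps.

    Returns the text that was read aloud, or None if the frame had no
    text block or OCR produced nothing.
    """
    region = detect(frame)       # 1. is there a page with text in the frame?
    if region is None:
        return None
    cleaned = clean(region)      # 2. clean up the image
    text = ocr(cleaned).strip()  # 3. OCR
    if not text:
        return None
    audio = speak(text)          # 4. text to audio
    play(audio)                  # 5. playback
    return text
```

Passing each stage in as a callable keeps the flow testable on a development machine with stubs. On the device, `speak` could be `synthesize_speech`, and `play` could wrap pydub's `pydub.playback.play` on an `AudioSegment` loaded from the MP3 bytes.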

Model Training

I used TensorFlow to create an object detection model. At the time of this writing, the onboard Intel Model Optimization library does not work for TensorFlow. Once it is fixed, I will be able to optimize this model to run on the GPU on the DeepLens device.
I followed this tutorial which uses this repo to learn how to build my model.
My dataset was made from a few hundred photos of my kids' books taken in various lighting conditions, orientations, and distances.
Following the tutorial, I used labelImg to annotate my dataset with bounding boxes so I could train the model to identify Text Blocks on a page.
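
As an illustration of what those annotations look like in code: labelImg saves one Pascal VOC XML file per image by default, with one `object` element per bounding box. The sketch below reads such a file into a list of boxes. The label name `text_block`, the file name, and the coordinates in the sample are made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class Box(NamedTuple):
    label: str
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def parse_voc_annotation(xml_text: str) -> List[Box]:
    """Extract every labeled bounding box from a Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")
        boxes.append(
            Box(
                label=obj.findtext("name"),
                xmin=int(float(bnd.findtext("xmin"))),
                ymin=int(float(bnd.findtext("ymin"))),
                xmax=int(float(bnd.findtext("xmax"))),
                ymax=int(float(bnd.findtext("ymax"))),
            )
        )
    return boxes


# Hypothetical annotation for one photo of a book page.
SAMPLE = """
<annotation>
  <filename>page_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>text_block</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>560</xmax><ymax>410</ymax></bndbox>
  </object>
</annotation>
"""

if __name__ == "__main__":
    for box in parse_voc_annotation(SAMPLE):
        print(box)
```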
Here is the model that I trained.

Architecture

This project is built using GreenGrass, Python 3.6, TensorFlow, OpenCV, Tesseract, and AWS Polly.

Instructions for testing

Packaged Lambda
There is a test Python script that you can use to test the application on your development machine before deploying to the DeepLens. You will need to install a few dependencies before you can run the application. I recommend creating a virtual environment and using pip to install the following dependencies.

opencv-python
pillow
pytesseract
tensorflow
boto3
pydub
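
Several of these packages are imported under a different name than the one used with pip (for example, `opencv-python` is imported as `cv2` and `pillow` as `PIL`). The following sketch checks that each one is importable in the active environment; the helper name `missing_packages` is hypothetical and not part of the project.

```python
import importlib.util
from typing import Dict, List

# pip package name -> module name it is imported as
REQUIRED = {
    "opencv-python": "cv2",
    "pillow": "PIL",
    "pytesseract": "pytesseract",
    "tensorflow": "tensorflow",
    "boto3": "boto3",
    "pydub": "pydub",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of packages whose module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All Python dependencies are installed.")
```

This only checks the Python packages. `pytesseract` is a wrapper around the Tesseract executable, which has to be installed separately.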

To run this project on the DeepLens, you will need to install Tesseract and TensorFlow.
In order to get sound to play on the DeepLens, you will need to grant GreenGrass permission to use the Audio Card.
GreenGrass requires you to explicitly authorize all the hardware that your code has access to. One way to configure this is through the Group Resources section in the AWS IoT console. Once configured, you deploy these settings to the DeepLens, which results in a JSON file being deployed to the GreenGrass directory on the device.
To enable Audio playback through your Lambda, you need to add two resources. The sound card on the DeepLens is located at the path “/dev/snd/”. You need to add both “/dev/snd/pcmC0D0p” and “/dev/snd/controlC0” in order to play sound.
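
One way to confirm from inside the Lambda that both resources were granted is to check that the device nodes exist and are readable and writable. This is a minimal sketch; the helper name `check_devices` is hypothetical, and it only tests file access, not actual playback.

```python
import os
from typing import Dict, Iterable

# The two device nodes named above.
AUDIO_DEVICES = ("/dev/snd/pcmC0D0p", "/dev/snd/controlC0")


def check_devices(paths: Iterable[str] = AUDIO_DEVICES) -> Dict[str, bool]:
    """Map each device path to True if it exists and is readable and writable."""
    return {
        path: os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)
        for path in paths
    }


if __name__ == "__main__":
    for path, ok in check_devices().items():
        print(f"{path}: {'ok' if ok else 'not accessible'}")
```

If either path is reported as not accessible, the resource is probably missing from the group configuration or the deployment has not reached the device yet.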

In order to get the Text Area cleaned up to perform OCR, it needs to go through a number of filters. This graphic shows the steps that ReadToMe goes through with each image before trying to turn the image into text.
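
The exact filters are not listed in the text, so the following is only a sketch of a typical cleanup chain (crop, grayscale, blur, adaptive threshold), not the steps ReadToMe actually applies. The function names, the margin, and the threshold parameters are assumptions chosen for illustration.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) in pixels


def pad_box(box: Box, width: int, height: int, margin: int = 10) -> Box:
    """Grow a detected box by a margin, clamped to the image bounds."""
    xmin, ymin, xmax, ymax = box
    return (
        max(0, xmin - margin),
        max(0, ymin - margin),
        min(width, xmax + margin),
        min(height, ymax + margin),
    )


def clean_for_ocr(frame, box: Box):
    """Crop the text area from a BGR frame and binarize it for OCR."""
    import cv2  # lazy import: only needed when an image is processed

    height, width = frame.shape[:2]
    xmin, ymin, xmax, ymax = pad_box(box, width, height)
    region = frame[ymin:ymax, xmin:xmax]

    gray = cv2.cvtColor(region, cv2.COLOR_BGR2GRAY)  # drop color
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)      # suppress noise
    # Adaptive thresholding copes with uneven lighting across the page.
    # Block size 31 and offset 15 are assumed values, not tuned ones.
    return cv2.adaptiveThreshold(
        blurred,
        255,
        cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY,
        31,
        15,
    )
```

The result can then be passed to `pytesseract.image_to_string` to get the page text.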