GSoC 2020 - Tangibles

Google Summer of Code 2020 | Open Roberta - Tangibles Summary

GSoC Project abstract

Every year Google organizes Google Summer of Code, a global program focused on introducing students to open source software development. This year I was selected by Open Roberta to work on a new feature that gives users the ability to write code using real-life tangible coding blocks. Now that GSoC 2020 is coming to an end, it is time to summarize my GSoC project experience.

Project summary

The intent of the project was to design real-life coding blocks and create a prototype code-block recognition script that would enable their use in the Open Roberta Lab.

Code Block Design

During the first month we talked about the design of the blocks, discussed design alternatives and looked at how others had approached the 'real-life code blocks' subject. To make it easy for anyone to produce the tangible blocks, we decided that a tangible block would be a cut-out piece of paper or a sticker on a piece of plastic or wood.

We also set the use case of the final script:
Use case: a tangible program is placed on top of a white sheet of A4 paper. A user takes a picture and uploads it to the Open Roberta Lab using a button. The picture is then processed by the script to generate a program made of Blockly code blocks in the Open Roberta Lab.

The design converged on something similar to the paper blocks Open Roberta had already used at an event, adapted to our use case.

Code blocks:

  • follow the XML structure (nesting levels) of Blockly, for easier AST generation;
  • are color-coded based on their type, the same way as in the Lab;
  • have indentations that indicate to the user how they can be connected together;
  • have a white border around them that helps with the processing later on.

Finally, we determined an initial set of blocks to work on.

Implementation

The goal was to extract the text and the top-left x, y coordinates of each block. Since we were in the prototype phase of the project, we assumed that input images would not have user-induced errors or hard-to-remove noise. For the implementation I used Google's Tesseract OCR and OpenCV with Python.

The processing sequence has five parts:

1. Preparing the image for pre-processing

Before pre-processing we have to apply a perspective transform to obtain a top-down view of the A4 paper. This is useful because the resulting image has a strong saturation contrast between the white paper and the tangible blocks, which helps with the mask generation later on, and the transform also aligns the text horizontally. To make sure this contrast exists in all images, we white-balance the output of the transformation using the LAB color space.
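
A minimal sketch of this step, assuming the four corners of the sheet have already been located (e.g. via contour detection); the helper names and the gray-world balancing variant are illustrative, not the script's actual API:

```python
import cv2
import numpy as np

def warp_to_top_down(image, corners):
    """Perspective transform to a top-down view of the A4 sheet.
    'corners' is a 4x2 array ordered tl, tr, br, bl."""
    (tl, tr, br, bl) = corners
    width = int(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl)))
    height = int(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl)))
    dst = np.array([[0, 0], [width - 1, 0],
                    [width - 1, height - 1], [0, height - 1]], dtype="float32")
    matrix = cv2.getPerspectiveTransform(corners.astype("float32"), dst)
    return cv2.warpPerspective(image, matrix, (width, height))

def white_balance_lab(image):
    """Gray-world white balance in LAB: shift the a/b chroma channels so
    their means sit at the neutral point (128)."""
    lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB).astype("float32")
    l, a, b = cv2.split(lab)
    a -= (a.mean() - 128) * (l / 255.0)
    b -= (b.mean() - 128) * (l / 255.0)
    balanced = cv2.merge([l, a, b]).clip(0, 255).astype("uint8")
    return cv2.cvtColor(balanced, cv2.COLOR_LAB2BGR)
```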

2. Pre-processing and block mask generation

To separate and label each block's mask, we convert the image to the HSV color space and apply Otsu's thresholding method on the saturation channel. We then apply morphological transformations to the binarized image to remove noise and close any gaps left inside the blocks. The result is a binary image that contains only the blocks, at the same locations as in the original image. We can then use the connected-components method to label each of these features and finally use them as a mask for each block.
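
A sketch of the mask generation under the same assumptions (OpenCV 4 API; the kernel size is an illustrative tuning choice):

```python
import cv2

def block_masks(balanced_bgr):
    """Binarize on the saturation channel and label each block's mask."""
    hsv = cv2.cvtColor(balanced_bgr, cv2.COLOR_BGR2HSV)
    saturation = hsv[:, :, 1]
    # Otsu picks the threshold separating the colorful blocks from white paper.
    _, binary = cv2.threshold(saturation, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    # Opening removes speckle noise; closing fills gaps inside the blocks.
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    num_labels, labels = cv2.connectedComponents(binary)
    # Label 0 is the background; every other label is one block's mask.
    return [(labels == i).astype("uint8") * 255 for i in range(1, num_labels)]
```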

3. Preparing text for OCR

We use the masks to crop the actual blocks from the image and then do more pre-processing to prepare the text for OCR. Currently the script crops the text out of each block and applies adaptive thresholding and morphological transformations so that the text can be read by Tesseract.
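
A sketch of this preparation; the threshold block size, offset and kernel are illustrative parameters rather than the script's actual values:

```python
import cv2

def prepare_text(block_gray):
    """Binarize a grayscale crop of a block's text for Tesseract."""
    # Adaptive thresholding copes with lighting that varies across the block.
    text = cv2.adaptiveThreshold(block_gray, 255,
                                 cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 31, 10)
    # Closing the white background fills small black specks of noise
    # around the characters.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
    return cv2.morphologyEx(text, cv2.MORPH_CLOSE, kernel)
```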

4. Collecting data

At this point the text is ready to be read by the OCR engine, and using the contour of each block's mask we can also find its x, y coordinates. This information is then saved in a 'code_block' object.
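
A sketch of this step, reusing prepare_text from above; CodeBlock stands in for the script's 'code_block' object, and the page-segmentation mode is an assumed Tesseract setting:

```python
from dataclasses import dataclass

import cv2
import pytesseract

@dataclass
class CodeBlock:  # stands in for the script's 'code_block' object
    text: str
    x: int
    y: int

def read_block(image, mask):
    """OCR one block and record the top-left corner of its mask contour."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    crop = cv2.cvtColor(image[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    # --psm 7 tells Tesseract to treat the crop as a single line of text.
    text = pytesseract.image_to_string(prepare_text(crop), config="--psm 7")
    return CodeBlock(text.strip(), x, y)
```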

5. AST generation

Using the information that was collected, we generate the AST of the program with the anytree library.
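
A sketch of how such a tree could be built with anytree; the pixel threshold and the indentation rule are assumptions about how the coordinates map to nesting, not necessarily the script's exact logic:

```python
from anytree import Node, RenderTree

def build_ast(blocks, indent=30):
    """Nest blocks by their coordinates: reading order is top-to-bottom (y),
    and a block shifted right by roughly one indent (x) becomes a child."""
    root = Node("program", x=-indent)
    stack = [root]
    for block in sorted(blocks, key=lambda b: b.y):
        # Pop back to the block that encloses this indentation level.
        while len(stack) > 1 and block.x < stack[-1].x + indent:
            stack.pop()
        stack.append(Node(block.text, parent=stack[-1], x=block.x))
    return root

# Render the tree, e.g.:
# for pre, _, node in RenderTree(build_ast(blocks)):
#     print(f"{pre}{node.name}")
```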

Challenges

We gradually moved away from what I initially proposed, so I quickly had to step out of my comfort zone and learn new things throughout the project's timeline. At the beginning of the summer I was new to computer vision, so experimenting with images was necessary to understand how different problems are handled with traditional computer vision methods and to build the experience needed to come up with a robust pre-processing sequence. Throughout the work period there were various times when I felt stuck and unable to solve a problem, but with the help of my mentors we managed to find solutions.

Learnings

Aside from deepening my computer vision skills beyond my university's computer vision course, I worked with very experienced mentors who guided me through the program and helped me understand how much work it takes to create a prototype from scratch, an experience that is very different from typical software development work.

Work Product

I designed the initial set of tangible blocks, ran some experiments in Python notebooks and created a prototype script that can successfully process a small set of images. All of the above can be found in my GSoC GitHub repository, tangibles-recognition.

Work to be done

The script that I delivered is still far from a final product; there is still a lot of work to do to make it robust. To continue from this point, I would follow the steps below:

  • Set more specific requirements for each processing step, so that it is clear what needs to be tested.
  • Make each part of the processing sequence more reliable while gradually enlarging the set of test images.
  • Use more of the image's information (e.g. the color and shape of the blocks).