GSoC 2020 - Tangibles

Google Summer of Code 2020 | Open Roberta - Tangibles Summary

GSoC Project abstract

Every year Google organizes Google Summer of Code, a global program focused on introducing students to open source software development. This year I was selected by Open Roberta to work on a new feature that gives users the ability to write code using real-life tangible coding blocks. Now that GSoC 2020 is coming to an end, it is time to summarize my GSoC project experience.

Project summary

The intent of the project was to design real-life coding blocks and create a prototype code-block recognition script that would enable their use in the Open Roberta Lab.

Code Block Design

During the first month we talked about the design of the blocks, discussed design alternatives and looked at how others had approached the 'real-life code blocks' subject. To make it easy for anyone to produce the tangible blocks, we decided that a tangible block would be a cut-out piece of paper or a sticker on a piece of plastic or wood.

We also set the use case of the final script:
Use case: a tangible program is placed on top of a white sheet of A4 paper. A user takes a picture and uploads it to the Open Roberta Lab using a button. The picture is then processed by the script to generate a program made of Blockly code blocks in the Open Roberta Lab.

The design converged on something similar to the paper blocks Open Roberta had already used at an event, adapted to our use case.

Code blocks:

  • follow the XML structure (nesting levels) of Blockly, for easier AST generation;
  • are color-coded based on their type, the same way as in the Lab;
  • have indentations that indicate to the user how they can be connected together;
  • have a white border around them that helps with the processing later on.

Finally, we determined an initial set of blocks to work on.

Implementation

The goal was to extract the text and the top-left x, y coordinates of each block. Since we were in the prototype phase of the project, we assumed that input images would not have user-induced errors or hard-to-remove noise. For the implementation I used Google's Tesseract OCR and OpenCV with Python.

The processing sequence has five parts:

1. Preparing the image for pre-processing

Before pre-processing we have to apply a perspective transform to obtain a top-down view of the A4 paper. This is useful because the resulting image has a strong saturation contrast between the white paper and the tangible blocks, which helps with the mask generation later on, and the transform also aligns the text horizontally. To make sure this contrast exists in all images, we white-balance the output of the transformation using the LAB color space.
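
A minimal sketch of this step, assuming the four corners of the sheet have already been located (e.g. via contour detection); the helper names and the gray-world balancing variant are illustrative, not the script's actual API:

```python
import cv2
import numpy as np

def warp_to_top_down(image, corners):
    """Perspective transform to a top-down view of the A4 sheet.
    'corners' is a 4x2 array ordered tl, tr, br, bl."""
    (tl, tr, br, bl) = corners
    width = int(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl)))
    height = int(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl)))
    dst = np.array([[0, 0], [width - 1, 0],
                    [width - 1, height - 1], [0, height - 1]], dtype="float32")
    matrix = cv2.getPerspectiveTransform(corners.astype("float32"), dst)
    return cv2.warpPerspective(image, matrix, (width, height))

def white_balance_lab(image):
    """Gray-world white balance in LAB: shift the a/b chroma channels so
    their means sit at the neutral point (128)."""
    lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB).astype("float32")
    l, a, b = cv2.split(lab)
    a -= (a.mean() - 128) * (l / 255.0)
    b -= (b.mean() - 128) * (l / 255.0)
    balanced = cv2.merge([l, a, b]).clip(0, 255).astype("uint8")
    return cv2.cvtColor(balanced, cv2.COLOR_LAB2BGR)
```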

2. Pre-processing and block mask generation

To separate and label each block's mask, we convert the image to the HSV color space and apply Otsu's thresholding method on the saturation channel. We then apply morphological transformations to the binarized image to remove noise and close any gaps left inside the blocks. The result is a binary image that contains only the blocks, at the same locations as in the original image. We can then use the connected-components method to label each of these features and finally use them as a mask for each block.
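
A sketch of the mask generation under the same assumptions (OpenCV 4 API; the kernel size is an illustrative tuning choice):

```python
import cv2

def block_masks(balanced_bgr):
    """Binarize on the saturation channel and label each block's mask."""
    hsv = cv2.cvtColor(balanced_bgr, cv2.COLOR_BGR2HSV)
    saturation = hsv[:, :, 1]
    # Otsu picks the threshold separating the colorful blocks from white paper.
    _, binary = cv2.threshold(saturation, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    # Opening removes speckle noise; closing fills gaps inside the blocks.
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    num_labels, labels = cv2.connectedComponents(binary)
    # Label 0 is the background; every other label is one block's mask.
    return [(labels == i).astype("uint8") * 255 for i in range(1, num_labels)]
```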

3. Preparing text for OCR

We use the masks to crop the actual blocks from the image and then do more pre-processing to prepare the text for OCR. Currently the script crops the text out of each block and applies adaptive thresholding and morphological transformations so that the text can be read by Tesseract.
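
A sketch of this preparation; the threshold block size, offset and kernel are illustrative parameters rather than the script's actual values:

```python
import cv2

def prepare_text(block_gray):
    """Binarize a grayscale crop of a block's text for Tesseract."""
    # Adaptive thresholding copes with lighting that varies across the block.
    text = cv2.adaptiveThreshold(block_gray, 255,
                                 cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 31, 10)
    # Closing the white background fills small black specks of noise
    # around the characters.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
    return cv2.morphologyEx(text, cv2.MORPH_CLOSE, kernel)
```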

4. Collecting data

At this point the text is ready to be read by the OCR engine, and using the contour of each block's mask we can also find its x, y coordinates. This information is then saved in a 'code_block' object.
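
A sketch of this step, reusing prepare_text from above; CodeBlock stands in for the script's 'code_block' object, and the page-segmentation mode is an assumed Tesseract setting:

```python
from dataclasses import dataclass

import cv2
import pytesseract

@dataclass
class CodeBlock:  # stands in for the script's 'code_block' object
    text: str
    x: int
    y: int

def read_block(image, mask):
    """OCR one block and record the top-left corner of its mask contour."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    crop = cv2.cvtColor(image[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    # --psm 7 tells Tesseract to treat the crop as a single line of text.
    text = pytesseract.image_to_string(prepare_text(crop), config="--psm 7")
    return CodeBlock(text.strip(), x, y)
```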

5. AST generation

Using the information that was collected, we generate the AST of the program with the anytree library.
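
A sketch of how such a tree could be built with anytree; the pixel threshold and the indentation rule are assumptions about how the coordinates map to nesting, not necessarily the script's exact logic:

```python
from anytree import Node, RenderTree

def build_ast(blocks, indent=30):
    """Nest blocks by their coordinates: reading order is top-to-bottom (y),
    and a block shifted right by roughly one indent (x) becomes a child."""
    root = Node("program", x=-indent)
    stack = [root]
    for block in sorted(blocks, key=lambda b: b.y):
        # Pop back to the block that encloses this indentation level.
        while len(stack) > 1 and block.x < stack[-1].x + indent:
            stack.pop()
        stack.append(Node(block.text, parent=stack[-1], x=block.x))
    return root

# Render the tree, e.g.:
# for pre, _, node in RenderTree(build_ast(blocks)):
#     print(f"{pre}{node.name}")
```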

Challenges

We gradually moved away from what I initially proposed, so I quickly had to step out of my comfort zone and learn new things throughout the project's timeline. At the beginning of the summer I was new to computer vision, so experimenting with images was necessary to understand how different problems are handled with traditional computer vision methods and to build the experience needed to come up with a robust pre-processing sequence. Throughout the work period there were various times when I felt stuck and unable to solve a problem, but with the help of my mentors we managed to find solutions.

Learnings

Aside from deepening my computer vision skills beyond my university's computer vision course, I worked with very experienced mentors who guided me through the program and helped me understand how much work it takes to create a prototype from scratch, an experience that is very different from typical software development work.

Work Product

I designed the initial set of tangible blocks, ran some experiments in Python notebooks and created a prototype script that can successfully process a small set of images. All of the above can be found in my GSoC GitHub repository, tangibles-recognition.

Work to be done

The script that I delivered is still far from a final product; there is still a lot of work to do to make it robust. To continue from this point, I would follow the steps below:

  • Set more specific requirements for each processing step, so that it is clear what needs to be tested.
  • Make each part of the processing sequence more reliable while gradually enlarging the set of test images.
  • Use more of the image's information (e.g. the color and shape of the blocks).