@PanosAntoniadis
Last active August 26, 2019 15:30
Final Report for GSoC 2019 for Creation of an online Greek mail dictation system, using Sphinx and personalized acoustic/language model training

Final Report for Google Summer of Code 2019

This is the final report on the work done for the project "Creation of an online Greek mail dictation system, using Sphinx and personalized acoustic/language model training". The code is hosted at https://github.com/eellak/gsoc2019-sphinx and the system is deployed at https://snf-870149.vm.okeanos.grnet.gr.

Abstract

The aim of the project is to implement a personalized Greek mail dictation system. Personalization is applied both to the language model, using the user's emails, and to the acoustic model, using previous recordings of the user. In addition, the ASR output is passed through a post-processing system that corrects possible errors based on the adapted language model. In this way, we improve on the accuracy of the default Greek model, which is low because few open source speech datasets are available.
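
To make the post-processing idea concrete, here is a minimal sketch and not the project's actual correction tool. It assumes a much simpler stand-in for the adapted language model: a word-frequency table built from the user's emails, with out-of-vocabulary words replaced by the most frequent similar-looking word. The function names `build_vocabulary` and `correct_transcript`, the `cutoff` value, and the sample sentences are all hypothetical.

```python
from collections import Counter
import difflib


def build_vocabulary(emails):
    """Count word frequencies over the user's emails (a stand-in for an adapted language model)."""
    counts = Counter()
    for text in emails:
        counts.update(text.lower().split())
    return counts


def correct_transcript(transcript, vocabulary, cutoff=0.75):
    """Replace words missing from the vocabulary with the most frequent close match, if any."""
    known_words = list(vocabulary)
    corrected = []
    for word in transcript.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Candidate words that are similar enough in spelling to the ASR output word.
        candidates = difflib.get_close_matches(word, known_words, n=3, cutoff=cutoff)
        if candidates:
            # Prefer the candidate the user writes most often.
            corrected.append(max(candidates, key=lambda c: vocabulary[c]))
        else:
            # No plausible replacement: keep the ASR output unchanged.
            corrected.append(word)
    return " ".join(corrected)


# Hypothetical sample data: two sent emails and a transcript with two ASR errors.
emails = ["καλημέρα σας στέλνω την αναφορά", "στέλνω την αναφορά αύριο"]
vocabulary = build_vocabulary(emails)
result = correct_transcript("στέλνο την αναφορα", vocabulary)
```

A real system would score whole word sequences with the adapted n-gram model rather than single words, so this sketch only illustrates the general idea of biasing corrections toward the user's own vocabulary.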

A more detailed explanation of the project is available in the README and on the Wiki home page.

Work and Repository

All of my work is in the project repository, which was created from scratch and does not rely on any previous code. My commits can be found here.

Deliverables

  1. Tool for extracting and cleaning sent emails of a Gmail user. (Code, Wiki)
  2. Tool for creating adapted language models through email clustering. (Code, Wiki)
  3. Tool for correcting ASR output. (Code, Wiki)
  4. Various tools for preparing and evaluating a speech dataset. (Code, Wiki)
  5. Simple tool for creating a speech dataset. (Code)
  6. API written in Flask. (Code, Wiki)
  7. Online webpage using Angular 8. (Code, Wiki)

Project Progress

Project progress was tracked daily in the Projects section of the repository.

Demo

The project is hosted at https://snf-870149.vm.okeanos.grnet.gr

Note: For now, both the webpage and the API use self-signed SSL certificates. As a result, before using the webpage, the user must accept both certificates by visiting https://snf-870149.vm.okeanos.grnet.gr and https://snf-870149.vm.okeanos.grnet.gr:5000 and, on each, clicking Advanced and then Proceed to the URL.

Future Work

Some recommendations for future work can be found here.

People

  • Google Summer of Code 2019 Student: Panagiotis Antoniadis (PanosAntoniadis)
  • Mentor: Andreas Symeonidis (asymeon)
  • Mentor: Manos Tsardoulias (etsardou)