Misa Ogura MisaOgura


Miho’s tech test

First of all...

Well done on completing the bootcamp! I'm proud of you & glad you had a great time.

The project looks pretty good for a first tech test ;) You should be proud of yourself too!

Hopefully the feedback below is useful... happy to jump on a call to discuss if you'd like :)

mkdir -p pkg
rm -rf pkg/kaldi_5.5
mkdir -p pkg/kaldi_5.5
cd pkg/kaldi_5.5 \
&& wget https://github.com/kaldi-asr/kaldi/archive/36f6dbf9a465c7ec94626a3f21debfb2c3483e80.tar.gz \
&& tar --strip-components=2 -xvzf *.tar.gz kaldi-36f6dbf9a465c7ec94626a3f21debfb2c3483e80/src \
&& tar --strip-components=1 -xvzf *.tar.gz kaldi-36f6dbf9a465c7ec94626a3f21debfb2c3483e80/COPYING \
&& touch kaldi.mk && make clean distclean && rm kaldi.mk \
&& rm base/version.h \
&& rm -rf *.tar.gz
_________________________________________________________________ test_fetch_audio_stream _________________________________________________________________
self = <urllib.request.HTTPSHandler object at 0x7efee7794dd8>, http_class = <class 'http.client.HTTPSConnection'>
req = <urllib.request.Request object at 0x7efee7794e48>, http_conn_args = {'check_hostname': None, 'context': None}, host = 'youtube.com'
h = <http.client.HTTPSConnection object at 0x7efee7794f60>
def do_open(self, http_class, req, **http_conn_args):
"""Return an HTTPResponse object for the request, using http_class.

Milestone 1: How do the amount, variety and quality of training data affect the accuracy of the model?

We can retrain our existing models with more (or different) data to assess what improvements are possible. The data preparation and the findings we make should outlive the current Kaldi-based system (i.e. they would still apply if we ever move to a seq2seq system).
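Comparing retrained models across datasets needs a consistent metric. Kaldi has its own scoring tools, but as an illustration, here is a minimal word-error-rate sketch in plain Python (the `wer` helper is hypothetical, not part of our codebase):

```python
def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance between a
    reference and a hypothesis transcript, normalised by reference length."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between the first i ref words and first j hyp words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i  # deleting i words
    for j in range(len(h) + 1):
        dp[0][j] = j  # inserting j words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            substitution = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(r)][len(h)] / max(len(r), 1)

print(wer('the cat sat', 'the bat sat'))  # one substitution out of three words
```

Tracking this one number per evaluation dataset makes the "more vs. different data" comparisons directly comparable between training runs.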

Evaluation Datasets

We now have multiple evaluation datasets against which to benchmark our system.

SAD (Speech Activity Detection) - Viterbi decoding in Kaldi following VAD

Looking at detect_speech_activity.sh, these are the arguments/files/options that need to be provided to decode_sad.sh.

Arguments

  1. graph_dir
    • graph_dir/HCLG.fst: An FST calculated with prepare_sad_graph.py
    • options: --min-silence-duration=0.03 --min-speech-duration=0.3 --max-speech-duration=10.0 --frame-shift=0.01
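To make the duration options concrete, here is a sketch (not Kaldi's actual Viterbi decoder, and `frames_to_segments` is a hypothetical helper) of the constraints those flags encode: bridge silences shorter than `--min-silence-duration` and drop speech runs shorter than `--min-speech-duration`, with times derived from `--frame-shift`:

```python
FRAME_SHIFT = 0.01   # seconds per frame (--frame-shift)
MIN_SILENCE = 0.03   # --min-silence-duration
MIN_SPEECH = 0.3     # --min-speech-duration

def frames_to_segments(labels, frame_shift=FRAME_SHIFT,
                       min_silence=MIN_SILENCE, min_speech=MIN_SPEECH):
    """Turn per-frame 0/1 speech labels into (start, end) segments in seconds."""
    # 1. Collect raw runs of consecutive speech frames as [start, end) pairs
    runs, start = [], None
    for i, lab in enumerate(labels):
        if lab and start is None:
            start = i
        elif not lab and start is not None:
            runs.append([start, i])
            start = None
    if start is not None:
        runs.append([start, len(labels)])

    # 2. Merge runs separated by a silence shorter than min_silence
    merged = []
    for run in runs:
        if merged and (run[0] - merged[-1][1]) * frame_shift < min_silence:
            merged[-1][1] = run[1]
        else:
            merged.append(run)

    # 3. Discard speech segments shorter than min_speech
    return [(s * frame_shift, e * frame_shift)
            for s, e in merged if (e - s) * frame_shift >= min_speech]
```

The real decode_sad.sh gets the same behaviour from the HCLG.fst topology built by prepare_sad_graph.py, but this makes it easy to see what each option does to the output segmentation.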

Things we can try with GMM (suggested during a conversation with Jana)

  • Increase the number of training iterations to a few hundred

    • Possible stopping criteria: min variance (keep an eye out for collapsing variance)
  • Increase the number of Gaussian components in each model

    • Currently init 32, diag 30 and full 20
    • Try init 258, diag 128 and full 64?
  • Increase the MFCC feature dimensionality (number of coefficients)
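A quick way to sanity-check those component counts is to look at how many free parameters each configuration has to estimate, since more parameters need more training data. A small sketch (the `gmm_param_count` helper and the 13-dim MFCC assumption are illustrative, not from our config):

```python
def gmm_param_count(n_components, dim, covariance='diag'):
    """Number of free parameters in a GMM: mixture weights + means + covariances."""
    weights = n_components - 1              # weights are constrained to sum to 1
    means = n_components * dim
    if covariance == 'diag':
        covs = n_components * dim           # one variance per dimension
    else:                                   # 'full': symmetric covariance matrix
        covs = n_components * dim * (dim + 1) // 2
    return weights + means + covs

# Current vs. proposed sizes from the notes above, assuming 13-dim MFCCs
for n_diag, n_full in [(30, 20), (128, 64)]:
    print('diag', n_diag, '->', gmm_param_count(n_diag, 13, 'diag'),
          '| full', n_full, '->', gmm_param_count(n_full, 13, 'full'))
```

Full-covariance components are far more expensive per component than diagonal ones, which is presumably why the current config already uses fewer of them (20 vs. 30); the same trade-off applies if we scale up to 128/64.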

# Choose a base image
FROM jupyter/scipy-notebook
# Information about the maintainer
LABEL maintainer="Misa Ogura <misa.ogura01@gmail.com>"
# Install TensorFlow & Keras
RUN conda install --quiet --yes \
    'tensorflow=1.3*' \
    'keras=2.0*' && \
    conda clean --all --yes  # free space taken by conda package caches
// scope chain: { name: undefined } (name is hoisted but not yet assigned)
var name = 'John'
// scope chain: { name: 'John' }

function greet (name) {
  // [[scope]]: { name: 'John' }
  // scope chain: { name: 'John', outerScope: { name: 'John' } }
  return (function () {
    // [[scope]]: { name: 'John', outerScope: { name: 'John' } }
    return 'Hello, ' + name
  })()
}
var a = 'global'

function outer () {
  var b = 'outer'
  return function inner () {
    var c = 'inner'
    console.log('a:', a)
    console.log('b:', b)
    console.log('c:', c)
  }
}

outer()() // a: global, b: outer, c: inner