lm.binary was generated from the LibriSpeech normalized LM training text, available at http://www.openslr.org/11, following this recipe (Jupyter notebook code):
import gzip
import io
import os
from urllib import request

# Grab corpus.
url = 'http://www.openslr.org/resources/11/librispeech-lm-norm.txt.gz'
data_upper = '/tmp/upper.txt.gz'
request.urlretrieve(url, data_upper)

# Convert to lowercase and cleanup.
data_lower = '/tmp/lower.txt'
with open(data_lower, 'w', encoding='utf-8') as lower:
    with io.TextIOWrapper(io.BufferedReader(gzip.open(data_upper)), encoding='utf8') as upper:
        for line in upper:
            lower.write(line.lower())

# Build pruned LM.
lm_path = '/tmp/lm.arpa'
!lmplz --order 5 \
       --temp_prefix /tmp/ \
       --memory 50% \
       --text {data_lower} \
       --arpa {lm_path} \
       --prune 0 0 0 1

# Quantize and produce trie binary.
binary_path = '/tmp/lm.binary'
!build_binary -a 255 \
              -q 8 \
              trie \
              {lm_path} \
              {binary_path}
os.remove(lm_path)
The trie was then generated from the vocabulary of the language model:
./generate_trie ../data/alphabet.txt /tmp/lm.binary /tmp/trie
-------------------------------------------------
Generating TRIE and LM files in DeepSpeech:
/DeepSpeech/native_client/kenlm/build/bin# ./lmplz --text /docker_files/lm_own_v3/lower.txt --arpa /docker_files/lm_own_v3/words.arpa --o 5 --prune 0 0 0 1
/DeepSpeech/native_client/kenlm/build/bin# ./build_binary -s -a 255 -q 8 trie /docker_files/lm_own_v3/words.arpa /docker_files/lm_own_v3/lm_own_v3.binary
/DeepSpeech/native_client# ./generate_trie /docker_files/lm_own_v3/alphabet.txt /docker_files/lm_own_v3/lm_own_v3.binary /docker_files/lm_own_v3/trie
./deepspeech --model /docker_files/checkpoints_cv_mozilla/output_graph.pb --alphabet /docker_files/lm_own_v3/alphabet.txt --lm /docker_files/lm_own_v3/lm_own_v3.binary --trie /docker_files/lm_own_v3/trie --audio /docker_files/wavs/wavs/preddy8_india.wav
------------------------------------------------------------
Training command for transfer learning:
python DeepSpeech.py --train_files /docker_files/CV_own_2/custom_train.csv --test_files /docker_files/CV_own_2/custom_test.csv --dev_files /docker_files/CV_own_2/custom_dev.csv --drop_source_layers 2 --source_model_checkpoint_dir /docker_files/checkpoints_cv_mozilla/ --checkpoint_dir /docker_files/checkpoints_cv_mozilla/ --export_dir /docker_files/checkpoints_cv_mozilla/ --alphabet_config_path /docker_files/lm_own_v3/alphabet.txt --lm_binary_path /docker_files/lm_own_v3/lm_own_v3.binary --lm_trie_path /docker_files/lm_own_v3/trie --epoch 3
python DeepSpeech.py --train_files /docker_files/CV_own_2/custom_train_CVSVA.csv --test_files /docker_files/CV_own_2/custom_test_CVSVA.csv --dev_files /docker_files/CV_own_2/custom_dev_CVSVA.csv --drop_source_layers 3 --source_model_checkpoint_dir /docker_files/checkpoints_cv_mozilla/ --checkpoint_dir /docker_files/checkpoints_cv_mozilla/ --export_dir /docker_files/checkpoints_cv_mozilla/ --alphabet_config_path /docker_files/lm_own_v3/alphabet.txt --lm_binary_path /docker_files/lm_own_v3/lm_own_v3.binary --lm_trie_path /docker_files/lm_own_v3/trie --epoch -10 --train_batch_size 8
--------------------------------------------------------------
Starting the docker:
#Inside mycroft-core folder
docker run -it --rm --device /dev/snd -e PULSE_SERVER=unix:${XDG_RUNTIME_DIR}/pulse/native -v ${XDG_RUNTIME_DIR}/pulse/native:${XDG_RUNTIME_DIR}/pulse/native -v ~/.config/pulse/cookie:/root/.config/pulse/cookie -v /home/admin1/0Pratesh/Speech/mycroft-core:/root/ -v /home/admin1/0Pratesh/Speech/deepspeech/frozen_models/:/models/ mycroft_core:ubuntu16_mic bash
#Inside the docker:
apt-get update
apt-get install -y sudo
useradd -m docker && echo "docker:docker" | chpasswd && adduser docker sudo
apt-get install -y git python3 python3-dev python-setuptools python-gobject-2-dev libtool libffi-dev libssl-dev autoconf automake bison swig libglib2.0-dev portaudio19-dev mpg123 screen flac curl libicu-dev pkg-config automake libjpeg-dev libfann-dev build-essential jq
apt-get install -y vim
vim /etc/pip.conf
apt-get install -y pulseaudio alsa-utils
git config --global http.sslVerify false
#Activate the venv
cd /root/
source .venv/bin/activate
git clone https://git.visteon.com/SmartVoiceAssistant/mycroft-skills.git
Bringing up Mycroft:
vim mycroft/configuration/mycroft.conf
--> search for "deepspeech" to find where to point the config at your frozen model, LM binary, and trie files
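For reference, a minimal sketch of what that block might look like is below. The key names under "stt" and the /models/ paths are assumptions (based on the -v ...frozen_models/:/models/ mount in the docker run above), not the exact schema of this fork's mycroft.conf, so verify against the actual file before editing:
{
  "stt": {
    "module": "deepspeech",
    "deepspeech": {
      "model": "/models/output_graph.pb",
      "alphabet": "/models/alphabet.txt",
      "lm": "/models/lm_own_v3.binary",
      "trie": "/models/trie"
    }
  }
}
Restart Mycroft after editing so the new paths are picked up.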
#To start mycroft
./dev_setup.sh --allow-root (not needed every time you start the docker container)
./start-mycroft.sh debug
#To pull the latest mycroft branch, set the GitLab credentials first
git config --global user.email "preddy8@visteon.com"
git config --global user.name "pratesh"
git stash
git pull
git checkout development
git pull
git checkout 9-download-skills-with-tags-or-branch-2
---------------------------------------------------------------------------
Disowning a docker container:
Run the docker with the -dt flags
-d runs it in daemon (detached) mode
docker run -dt $IMAGE_NAME
This runs the container as a daemon. Now exec into it:
docker exec -it $CONT_ID bash
Inside the docker, run whatever process you want, but disown it:
nohup $PROCESS &
#Put & at the end to make it a background process
#Add nohup in front to disown the process
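A concrete sketch (the script name and log path are hypothetical placeholders, not part of the setup above):
nohup python3 my_long_job.py > /tmp/my_long_job.log 2>&1 &
echo $!   # PID of the background process, handy for checking on it later
Redirecting stdout/stderr keeps nohup from dumping everything into nohup.out in the current directory.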
Now exit the container.
The container will still be running on the remote machine.
You can enter this container any time (directly or through SSH).