Created December 27, 2018 14:34
lm.binary was generated from the LibriSpeech normalized LM training text, available at http://www.openslr.org/11, following this recipe (Jupyter notebook code):
import gzip
import io
import os
from urllib import request

# Grab corpus.
url = 'http://www.openslr.org/resources/11/librispeech-lm-norm.txt.gz'
data_upper = '/tmp/upper.txt.gz'
request.urlretrieve(url, data_upper)

# Convert to lowercase and clean up.
data_lower = '/tmp/lower.txt'
with open(data_lower, 'w', encoding='utf-8') as lower:
    with io.TextIOWrapper(io.BufferedReader(gzip.open(data_upper)), encoding='utf8') as upper:
        for line in upper:
            lower.write(line.lower())

# Build pruned LM.
lm_path = '/tmp/lm.arpa'
!lmplz --order 5 \
       --temp_prefix /tmp/ \
       --memory 50% \
       --text {data_lower} \
       --arpa {lm_path} \
       --prune 0 0 0 1

# Quantize and produce trie binary.
binary_path = '/tmp/lm.binary'
!build_binary -a 255 \
              -q 8 \
              trie \
              {lm_path} \
              {binary_path}
os.remove(lm_path)
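The `--prune 0 0 0 1` flag above drops 4-grams and 5-grams that occur only once, which keeps the ARPA file small. As a rough illustration of count-based pruning on toy data (this is only the idea, not kenlm's actual algorithm):

```python
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-token windows.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat the cat ran".split()
counts = Counter(ngrams(tokens, 2))

# Mimic pruning at this order: keep only n-grams seen more than once.
pruned = {g: c for g, c in counts.items() if c > 1}
print(pruned)  # only ('the', 'cat') survives, with count 2
```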
The trie was then generated from the vocabulary of the language model: | |
./generate_trie ../data/alphabet.txt /tmp/lm.binary /tmp/trie | |
-------------------------------------------------
Generating TRIE and LM files in DeepSpeech (build the ARPA model first, then the binary, then the trie, since generate_trie reads the binary):
/DeepSpeech/native_client/kenlm/build/bin# ./lmplz --text /docker_files/lm_own_v3/lower.txt --arpa /docker_files/lm_own_v3/words.arpa --o 5 --prune 0 0 0 1
/DeepSpeech/native_client/kenlm/build/bin# ./build_binary -s -a 255 -q 8 trie /docker_files/lm_own_v3/words.arpa /docker_files/lm_own_v3/lm_own_v3.binary
/DeepSpeech/native_client# ./generate_trie /docker_files/lm_own_v3/alphabet.txt /docker_files/lm_own_v3/lm_own_v3.binary /docker_files/lm_own_v3/trie
./deepspeech --model /docker_files/checkpoints_cv_mozilla/output_graph.pb --alphabet /docker_files/lm_own_v3/alphabet.txt --lm /docker_files/lm_own_v3/lm_own_v3.binary --trie /docker_files/lm_own_v3/trie --audio /docker_files/wavs/wavs/preddy8_india.wav
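The released DeepSpeech models expect 16 kHz, mono, 16-bit PCM WAV input, so it can save a confusing bad transcription to sanity-check the audio first. A minimal stdlib check (the file path and helper name here are illustrative; the script writes a tiny synthetic WAV just to demonstrate):

```python
import struct
import wave

# Write a tiny 16 kHz mono 16-bit test file (0.1 s of silence).
path = "/tmp/test16k.wav"
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(struct.pack("<h", 0) * 1600)

def is_deepspeech_ready(p):
    # Check the properties DeepSpeech's released models expect.
    with wave.open(p, "rb") as w:
        return (w.getframerate() == 16000 and
                w.getnchannels() == 1 and
                w.getsampwidth() == 2)

print(is_deepspeech_ready(path))  # True
```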
------------------------------------------------------------ | |
Training command for transfer learning:
python DeepSpeech.py --train_files /docker_files/CV_own_2/custom_train.csv --test_files /docker_files/CV_own_2/custom_test.csv --dev_files /docker_files/CV_own_2/custom_dev.csv --drop_source_layers 2 --source_model_checkpoint_dir /docker_files/checkpoints_cv_mozilla/ --checkpoint_dir /docker_files/checkpoints_cv_mozilla/ --export_dir /docker_files/checkpoints_cv_mozilla/ --alphabet_config_path /docker_files/lm_own_v3/alphabet.txt --lm_binary_path /docker_files/lm_own_v3/lm_own_v3.binary --lm_trie_path /docker_files/lm_own_v3/trie --epoch 3
python DeepSpeech.py --train_files /docker_files/CV_own_2/custom_train_CVSVA.csv --test_files /docker_files/CV_own_2/custom_test_CVSVA.csv --dev_files /docker_files/CV_own_2/custom_dev_CVSVA.csv --drop_source_layers 3 --source_model_checkpoint_dir /docker_files/checkpoints_cv_mozilla/ --checkpoint_dir /docker_files/checkpoints_cv_mozilla/ --export_dir /docker_files/checkpoints_cv_mozilla/ --alphabet_config_path /docker_files/lm_own_v3/alphabet.txt --lm_binary_path /docker_files/lm_own_v3/lm_own_v3.binary --lm_trie_path /docker_files/lm_own_v3/trie --epoch -10 --train_batch_size 8
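--drop_source_layers N reuses the lower layers of the source checkpoint and re-initializes the top N for the new data. Conceptually it is just a split over the layer stack (the layer names below are made up for illustration, not DeepSpeech's actual checkpoint variables):

```python
# Hypothetical layer stack, bottom to top; not DeepSpeech's real names.
layers = ["dense_1", "dense_2", "dense_3", "lstm_4", "dense_5", "output_6"]

def split_for_transfer(layers, drop_source_layers):
    # Reuse everything below the dropped layers; re-init the top N.
    cut = len(layers) - drop_source_layers
    return layers[:cut], layers[cut:]

reused, reinit = split_for_transfer(layers, 2)
print(reinit)  # ['dense_5', 'output_6']
```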
-------------------------------------------------------------- | |
Starting the Docker container:
#Inside mycroft-core folder
docker run -it --rm --device /dev/snd -e PULSE_SERVER=unix:${XDG_RUNTIME_DIR}/pulse/native -v ${XDG_RUNTIME_DIR}/pulse/native:${XDG_RUNTIME_DIR}/pulse/native -v ~/.config/pulse/cookie:/root/.config/pulse/cookie -v /home/admin1/0Pratesh/Speech/mycroft-core:/root/ -v /home/admin1/0Pratesh/Speech/deepspeech/frozen_models/:/models/ mycroft_core:ubuntu16_mic bash
#Inside the docker:
apt-get update
apt-get install sudo
useradd -m docker && echo "docker:docker" | chpasswd && adduser docker sudo
apt-get install -y git python3 python3-dev python-setuptools python-gobject-2-dev libtool libffi-dev libssl-dev autoconf automake bison swig libglib2.0-dev portaudio19-dev mpg123 screen flac curl libicu-dev pkg-config automake libjpeg-dev libfann-dev build-essential jq
apt-get install -y vim
vim /etc/pip.conf
apt-get install -y pulseaudio alsa-utils
git config --global http.sslVerify false
#Activate the venv | |
cd /root/ | |
source .venv/bin/activate | |
git clone https://git.visteon.com/SmartVoiceAssistant/mycroft-skills.git | |
Bringing up Mycroft:
vim mycroft/configuration/mycroft.conf
--> search for deepspeech; you'll see where to point it at the frozen model, LM, and trie files
#To start mycroft
./dev_setup.sh --allow-root (not needed every time you start the docker)
./start-mycroft.sh debug
#To pull the latest mycroft branch, set the GitLab credentials:
git config --global user.email "preddy8@visteon.com"
git config --global user.name "pratesh"
git stash
git pull
git checkout development
git pull
git checkout 9-download-skills-with-tags-or-branch-2
---------------------------------------------------------------------------
Disowning a docker container:
Run the docker with the -dt flags (-d for daemon/detached mode):
docker run -dt $IMAGE_NAME
This runs it as a daemon. Now exec into the container:
docker exec -it $CONT_ID bash
Inside the docker, run whatever process you want, but disown it:
nohup $$$$$PROCESS$$$$ &
#Put & at the end to make it a background process
#Add nohup in front to disown the process
Now exit the container. The docker will still be running on the remote machine, and you can enter the container any time (directly or through SSH).
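The same nohup-and-& pattern can be reproduced from Python when scripting the container; `subprocess.Popen(..., start_new_session=True)` detaches the child into its own session (via setsid), so it survives the parent shell exiting, much like nohup. A minimal sketch (sleep is just a stand-in for the long-running process):

```python
import subprocess

# Launch a stand-in long-running process detached from our session,
# analogous to `nohup CMD &`: it keeps running after the parent exits.
proc = subprocess.Popen(
    ["sleep", "30"],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
    start_new_session=True,  # new session, no controlling terminal
)
print(proc.pid > 0)      # True: child was spawned
alive = proc.poll() is None
print(alive)             # True: still running in the background
proc.terminate()         # clean up the demo process
```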