
@vunb
Last active February 14, 2021 18:04
A collection of CMU Sphinx reference links
# Create background noise profile from mp3
/usr/bin/sox noise.mp3 -n noiseprof noise.prof
# Remove noise from mp3 using profile
/usr/bin/sox input.mp3 output.mp3 noisered noise.prof 0.21
# Remove silence from mp3
/usr/bin/sox input.mp3 output.mp3 silence -l 1 0.3 5% -1 2.0 5%
# Remove noise and silence in a single command
/usr/bin/sox input.mp3 output.mp3 noisered noise.prof 0.21 silence -l 1 0.3 5% -1 2.0 5%
# Batch process files older than 30 minutes (any subdirectories must already exist under /path/to/output)
/usr/bin/find . -type f -name "*.mp3" -mmin +30 -exec sox -S --multi-threaded --buffer 131072 {} /path/to/output/{} noisered noise.prof 0.21 silence -l 1 0.3 5% -1 2.0 5% \;
# Remove insignificant files (smaller than 500 KiB)
/usr/bin/find . -type f -name "*.mp3" -mmin +30 -size -500k -delete
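The one-line batch command fails when an input file sits in a subdirectory that does not yet exist under the output path, and it relies on `{}` being substituted inside a larger argument (a GNU find behaviour). A minimal sketch that mirrors the directory layout instead, assuming bash; `denoise_tree` is a hypothetical helper name, and the effect chain and 0.21 amount are copied from the commands above:

```shell
# Denoise and trim silence from every .mp3 under in_dir, writing results to
# the same relative path under out_dir. denoise_tree is a hypothetical name.
denoise_tree() {
    local in_dir=${1%/} out_dir=$2 profile=$3
    local src rel dst
    # -print0 with read -d '' keeps file names containing spaces intact
    find "$in_dir" -type f -name '*.mp3' -print0 |
    while IFS= read -r -d '' src; do
        rel=${src#"$in_dir"/}            # path relative to the input directory
        dst="$out_dir/$rel"
        mkdir -p "$(dirname "$dst")"     # create the matching output subdirectory
        sox "$src" "$dst" noisered "$profile" 0.21 \
            silence -l 1 0.3 5% -1 2.0 5%
    done
}
```

Usage: `denoise_tree ./raw /path/to/output noise.prof`. Add `-mmin +30` to the `find` call to skip files that may still be uploading, as the original command does.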
CMU Sphinx
http://cmusphinx.sourceforge.net/wiki/tutorialam
http://www.speech.cs.cmu.edu/sphinxman/scriptman1.html
PocketSphinx
http://ghatage.com/2012/12/voice-to-text-in-linux-using-pocketsphinx/
http://ghatage.com/2012/12/make-pocketsphinx-recognize-new-words/
Language model adaptation:
http://pwnetics.wordpress.com/2011/07/01/sphinx-4-language-model-adaptation/
1. Convert WAV to the standard Sphinx input format:
Input File : 'resampled.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:02.62 = 41878 samples ~ 196.303 CDDA sectors
Sample Encoding: 16-bit Signed Integer PCM
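Before converting a whole directory it can help to check which files already match the format listed above (1 channel, 16000 Hz, 16-bit signed integer PCM). A minimal sketch, assuming bash and the `soxi` tool that ships with SoX; `is_sphinx_ready` is a hypothetical helper that only uses soxi's `-c`, `-r`, `-b` and `-e` switches:

```shell
# Succeed only if a file is mono, 16000 Hz, 16-bit signed integer PCM.
# is_sphinx_ready is a hypothetical helper name.
is_sphinx_ready() {
    local f=$1
    [ "$(soxi -c "$f")" = 1 ] || return 1        # channels
    [ "$(soxi -r "$f")" = 16000 ] || return 1    # sample rate in Hz
    [ "$(soxi -b "$f")" = 16 ] || return 1       # bits per sample
    case $(soxi -e "$f") in                      # sample encoding
        *"Signed Integer"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: `for f in *.wav; do is_sphinx_ready "$f" || echo "needs conversion: $f"; done`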
2. Command to convert a single file:
Run: sox [input.wav] -r 16k -e signed -b 16 -c 1 [output.wav]
Short (only when the input is already 16-bit signed mono): sox [input.wav] -r 16k [output.wav]
Before:
[vi@Manlab wav]$ file khong8k.wav
khong8k.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 8000 Hz
[vi@Manlab wav]$ soxi khong8k.wav
Input File : 'khong8k.wav'
Channels : 1
Sample Rate : 8000
Precision : 16-bit
Duration : 00:00:02.62 = 20939 samples ~ 196.303 CDDA sectors
Sample Encoding: 16-bit Signed Integer PCM
Full command:
[vi@Manlab wav]$ sox khong8k.wav -r 16k -e signed -b 16 -c 1 khong16k.wav
Short form (works here because the input is already 16-bit signed mono):
[vi@Manlab wav]$ sox khong8k.wav -r 16k khong16k.wav
After:
[vi@Manlab wav]$ file khong16k.wav
khong16k.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz
[vi@Manlab wav]$ soxi khong16k.wav
Input File : 'khong16k.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:02.62 = 41878 samples ~ 196.303 CDDA sectors
Sample Encoding: 16-bit Signed Integer PCM
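The sample counts in the soxi output above follow samples = duration × sample rate, so going from 8 kHz to 16 kHz doubles 20939 samples to 41878 while the duration stays 00:00:02.62. A small sketch of that arithmetic in integer shell math; `resampled_count` and `duration_ms` are hypothetical helper names and give the ideal count, not a value read from SoX:

```shell
# Ideal sample count after resampling from from_rate to to_rate
resampled_count() {
    local samples=$1 from_rate=$2 to_rate=$3
    echo $(( samples * to_rate / from_rate ))
}

# Duration in whole milliseconds from a sample count and a sample rate
duration_ms() {
    echo $(( $1 * 1000 / $2 ))
}

resampled_count 20939 8000 16000   # prints 41878
duration_ms 41878 16000            # prints 2617
```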
3. Shell batch:
[vi@Manlab wav]$ for i in test/* ; do echo "$i" ; done
# Rename huanluyen_diadiem* to diadiem* by dropping the first 10 characters ("huanluyen_")
for i in huanluyen_diadiem* ; do mv "$i" "${i:10}" ; done
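The `${i:10}` loop above works only because "huanluyen_" is exactly 10 characters long. A minimal sketch that removes the prefix by name instead, assuming bash; `strip_prefix` is a hypothetical helper, and `${f#"$prefix"}` removes the prefix only when it is actually present:

```shell
# Rename "$prefix$pattern" files to drop "$prefix".
# strip_prefix is a hypothetical helper name.
strip_prefix() {
    local prefix=$1 f
    local pattern="${2:-*}"
    for f in "$prefix"$pattern; do
        [ -e "$f" ] || continue              # no match: the glob stayed literal
        mv -n -- "$f" "${f#"$prefix"}"       # -n: never overwrite an existing file
    done
}
```

Usage, equivalent to the loop above: `strip_prefix huanluyen_ 'diadiem*'`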
Error opening the recording device when using pocketsphinx_continuous:
INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(367): pocketsphinx_continuous COMPILED ON: Apr 3 2012, AT: 17:50:38
ad_oss.c(103): Failed to open audio device(/dev/dsp): No such file or directory
FATAL_ERROR: "continuous.c", line 246: Failed to open audio device
Solutions:
1. Install the ALSA development package and recompile sphinxbase (without the ALSA headers, sphinxbase is built with the OSS backend, ad_oss.c, which needs /dev/dsp)
Run: yum install "alsa-*"
2. If you still get the error: ad_oss.c(103): Failed to open audio device(/dev/dsp): No such file or directory
Then run "modprobe snd_pcm_oss" as root (this loads the ALSA OSS emulation module that provides /dev/dsp)
3. If you then get a different error: ad_oss.c(99): Audio device(/dev/dsp) busy
Then close all applications that are recording from or otherwise using the audio device
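Steps 2 and 3 above can be checked from a script before starting pocketsphinx_continuous. A minimal sketch, assuming bash; `check_oss_device` is a hypothetical helper, and the busy check is skipped when the `fuser` tool (from psmisc) is not installed:

```shell
# Report whether an OSS audio device exists and whether a process holds it.
# check_oss_device is a hypothetical helper name; /dev/dsp is the default.
check_oss_device() {
    local dev=${1:-/dev/dsp}
    if [ ! -e "$dev" ]; then
        echo "missing: $dev (try: modprobe snd_pcm_oss as root)"
        return 1
    fi
    # fuser exits 0 when at least one process is accessing the file
    if command -v fuser >/dev/null 2>&1 && fuser "$dev" >/dev/null 2>&1; then
        echo "busy: $dev (close applications using the audio device)"
        return 2
    fi
    echo "ok: $dev"
}
```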
noiseprof [profile-file]
Calculate a profile of the audio for use in noise reduction. See the description of the noisered effect for details.
noisered [profile-file [amount]]
Reduce noise in the audio signal by profiling and filtering. This effect is moderately effective at removing consistent background noise such as hiss or hum. To use it, first run SoX with the noiseprof effect on a section of audio that ideally would contain silence but in fact contains noise; such sections are typically found at the beginning or the end of a recording. noiseprof will write out a noise profile to profile-file, or to stdout if no profile-file or if '-' is given. E.g.
sox speech.wav -n trim 0 1.5 noiseprof speech.noise-profile
To actually remove the noise, run SoX again, this time with the noisered effect; noisered will reduce noise according to a noise profile (which was generated by noiseprof), from profile-file, or from stdin if no profile-file or if '-' is given. E.g.
sox speech.wav cleaned.wav noisered speech.noise-profile 0.3
How much noise should be removed is specified by amount, a number between 0 and 1 with a default of 0.5. Higher numbers will remove more noise but present a greater likelihood of removing wanted components of the audio signal. Before replacing an original recording with a noise-reduced version, experiment with different amount values to find the optimal one for your audio; use headphones to check that you are happy with the results, paying particular attention to quieter sections of the audio.
On most systems, the two stages (profiling and reduction) can be combined using a pipe, e.g.
sox noisy.wav -n trim 0 1 noiseprof | play noisy.wav noisered
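To follow the advice about experimenting with different amount values, one cleaned copy per amount can be rendered and compared on headphones. A minimal sketch, assuming bash and an existing noise profile; `try_noisered_amounts` and the `.nrAMOUNT.wav` output naming are hypothetical, while the noisered arguments follow the man page excerpt above:

```shell
# Write one noise-reduced copy of $1 per amount, using the profile in $2.
# try_noisered_amounts is a hypothetical helper name.
try_noisered_amounts() {
    local in=$1 profile=$2 amount
    shift 2
    for amount in "$@"; do
        # speech.wav -> speech.nr0.3.wav, etc.
        sox "$in" "${in%.*}.nr$amount.wav" noisered "$profile" "$amount"
    done
}

# Example: profile the first 1.5 s, then try three amounts
#   sox speech.wav -n trim 0 1.5 noiseprof speech.noise-profile
#   try_noisered_amounts speech.wav speech.noise-profile 0.2 0.3 0.5
```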
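#!/usr/bin/env bash
# Convert every .wav or .mp3 file in a directory to the format Sphinx expects:
# 16 kHz, 16-bit signed integer PCM, mono WAV.
usage="Usage: sphinx4Normalize -i /path/to/audio/input/ -o /path/to/audio/output/ [-t wav|mp3]"

# $# holds the number of arguments; print usage when none are given
if [ $# -eq 0 ]; then
    echo "$usage"
    exit 128
fi

# Walk the argument list: $1 is the current argument, shift drops it
input=""
output=""
type=wav
while [ "$1" != "" ]; do
    case $1 in
        -i | -di | --input ) shift
            input=$1
            ;;
        -o | -do | --output ) shift
            output=$1
            ;;
        -t | --type ) shift
            case $1 in
                wav | mp3 ) type=$1 ;;
                * ) echo "Unsupported type: $1" >&2
                    echo "$usage"
                    exit 1
                    ;;
            esac
            ;;
        -h | --help ) echo "$usage"
            exit 0
            ;;
        * ) echo "$usage"
            exit 1
            ;;
    esac
    shift
done

# Both directories are required (quoted so empty values do not break the test)
if [ -z "$input" ] || [ -z "$output" ]; then
    echo "$usage"
    exit 128
fi

mkdir -p "$output"
shopt -s nullglob  # skip the loop instead of passing a literal pattern when nothing matches
for i in "$input"/*."$type"; do
    # Always write .wav: an .mp3 output file cannot hold 16-bit signed PCM
    fileout="$output/$(basename "$i" ."$type").wav"
    echo "Processing $i"
    sox "$i" -r 16k -e signed -b 16 -c 1 "$fileout"
    echo "Output: $fileout"
done
echo "Complete!"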
vunb (gist author) commented Oct 24, 2013:

A fairly concise article on speech recognition:
http://web.science.mq.edu.au/~cassidy/comp449/html/comp449.html
