iamloivx/CMU Sphinx - Speech Recognition

## clean_audio.sh
# Create background noise profile from mp3
/usr/bin/sox noise.mp3 -n noiseprof noise.prof

# Remove noise from mp3 using profile
/usr/bin/sox input.mp3 output.mp3 noisered noise.prof 0.21

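# Remove silence from mp3: trim leading audio below 5% volume, and shorten every later pause below 5% to about 2.0 s (-l keeps 2.0 s of each pause)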
/usr/bin/sox input.mp3 output.mp3 silence -l 1 0.3 5% -1 2.0 5%

# Remove noise and silence in a single command
/usr/bin/sox input.mp3 output.mp3 noisered noise.prof 0.21 silence -l 1 0.3 5% -1 2.0 5%

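# Batch process: denoise and trim every .mp3 under the current directory last modified more than 30 minutes ago
# (GNU find substitutes {} inside /path/to/output/{}; the matching subdirectories under /path/to/output must already exist)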
/usr/bin/find . -type f -name "*.mp3" -mmin +30 -exec sox -S --multi-threaded -buffer 131072 {} /path/to/output/{} noisered noise.prof 0.21 silence -l 1 0.3 5% -1 2.0 5% \;

# Delete .mp3 files smaller than 500 KiB that were last modified more than 30 minutes ago
/usr/bin/find . -type f -name "*.mp3" -mmin +30 -size -500k -delete
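The batch `find` command above writes each result to `/path/to/output/{}`, which relies on GNU find substituting `{}` inside a larger argument and on the matching subdirectories already existing. As an alternative, here is a minimal sketch of the same denoise-and-trim pass over a single directory, reusing the `noisered` and `silence` parameters from above. The function name `clean_dir` and its argument order are hypothetical, and it assumes your sox build can read and write MP3 and that the noise profile has already been created.

```bash
#!/usr/bin/env bash
# clean_dir IN_DIR OUT_DIR [PROFILE]  (hypothetical helper, not part of sox)
# Denoise and trim silence from every *.mp3 directly inside IN_DIR,
# writing each result under the same file name into OUT_DIR.
clean_dir() {
    local in_dir=$1 out_dir=$2 profile=${3:-noise.prof}
    local f
    mkdir -p -- "$out_dir" || return 1
    for f in "$in_dir"/*.mp3; do
        [ -e "$f" ] || continue   # no .mp3 files: the glob is left unexpanded
        # Same effect chain as the single-command example above
        sox "$f" "$out_dir/$(basename -- "$f")" \
            noisered "$profile" 0.21 \
            silence -l 1 0.3 5% -1 2.0 5% || return 1
    done
}
```

Usage: `clean_dir ./recordings ./cleaned noise.prof`. Quoting every path keeps file names with spaces intact, which the unquoted variants in these notes do not.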

## CMU Sphinx - Speech Recognition
CMU Sphinx
http://cmusphinx.sourceforge.net/wiki/tutorialam
http://www.speech.cs.cmu.edu/sphinxman/scriptman1.html


PocketSphinx
http://ghatage.com/2012/12/voice-to-text-in-linux-using-pocketsphinx/
http://ghatage.com/2012/12/make-pocketsphinx-recognize-new-words/


Language model adaptation:
http://pwnetics.wordpress.com/2011/07/01/sphinx-4-language-model-adaptation/

## Convert to sphinxAudioFormat
1. Convert WAV files to Sphinx's standard input format:
Input File     : 'resampled.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:02.62 = 41878 samples ~ 196.303 CDDA sectors
Sample Encoding: 16-bit Signed Integer PCM


2. Command to convert a single file:
Run: sox [input.wav] -r 16k -e signed -b 16 -c 1 [output.wav]
Short (enough when the input is already 16-bit signed mono PCM): sox [input.wav] -r 16k [output.wav]


Before:

[vi@Manlab wav]$ file khong8k.wav
khong8k.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 8000 Hz

[vi@Manlab wav]$ soxi khong8k.wav

Input File     : 'khong8k.wav'
Channels       : 1
Sample Rate    : 8000
Precision      : 16-bit
Duration       : 00:00:02.62 = 20939 samples ~ 196.303 CDDA sectors
Sample Encoding: 16-bit Signed Integer PCM

Full command:

[vi@Manlab wav]$ sox khong8k.wav -r 16k -e signed -b 16 -c 1 khong16k.wav

Short form, which is enough here because the input is already 16-bit signed mono:

[vi@Manlab wav]$ sox khong8k.wav -r 16k khong16k.wav


After:

[vi@Manlab wav]$ file khong16k.wav
khong16k.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz

[vi@Manlab wav]$ soxi khong16k.wav

Input File     : 'khong16k.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:02.62 = 41878 samples ~ 196.303 CDDA sectors
Sample Encoding: 16-bit Signed Integer PCM


3. Shell batch (iterate over every file in a directory):

[vi@Manlab wav]$ for i in test/* ; do echo "$i" ; done
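To confirm that a converted file really matches the target format in step 1 before handing it to Sphinx, the single-field options of `soxi` (`-r`, `-c`, `-b`, `-e`) can be compared against the expected values. A minimal sketch; the function name `is_sphinx_format` is hypothetical, and the expected encoding string assumes soxi prints it as in the listings above (`Signed Integer PCM`).

```bash
#!/usr/bin/env bash
# is_sphinx_format FILE  (hypothetical helper)
# Succeeds only if FILE is 16 kHz, mono, 16-bit signed integer PCM,
# i.e. the target format listed in step 1.
is_sphinx_format() {
    local f=$1
    [ "$(soxi -r "$f")" = 16000 ] &&
    [ "$(soxi -c "$f")" = 1 ] &&
    [ "$(soxi -b "$f")" = 16 ] &&
    [ "$(soxi -e "$f")" = "Signed Integer PCM" ]
}
```

Combined with the loop above: `for i in test/*.wav ; do is_sphinx_format "$i" || echo "needs conversion: $i" ; done`.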

## etc - rename config file.sh
# Strip the 10-character prefix "huanluyen_" from each matching file name
for i in huanluyen_diadiem* ; do mv -- "$i" "${i:10}" ; done
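An alternative that does not depend on counting characters is to strip the literal prefix with `${f#prefix}`. A minimal sketch; the function name `strip_prefix` and its `--dry-run` switch are hypothetical additions, and it assumes the prefix to remove is `huanluyen_` as in the loop above.

```bash
#!/usr/bin/env bash
# strip_prefix PREFIX [--dry-run]  (hypothetical helper)
# Rename every file in the current directory whose name starts with PREFIX
# to the same name without PREFIX, refusing to overwrite existing files.
strip_prefix() {
    local prefix=$1 dry=$2 f new
    for f in "$prefix"*; do
        [ -e "$f" ] || continue          # nothing matched
        new=${f#"$prefix"}               # remove PREFIX from the front only
        [ -n "$new" ] || continue        # file named exactly PREFIX: skip
        if [ "$dry" = --dry-run ]; then
            echo "mv -- $f $new"
        else
            mv -n -- "$f" "$new"         # -n: never clobber an existing file
        fi
    done
}
```

Run `strip_prefix huanluyen_ --dry-run` first to review the planned renames, then `strip_prefix huanluyen_`.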

## pocketsphinx_continuous
Error: pocketsphinx_continuous cannot open the recording device:

INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(367): pocketsphinx_continuous COMPILED ON: Apr  3 2012, AT: 17:50:38

ad_oss.c(103): Failed to open audio device(/dev/dsp): No such file or directory
FATAL_ERROR: "continuous.c", line 246: Failed to open audio device


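Solutions:
1. Install the ALSA development package and recompile sphinxbase, so that it can be built against ALSA instead of OSS (/dev/dsp).
Run: yum install alsa-*
(On Fedora/RHEL the development package itself is alsa-lib-devel.)

2. If you still get the error: ad_oss.c(103): Failed to open audio device(/dev/dsp): No such file or directory
Then run "modprobe snd_pcm_oss" as root; this loads the kernel's OSS emulation module, which provides /dev/dsp.

3. If you get a different error: ad_oss.c(99): Audio device(/dev/dsp) busy
Then close all applications that are recording from or otherwise using the audio device.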
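The last two cases above can be told apart before launching `pocketsphinx_continuous`. A minimal sketch, assuming the OSS device path `/dev/dsp` from the error messages; the function name and its return codes are hypothetical, and the busy check uses `fuser` (from psmisc) only if it is installed.

```bash
#!/usr/bin/env bash
# check_oss_device [DEVICE]  (hypothetical helper)
# Returns 0 if the OSS device looks usable, 1 if it is missing (case 2),
# 2 if another process holds it open (case 3).
check_oss_device() {
    local dev=${1:-/dev/dsp}
    if [ ! -e "$dev" ]; then
        echo "$dev missing: try 'modprobe snd_pcm_oss' as root" >&2
        return 1
    fi
    # fuser -s prints nothing and exits 0 only if some process has the file open
    if command -v fuser >/dev/null 2>&1 && fuser -s "$dev" 2>/dev/null; then
        echo "$dev busy: close applications using the audio device" >&2
        return 2
    fi
    return 0
}
```

Note that as a normal user `fuser` may not see processes owned by other users, so run it as root for a complete busy check.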

## sox - noise removal
noiseprof [profile-file]

Calculate a profile of the audio for use in noise reduction. See the description of the noisered effect for details.

noisered [profile-file [amount]]

Reduce noise in the audio signal by profiling and filtering. This effect is moderately effective at removing consistent background noise such as hiss or hum. To use it, first run SoX with the noiseprof effect on a section of audio that ideally would contain silence but in fact contains noise - such sections are typically found at the beginning or the end of a recording. noiseprof will write out a noise profile to profile-file, or to stdout if no profile-file or if ‘-’ is given. E.g.

   sox speech.wav -n trim 0 1.5 noiseprof speech.noise-profile

To actually remove the noise, run SoX again, this time with the noisered effect; noisered will reduce noise according to a noise profile (which was generated by noiseprof), from profile-file, or from stdin if no profile-file or if ‘-’ is given. E.g.

   sox speech.wav cleaned.wav noisered speech.noise-profile 0.3

How much noise should be removed is specified by amount, a number between 0 and 1 with a default of 0.5. Higher numbers will remove more noise but present a greater likelihood of removing wanted components of the audio signal. Before replacing an original recording with a noise-reduced version, experiment with different amount values to find the optimal one for your audio; use headphones to check that you are happy with the results, paying particular attention to quieter sections of the audio.

On most systems, the two stages - profiling and reduction - can be combined using a pipe, e.g.

   sox noisy.wav -n trim 0 1 noiseprof | play noisy.wav noisered
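Following the advice above to experiment with several amount values, the two stages can be scripted: one noise profile is taken from the start of the recording and then reused to write a cleaned copy per candidate amount. A minimal sketch; the function name, its argument order and the output naming scheme are hypothetical, and it assumes the first SECONDS of the input contain only noise.

```bash
#!/usr/bin/env bash
# compare_noisered INPUT SECONDS AMOUNT...  (hypothetical helper)
# Stage 1: build a noise profile from the first SECONDS of INPUT.
# Stage 2: write <INPUT without extension>.nr<AMOUNT>.wav for each AMOUNT,
#          so the results can be compared on headphones.
compare_noisered() {
    local in=$1 secs=$2 base prof amount
    shift 2 || return 1
    base=${in%.*}
    prof="$base.noise-profile"
    sox "$in" -n trim 0 "$secs" noiseprof "$prof" || return 1
    for amount in "$@"; do
        sox "$in" "$base.nr$amount.wav" noisered "$prof" "$amount" || return 1
    done
}
```

For example, `compare_noisered speech.wav 1.5 0.2 0.3 0.5` writes `speech.nr0.2.wav`, `speech.nr0.3.wav` and `speech.nr0.5.wav`. Saving the profile to a file, rather than piping it, lets each amount reuse it without re-reading the noise section.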

## sphinx4Normalize.sh
#!/usr/bin/env bash
# Convert every *.wav (or *.mp3) file in an input directory to the format
# Sphinx expects: 16 kHz, 16-bit signed PCM, mono.
usage="Help, usage: sphinx4Normalize -i /path/to/audio/input/ -o /path/to/audio/output/ [-t wav|mp3]"

# No arguments given ($# is the argument count): print usage and stop
if [ $# -eq 0 ]; then
    echo "$usage"
    exit 128
fi

input=""
output=""
fileout=""
type=wav

# Walk the argument list, consuming one option (and its value) per iteration
while [ "$1" != "" ]; do
    case $1 in
        -i | -di | --input )
            shift
            input=$1
            ;;
        -o | -do | --output )
            shift
            output=$1
            ;;
        -t | --type )
            shift
            case $1 in
                wav | mp3 )
                    type=$1
                    ;;
            esac
            ;;
        -h | --help )
            echo "$usage"
            exit 0
            ;;
        * )
            echo "$usage"
            exit 1
            ;;
    esac
    shift
done

# Both directories are required; -z is safe even when the variable is empty
if [ -z "$input" ] || [ -z "$output" ]; then
    echo "$usage"
    exit 128
fi

mkdir -p -- "$output" || exit 1

for i in "$input"/*."$type"; do
    [ -e "$i" ] || continue    # no matching files: the glob stays unexpanded
    fileout="$output/$(basename -- "$i")"
    echo "Processing $i"
    sox "$i" -r 16k -e signed -b 16 -c 1 "$fileout"
    echo "Output: $fileout"
done

echo "Complete!"