Even fairly long Japanese passages were recognized with good accuracy.
- TODO: recognition results
The API covers both speech synthesis and speech recognition; this note leaves synthesis aside and investigates only speech recognition.
MDN
It is hardly an exaggeration to say that Chrome is the only supported browser. Only Chrome supports the continuous property; the other features appear to be supported in Firefox 44 as well.
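Given the prefixed constructor, a small runtime detection helper can be used (a sketch; checking the unprefixed SpeechRecognition name first is my own future-proofing assumption — Chrome currently only exposes webkitSpeechRecognition):

```javascript
// Resolve the SpeechRecognition constructor, trying the unprefixed
// name first and falling back to Chrome's webkit-prefixed one.
function getSpeechRecognition(global) {
  return global.SpeechRecognition || global.webkitSpeechRecognition || null;
}

// In a browser:
// var SR = getSpeechRecognition(window);
// if (SR) { new SR().start(); }
```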
```javascript
function logCreator(type) {
  return function(e) {
    console.log(type, e);
  };
}

var r = new webkitSpeechRecognition();
var eventTypes = [
  'audioend',
  'audiostart',
  'end',
  'error',
  'nomatch',
  'result',
  'soundend',
  'soundstart',
  'speechend',
  'speechstart',
  'start',
];
for (var i = 0; i < eventTypes.length; i++) {
  r.addEventListener(eventTypes[i], logCreator(eventTypes[i]));
}
r.lang = 'ja-JP';
r.continuous = true;
r.interimResults = true;
r.start();
```
```javascript
var noticeBox = document.createElement('pre');
document.body.appendChild(noticeBox);
noticeBox.textContent = 'results will appear here'; // <pre> has no placeholder attribute
noticeBox.contentEditable = true;
noticeBox.style.position = 'fixed';
noticeBox.style.top = '10px';
noticeBox.style.right = '10px';
noticeBox.style.width = '250px';
noticeBox.style.minHeight = '50px';
noticeBox.style.transition = '.2s';
noticeBox.style.zIndex = '99999';

var r = new webkitSpeechRecognition();
r.addEventListener('start', function() {
  noticeBox.style.background = 'rgba(255,100,100,.8)'; // red while listening
});
r.addEventListener('end', function() {
  noticeBox.style.background = 'rgba(200,200,200,.8)'; // grey when stopped
});
r.addEventListener('result', function(e) {
  console.log(e);
  if (e.results.length) {
    // With continuous = true, e.results accumulates every utterance,
    // so look at the most recent result rather than e.results[0].
    var results = e.results[e.results.length - 1];
    if (results.isFinal) {
      noticeBox.textContent = 'fixed!!\n' + results[0].transcript;
    } else {
      var possibilities = [].slice.call(results).map(function(result, i) {
        return i + ':' + result.transcript;
      }).join('\n\n');
      noticeBox.textContent = 'progress(' + results.length + ')...\n' + possibilities;
    }
  } else {
    noticeBox.textContent = 'empty result';
  }
});
r.lang = 'ja-JP';
r.continuous = true;
r.interimResults = true;
r.maxAlternatives = 3;
r.start();
```
SpeechRecognition#lang
Settable via the property.
The value appears to be a locale-format string such as en-US.
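For illustration, a rough shape-check for such tags might look like this (my own sketch; the attribute actually accepts any BCP 47 language tag, and this simplified `ll-CC` pattern only covers the common case):

```javascript
// Very rough check that a string looks like "ll" or "ll-CC"
// (e.g. 'ja', 'ja-JP', 'en-US'). Real BCP 47 tags can be richer.
function looksLikeLocaleTag(tag) {
  return /^[a-z]{2}(-[A-Z]{2})?$/.test(tag);
}
```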
SpeechRecognition#continuous
Settable via the property.
When enabled, recognition keeps running across utterances instead of stopping after the first final result; the in-progress hypotheses seen mid-analysis are controlled by the separate interimResults property.
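Because e.results accumulates in continuous mode, a result handler should only look at what changed; a helper like this (my own sketch, working on anything shaped like a SpeechRecognitionEvent) shows the resultIndex bookkeeping:

```javascript
// Collect only the results that changed in this 'result' event.
// e.resultIndex points at the first result updated by the event;
// everything before that index was already delivered earlier.
function changedResults(e) {
  var out = [];
  for (var i = e.resultIndex; i < e.results.length; i++) {
    out.push({
      isFinal: e.results[i].isFinal,
      transcript: e.results[i][0].transcript
    });
  }
  return out;
}
```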
addEventListener
Events can be subscribed to with addEventListener. Most samples assign the handler properties (onresult and friends) directly, but addEventListener can bind multiple listeners, so isn't that the better option? Or is it perhaps buggy or unimplemented and simply doesn't work?
- audioend: Fired when the user agent has finished capturing audio.
- audiostart: Fired when the user agent has started to capture audio.
- end: Fired when the speech recognition service has disconnected.
- error: Fired when a speech recognition error occurs.
- nomatch: Fired when the speech recognition service returns a final result with no significant recognition. This may involve some degree of recognition, which doesn't meet or exceed the confidence threshold.
- result: Fired when the speech recognition service returns a result — a word or phrase has been positively recognized and this has been communicated back to the app.
- soundend: Fired when any sound — recognisable speech or not — has stopped being detected.
- soundstart: Fired when any sound — recognisable speech or not — has been detected.
- speechend: Fired when speech recognised by the speech recognition service has stopped being detected.
- speechstart: Fired when sound that is recognised by the speech recognition service as speech has been detected.
- start: Fired when the speech recognition service has begun listening to incoming audio with intent to recognize grammars associated with the current SpeechRecognition.
SpeechRecognition#grammars
Settable via the property... maybe?
Grammars can apparently be written in JSGF (JSpeech Grammar Format). I haven't read the spec yet.
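A minimal JSGF grammar looks like the following (a sketch based on the format's documented header/rule shape; wiring it into recognition via Chrome's prefixed webkitSpeechGrammarList is shown in the comments, untested here):

```javascript
// A JSGF grammar: '#JSGF V1.0;' header, a grammar name, then public rules.
var jsgf = [
  '#JSGF V1.0;',
  'grammar colors;',
  'public <color> = red | green | blue ;'
].join(' ');

// In Chrome (prefixed):
// var grammars = new webkitSpeechGrammarList();
// grammars.addFromString(jsgf, 1); // 1 = grammar weight
// var r = new webkitSpeechRecognition();
// r.grammars = grammars;
```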