@Leko
Last active December 3, 2015 09:52
Looking into speech analysis in Chrome

Looking into speech recognition in Chrome

Accuracy

Even for long sentences in Japanese, accuracy and the recognition rate were quite high.

  • TODO: recognition results

Web Speech API

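The API covers both speech synthesis and speech recognition; this investigation leaves synthesis out and looks only at recognition.

Specification status and browser support

MDN
It is no exaggeration to say that Chrome is the only browser that supports it.
Only Chrome supports the continuous property; the remaining features appear to be supported in Firefox 44 as well.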
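
Since support differs between browsers, it may be worth looking up the constructor before using it. The following is a minimal sketch of that check; the helper name findSpeechRecognition is made up for illustration and is not part of the API.

```javascript
// Hypothetical helper: returns the recognition constructor exposed by the
// given global object, or null when neither name is available.
function findSpeechRecognition(globalObject) {
	if (!globalObject) {
		return null;
	}
	// Chrome exposes the prefixed name; the unprefixed name is the spec's.
	return globalObject.SpeechRecognition ||
		globalObject.webkitSpeechRecognition ||
		null;
}

// In a page: only construct the recognizer when the lookup succeeds.
if (typeof window !== 'undefined') {
	var Recognition = findSpeechRecognition(window);
	if (Recognition) {
		var recognition = new Recognition();
		recognition.lang = 'ja-JP';
	} else {
		console.log('Speech recognition is not available in this browser.');
	}
}
```

Passing the global object in as an argument keeps the lookup easy to try against a stub object outside the browser.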

Reference links

Demo 1: Reacting to every event

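// Returns a handler that logs the event together with its type.
function logCreator(type) {
	return function(e) {
		console.log(type, e);
	};
}

// Chrome exposes the constructor under the webkit prefix.
var Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
var r = new Recognition();

// Every event type listed for SpeechRecognition.
var eventTypes = [
	'audioend',
	'audiostart',
	'end',
	'error',
	'nomatch',
	'result',
	'soundend',
	'soundstart',
	'speechend',
	'speechstart',
	'start',
];

// Attach one logging listener per event type.
for(var i = 0; i < eventTypes.length; i++) {
	r.addEventListener(eventTypes[i], logCreator(eventTypes[i]));
}

r.lang = 'ja-JP';
r.continuous = true;      // keep listening after the first final result
r.interimResults = true;  // also deliver in-progress results
r.start();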

Demo 2: Showing the text on screen while recording and recognition are in progress

// Floating box that shows the recognition state and text.
var noticeBox = document.createElement('pre');
document.body.appendChild(noticeBox);
// <pre> has no placeholder attribute, so show the hint as initial text.
noticeBox.textContent = 'Recognition result';
noticeBox.contentEditable = true;
noticeBox.style.position = 'fixed';
noticeBox.style.top = '10px';
noticeBox.style.right = '10px';
noticeBox.style.width = '250px';
noticeBox.style.minHeight = '50px';
noticeBox.style.transition = '.2s';
noticeBox.style.zIndex = 99999;

// Chrome exposes the constructor under the webkit prefix.
var Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
var r = new Recognition();
// Red while listening, gray once the service has disconnected.
r.addEventListener('start', function() {
	noticeBox.style.background = 'rgba(255,100,100,.8)';
});
r.addEventListener('end', function() {
	noticeBox.style.background = 'rgba(200,200,200,.8)';
});
r.addEventListener('result', function(e) {
	console.log(e);
	if(e.results.length) {
		// In continuous mode results accumulate, so read the latest entry
		// rather than e.results[0], which stays on the first utterance.
		var results = e.results[e.results.length - 1];
		if(results.isFinal) {
			noticeBox.textContent = 'fixed!!\n' + results[0].transcript;
		} else {
			// List every alternative of the interim result.
			var possibilities = [].slice.call(results).map(function(result, i) { return i + ':' + result.transcript; }).join('\n\n');
			noticeBox.textContent = 'progress('+results.length+')...\n' + possibilities;
		}
	} else {
		noticeBox.textContent = 'The result is empty';
	}
});

r.lang = 'ja-JP';
r.continuous = true;
r.interimResults = true;
r.maxAlternatives = 3;
r.start();
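
Demo 2 sets maxAlternatives to 3, so a single result may carry several candidates. As a rough sketch, the candidate with the highest confidence could be picked as follows. The helper name pickMostConfident is hypothetical, and how meaningful the confidence values are for interim results in Chrome has not been checked here.

```javascript
// Hypothetical helper: given one array-like result whose entries carry
// `transcript` and `confidence`, returns the entry with the highest
// confidence, or null when the result is empty.
function pickMostConfident(result) {
	var best = null;
	for (var i = 0; i < result.length; i++) {
		if (best === null || result[i].confidence > best.confidence) {
			best = result[i];
		}
	}
	return best;
}

// Example with stand-in data shaped like recognition alternatives.
var sample = [
	{ transcript: 'hello word', confidence: 0.41 },
	{ transcript: 'hello world', confidence: 0.87 },
	{ transcript: 'yellow world', confidence: 0.12 }
];
console.log(pickMostConfident(sample).transcript);
```

Inside a result handler, the same function could be called with the latest entry of e.results.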

Language

The language can be set with the SpeechRecognition#lang property.
It appears to accept locale-style strings (BCP 47 language tags) such as en-US.
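
To guard against malformed tags before assigning them to lang, something like the following might help. This is only a sketch: normalizeLangTag is a hypothetical helper, and a structurally valid tag does not guarantee that the recognition service supports that language.

```javascript
// Hypothetical helper: canonicalizes a language tag (for example
// 'ja-jp' becomes 'ja-JP') and returns null for malformed tags.
function normalizeLangTag(tag) {
	try {
		// Intl.getCanonicalLocales throws a RangeError for malformed tags.
		var canonical = Intl.getCanonicalLocales(tag);
		return canonical.length > 0 ? canonical[0] : null;
	} catch (e) {
		return null;
	}
}

console.log(normalizeLangTag('ja-jp')); // 'ja-JP'
console.log(normalizeLangTag('en_US')); // null (underscore is not allowed)
```

The returned value can then be assigned to the lang property of the recognizer when it is not null.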

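Continuous recognition

This can be set with the SpeechRecognition#continuous property.
When it is on, recognition does not stop after the first final result and keeps returning results for subsequent utterances. In-progress (interim) input is controlled separately by the SpeechRecognition#interimResults property.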
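
With continuous and interimResults both enabled, as in the demos, the result list can hold finalized entries alongside in-progress ones. One possible way to separate the two is sketched below; splitTranscripts is a hypothetical helper that only relies on the isFinal flag and the transcript of the first alternative.

```javascript
// Hypothetical helper: splits an array-like result list into finalized
// text and in-progress text. Each entry is itself array-like, has an
// `isFinal` flag, and holds alternatives with a `transcript`.
function splitTranscripts(results) {
	var finalText = '';
	var interimText = '';
	for (var i = 0; i < results.length; i++) {
		var best = results[i][0]; // first alternative of this entry
		if (!best) {
			continue;
		}
		if (results[i].isFinal) {
			finalText += best.transcript;
		} else {
			interimText += best.transcript;
		}
	}
	return { finalText: finalText, interimText: interimText };
}

// In a page this could be called from a 'result' listener:
// r.addEventListener('result', function(e) {
// 	var text = splitTranscripts(e.results);
// 	console.log(text.finalText, '|', text.interimText);
// });
```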

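Events

Events can be subscribed to with addEventListener. Most samples assign handler properties directly (onresult and so on), but addEventListener allows several handlers per event, so it seems preferable. Is it avoided because it is buggy or unimplemented somewhere?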

  • audioend: Fired when the user agent has finished capturing audio.
  • audiostart: Fired when the user agent has started to capture audio.
  • end: Fired when the speech recognition service has disconnected.
  • error: Fired when a speech recognition error occurs.
  • nomatch: Fired when the speech recognition service returns a final result with no significant recognition. This may involve some degree of recognition, which doesn't meet or exceed the confidence threshold.
  • result: Fired when the speech recognition service returns a result — a word or phrase has been positively recognized and this has been communicated back to the app.
  • soundend: Fired when any sound — recognisable speech or not — has stopped being detected.
  • soundstart: Fired when any sound — recognisable speech or not — has been detected.
  • speechend: Fired when speech recognised by the speech recognition service has stopped being detected.
  • speechstart: Fired when sound that is recognised by the speech recognition service as speech has been detected.
  • start: Fired when the speech recognition service has begun listening to incoming audio with intent to recognize grammars associated with the current SpeechRecognition.
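
The point about binding several handlers can be tried without a microphone. The sketch below uses a plain EventTarget as a stand-in for the recognizer, which is an assumption: it shows how addEventListener behaves in general and does not exercise Chrome's SpeechRecognition implementation. The helper name attachLoggerAndRenderer is made up for illustration.

```javascript
// Hypothetical helper: registers two independent 'result' listeners on the
// same target and records which ones ran.
function attachLoggerAndRenderer(target, calls) {
	target.addEventListener('result', function() { calls.push('logger'); });
	target.addEventListener('result', function() { calls.push('renderer'); });
}

// A plain EventTarget stands in for the recognizer here.
var standIn = new EventTarget();
var calls = [];
attachLoggerAndRenderer(standIn, calls);

// Simulate the recognizer firing a 'result' event.
standIn.dispatchEvent(new Event('result'));
console.log(calls); // both listeners ran, in registration order
```

Assigning to a handler property such as onresult keeps only the last function assigned, which is the practical difference the question above is about.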

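Custom dictionary

Possibly configurable through the SpeechRecognition#grammars property...?

Apparently grammars can be written in a format called JSGF (JSpeech Grammar Format). I have not read the specification yet (it is published as a W3C Note rather than an RFC).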
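
For reference, a one-rule JSGF grammar and the way it might be handed to the recognizer could look like the sketch below. buildJsgf is a hypothetical helper, the word list is invented for illustration, and whether Chrome actually takes the grammar into account has not been verified here.

```javascript
// Hypothetical helper: builds a one-rule JSGF grammar from a list of words.
function buildJsgf(grammarName, ruleName, words) {
	return '#JSGF V1.0; grammar ' + grammarName + '; public <' + ruleName +
		'> = ' + words.join(' | ') + ' ;';
}

var grammar = buildJsgf('colors', 'color', ['red', 'green', 'blue']);
console.log(grammar);
// #JSGF V1.0; grammar colors; public <color> = red | green | blue ;

// In Chrome the grammar list constructor is exposed under the webkit prefix.
if (typeof window !== 'undefined' && window.webkitSpeechGrammarList) {
	var grammarList = new window.webkitSpeechGrammarList();
	grammarList.addFromString(grammar, 1); // second argument is the weight
	// A recognizer created elsewhere could then use it:
	// r.grammars = grammarList;
}
```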
