Even fairly long Japanese passages were recognized with good accuracy.
- TODO: recognition results
The API covers both speech synthesis and speech recognition; this note leaves synthesis aside and investigates only speech recognition.
MDN
It is hardly an exaggeration to say that Chrome is the only supported browser. Only Chrome supports the continuous property; the other features appear to be supported in Firefox 44 as well.
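Given the prefixed constructor, a small runtime detection helper can be used (a sketch; checking the unprefixed SpeechRecognition name first is my own future-proofing assumption — Chrome currently only exposes webkitSpeechRecognition):

```javascript
// Resolve the SpeechRecognition constructor, trying the unprefixed
// name first and falling back to Chrome's webkit-prefixed one.
function getSpeechRecognition(global) {
  return global.SpeechRecognition || global.webkitSpeechRecognition || null;
}

// In a browser:
// var SR = getSpeechRecognition(window);
// if (SR) { new SR().start(); }
```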
```javascript
function logCreator(type) {
  return function(e) {
    console.log(type, e);
  };
}

var r = new webkitSpeechRecognition();
var eventTypes = [
  'audioend',
  'audiostart',
  'end',
  'error',
  'nomatch',
  'result',
  'soundend',
  'soundstart',
  'speechend',
  'speechstart',
  'start',
];
for (var i = 0; i < eventTypes.length; i++) {
  r.addEventListener(eventTypes[i], logCreator(eventTypes[i]));
}
r.lang = 'ja-JP';
r.continuous = true;
r.interimResults = true;
r.start();
```
```javascript
var noticeBox = document.createElement('pre');
document.body.appendChild(noticeBox);
noticeBox.textContent = 'results will appear here'; // <pre> has no placeholder attribute
noticeBox.contentEditable = true;
noticeBox.style.position = 'fixed';
noticeBox.style.top = '10px';
noticeBox.style.right = '10px';
noticeBox.style.width = '250px';
noticeBox.style.minHeight = '50px';
noticeBox.style.transition = '.2s';
noticeBox.style.zIndex = '99999';

var r = new webkitSpeechRecognition();
r.addEventListener('start', function() {
  noticeBox.style.background = 'rgba(255,100,100,.8)'; // red while listening
});
r.addEventListener('end', function() {
  noticeBox.style.background = 'rgba(200,200,200,.8)'; // grey when stopped
});
r.addEventListener('result', function(e) {
  console.log(e);
  if (e.results.length) {
    // With continuous = true, e.results accumulates every utterance,
    // so look at the most recent result rather than e.results[0].
    var results = e.results[e.results.length - 1];
    if (results.isFinal) {
      noticeBox.textContent = 'fixed!!\n' + results[0].transcript;
    } else {
      var possibilities = [].slice.call(results).map(function(result, i) {
        return i + ':' + result.transcript;
      }).join('\n\n');
      noticeBox.textContent = 'progress(' + results.length + ')...\n' + possibilities;
    }
  } else {
    noticeBox.textContent = 'empty result';
  }
});
r.lang = 'ja-JP';
r.continuous = true;
r.interimResults = true;
r.maxAlternatives = 3;
r.start();
```
SpeechRecognition#lang
Settable via the property.
The value appears to be a locale-format string such as en-US.
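For illustration, a rough shape-check for such tags might look like this (my own sketch; the attribute actually accepts any BCP 47 language tag, and this simplified `ll-CC` pattern only covers the common case):

```javascript
// Very rough check that a string looks like "ll" or "ll-CC"
// (e.g. 'ja', 'ja-JP', 'en-US'). Real BCP 47 tags can be richer.
function looksLikeLocaleTag(tag) {
  return /^[a-z]{2}(-[A-Z]{2})?$/.test(tag);
}
```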
SpeechRecognition#continuous
Settable via the property.
When enabled, recognition keeps running across utterances instead of stopping after the first final result; the in-progress hypotheses seen mid-analysis are controlled by the separate interimResults property.
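Because e.results accumulates in continuous mode, a result handler should only look at what changed; a helper like this (my own sketch, working on anything shaped like a SpeechRecognitionEvent) shows the resultIndex bookkeeping:

```javascript
// Collect only the results that changed in this 'result' event.
// e.resultIndex points at the first result updated by the event;
// everything before that index was already delivered earlier.
function changedResults(e) {
  var out = [];
  for (var i = e.resultIndex; i < e.results.length; i++) {
    out.push({
      isFinal: e.results[i].isFinal,
      transcript: e.results[i][0].transcript
    });
  }
  return out;
}
```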
addEventListener
Events can be subscribed to with addEventListener. Most samples assign the handler properties (onresult and friends) directly, but addEventListener can bind multiple listeners, so isn't that the better option? Or is it perhaps buggy or unimplemented and simply doesn't work?
- audioend: Fired when the user agent has finished capturing audio.
- audiostart: Fired when the user agent has started to capture audio.
- end: Fired when the speech recognition service has disconnected.
- error: Fired when a speech recognition error occurs.
- nomatch: Fired when the speech recognition service returns a final result with no significant recognition. This may involve some degree of recognition, which doesn't meet or exceed the confidence threshold.
- result: Fired when the speech recognition service returns a result — a word or phrase has been positively recognized and this has been communicated back to the app.
- soundend: Fired when any sound — recognisable speech or not — has stopped being detected.
- soundstart: Fired when any sound — recognisable speech or not — has been detected.
- speechend: Fired when speech recognised by the speech recognition service has stopped being detected.
- speechstart: Fired when sound that is recognised by the speech recognition service as speech has been detected.
- start: Fired when the speech recognition service has begun listening to incoming audio with intent to recognize grammars associated with the current SpeechRecognition.
SpeechRecognition#grammars
Settable via the property... maybe?
Grammars can apparently be written in JSGF (JSpeech Grammar Format). I haven't read the spec yet.
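A minimal JSGF grammar looks like the following (a sketch based on the format's documented header/rule shape; wiring it into recognition via Chrome's prefixed webkitSpeechGrammarList is shown in the comments, untested here):

```javascript
// A JSGF grammar: '#JSGF V1.0;' header, a grammar name, then public rules.
var jsgf = [
  '#JSGF V1.0;',
  'grammar colors;',
  'public <color> = red | green | blue ;'
].join(' ');

// In Chrome (prefixed):
// var grammars = new webkitSpeechGrammarList();
// grammars.addFromString(jsgf, 1); // 1 = grammar weight
// var r = new webkitSpeechRecognition();
// r.grammars = grammars;
```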