Last active: March 2, 2023 21:23
A simple streamingRecognize workflow
// A simple streamingRecognize workflow,
// tested with Node v15.0.1, by @noseratio
// https://stackoverflow.com/q/64565542/1768303

import fs from 'fs';
import path from 'path';
import url from 'url';
import util from 'util';
import timers from 'timers/promises';
import speech from '@google-cloud/speech';

export {}

// needs 16-bit, 16 kHz raw PCM audio
const filename = path.join(path.dirname(url.fileURLToPath(import.meta.url)), "sample.raw");

const encoding = 'LINEAR16';
const sampleRateHertz = 16000;
const languageCode = 'en-US';

const request = {
  config: {
    encoding: encoding,
    sampleRateHertz: sampleRateHertz,
    languageCode: languageCode,
  },
  interimResults: false // set this to true if you want interim results
};

// init SpeechClient
const client = new speech.v1p1beta1.SpeechClient();
await client.initialize();

// stream the audio to the Google Cloud Speech API
const stream = client.streamingRecognize(request);

// log all data
stream.on('data', data => {
  const result = data.results[0];
  console.log(`SR results, final: ${result.isFinal}, text: ${result.alternatives[0].transcript}`);
});

// log all errors
stream.on('error', error => {
  console.warn(`SR error: ${error.message}`);
});

// observe data event
const dataPromise = new Promise(resolve => stream.once('data', resolve));
// observe error event
const errorPromise = new Promise((resolve, reject) => stream.once('error', reject));
// observe finish event
const finishPromise = new Promise(resolve => stream.once('finish', resolve));
// observe close event
const closePromise = new Promise(resolve => stream.once('close', resolve));

// we could just pipe it:
//   fs.createReadStream(filename).pipe(stream);
// but we want to simulate web socket data

// read the raw audio as a Buffer
const data = await fs.promises.readFile(filename, null);

// simulate multiple audio chunks
console.log("Writing...");
const chunkSize = 4096;
for (let i = 0; i < data.length; i += chunkSize) {
  stream.write(data.slice(i, i + chunkSize));
  await timers.setTimeout(50);
}
console.log("Done writing.");

console.log("Before ending...");
await util.promisify(c => stream.end(c))();
console.log("After ending.");

// race for events
await Promise.race([
  errorPromise.catch(() => console.log("error")),
  dataPromise.then(() => console.log("data")),
  closePromise.then(() => console.log("close")),
  finishPromise.then(() => console.log("finish"))
]);

console.log("Destroying...");
stream.destroy();

console.log("Final timeout...");
await timers.setTimeout(1000);
console.log("Exiting.");
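The four "observe ... event" promises above all follow the same one-shot-listener pattern, so they could be factored into a small helper. A sketch of that generalization (the `eventToPromise` name is mine, not part of the gist):

```javascript
import { EventEmitter } from 'events';

// Wrap a one-shot event subscription in a promise.
// The 'error' event rejects instead of resolving.
function eventToPromise(emitter, event) {
  return new Promise((resolve, reject) =>
    emitter.once(event, event === 'error' ? reject : resolve));
}

// Usage with any EventEmitter (the streamingRecognize stream is one):
const emitter = new EventEmitter();
const finishPromise = eventToPromise(emitter, 'finish');
emitter.emit('finish', 'all done');
console.log(await finishPromise); // prints "all done"
```

Registering the listener via `once` rather than `on` keeps the promise from leaking listeners after it settles.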
Hi! I saw your Stack Overflow and Google posts about this issue. I'm following along with this link to transcribe audio coming from microphone input. I'd like to adapt your code to work with the `recorder` object that example uses, so that the transcription is logged with console.log only when the audio input from the client ends. The functionality I'm after is along the lines of this post, but in Node.js.
Hi @ajgrewal, feel free to adapt this gist however you want — I'm glad if it's of any use. I'm a bit wrapped up with a bespoke project at the moment, though, and I'm afraid I don't currently have time to take this code any further.
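For what it's worth, the behaviour described above — logging the transcript only once the audio input ends — could be sketched by collecting `isFinal` results and printing them on `'end'`. A rough sketch only (the `logFinalTranscript` helper and its names are mine; `recognizeStream` is assumed to be created as in the gist, and this has not been tested against the live API):

```javascript
// Collect only the final results from a streamingRecognize-style stream
// and log the combined transcript once the stream ends.
function logFinalTranscript(recognizeStream) {
  const finalChunks = [];
  recognizeStream.on('data', data => {
    const result = data.results?.[0];
    if (result?.isFinal) {
      finalChunks.push(result.alternatives[0].transcript);
    }
  });
  recognizeStream.on('end', () =>
    console.log(`Final transcript: ${finalChunks.join(' ')}`));
}
```

With `interimResults: false`, as in the gist, every `data` event should already carry a final result, so the `isFinal` check mainly guards against enabling interim results later.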