@korakot
Last active April 7, 2024 14:35
Record audio in Colab using getUserMedia({ audio: true })
# all imports
from IPython.display import Javascript
from google.colab import output
from base64 import b64decode
from io import BytesIO
!pip -q install pydub
from pydub import AudioSegment
RECORD = """
const sleep = time => new Promise(resolve => setTimeout(resolve, time))
const b2text = blob => new Promise(resolve => {
  const reader = new FileReader()
  reader.onloadend = e => resolve(e.srcElement.result)
  reader.readAsDataURL(blob)
})
var record = time => new Promise(async resolve => {
  stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  recorder = new MediaRecorder(stream)
  chunks = []
  recorder.ondataavailable = e => chunks.push(e.data)
  recorder.start()
  await sleep(time)
  recorder.onstop = async ()=>{
    blob = new Blob(chunks)
    text = await b2text(blob)
    resolve(text)
  }
  recorder.stop()
})
"""

def record(sec=3):
  display(Javascript(RECORD))
  s = output.eval_js('record(%d)' % (sec*1000))
  b = b64decode(s.split(',')[1])
  audio = AudioSegment.from_file(BytesIO(b))
  return audio
# all imports
from IPython.display import Javascript
from google.colab import output
from base64 import b64decode
RECORD = """
const sleep = time => new Promise(resolve => setTimeout(resolve, time))
const b2text = blob => new Promise(resolve => {
  const reader = new FileReader()
  reader.onloadend = e => resolve(e.srcElement.result)
  reader.readAsDataURL(blob)
})
var record = time => new Promise(async resolve => {
  stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  recorder = new MediaRecorder(stream)
  chunks = []
  recorder.ondataavailable = e => chunks.push(e.data)
  recorder.start()
  await sleep(time)
  recorder.onstop = async ()=>{
    blob = new Blob(chunks)
    text = await b2text(blob)
    resolve(text)
  }
  recorder.stop()
})
"""
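
def record(sec=3):
  display(Javascript(RECORD))
  s = output.eval_js('record(%d)' % (sec*1000))
  b = b64decode(s.split(',')[1])
  with open('audio.wav','wb') as f:
    f.write(b)
  # or webm? MediaRecorder usually emits WebM or Ogg data, not true WAV,
  # so the '.wav' extension here does not guarantee a RIFF/WAVE file.
  return 'audio.wav'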
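
The `s.split(',')[1]` step in both versions relies on `FileReader.readAsDataURL` returning a string shaped like `data:<mime>;base64,<payload>`. A minimal sketch of that decoding step on its own, runnable outside Colab; the helper name `decode_data_url` and the sample payload are hypothetical and used only for illustration:

```python
from base64 import b64decode, b64encode

def decode_data_url(data_url):
    """Split a base64 data URL into (header, raw_bytes).

    Hypothetical helper, not part of the gist: it mirrors the
    b64decode(s.split(',')[1]) line but also keeps the header,
    which may carry the MIME type reported by the browser.
    """
    header, sep, payload = data_url.partition(',')
    if not sep or not header.startswith('data:') or not header.endswith(';base64'):
        raise ValueError('not a base64 data URL')
    return header, b64decode(payload)

# Build a fake data URL with the same shape as the one the browser returns.
sample = 'data:audio/webm;base64,' + b64encode(b'\x1aE\xdf\xa3fake').decode('ascii')
header, raw = decode_data_url(sample)
print(header)    # the part before the comma
print(raw[:4])   # first four bytes of the decoded payload
```

Keeping the header around is optional, but it can help when deciding which file extension to use for the saved recording.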
@ryoppippi

ryoppippi commented May 23, 2019

This gist is very useful. I'll use it in my projects.
Thank you so much!!

BTW, I think you should change
with open('audio.wav','rw+') as f:
to
with open('audio.wav','wb') as f:
because the previous code fails with the error
ValueError: must have exactly one of create/read/write/append mode

@contractorwolf

yeah this is great, I just put this into my google colab notebook to record audio in the browser. Thanks @korakot

@konstantin-Sk

+1
Thank you for your code.

@Satyampd

Thanks a lot for this.
It is useful for a speech recognition project.

@rbracco

rbracco commented Jul 13, 2020

Thank you, this is incredibly useful.

@savitha91

Can you please suggest alternative imports for these when using the PyCharm IDE?
from IPython.display import Javascript
from google.colab import output

@ErPokerino

I tried this code and it works, but the saved file doesn't seem to have the structure of a WAV file.
If I open the file with:

file = wave.open(filename)

I get this error message:

Error: file does not start with RIFF id
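
The error above is consistent with the browser handing back a compressed container rather than WAV data: the `wave` module only reads RIFF/WAVE files, while `MediaRecorder` typically produces WebM or Ogg. A rough way to check what was actually saved is to look at the first bytes of the file. A minimal sketch; the function name `sniff_container` is hypothetical and only three signatures are covered:

```python
def sniff_container(data):
    """Guess the container format from the leading magic bytes."""
    if data[:4] == b'RIFF' and data[8:12] == b'WAVE':
        return 'wav'
    if data[:4] == b'\x1a\x45\xdf\xa3':  # EBML header used by WebM/Matroska
        return 'webm'
    if data[:4] == b'OggS':
        return 'ogg'
    return 'unknown'

# Usage in Colab (assumes 'audio.wav' was written by record()):
# with open('audio.wav', 'rb') as f:
#     print(sniff_container(f.read(12)))
```

If this reports `webm` or `ogg`, the file needs to be converted before `wave.open` can read it, for example by loading it with pydub's `AudioSegment.from_file` as the first version of the gist does.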

@rolambert

Why do you not reference the original developer?

@korakot
Author

korakot commented Jan 9, 2021

It was long ago, so I'm not sure where the original code came from. It's probably this one and a few others together. I also made many modifications, e.g. to use await instead of then(). @rolambert, if you find the real original source, I will add it.

@mateoepalza

Great!, thank you so much.

@Jorvan758

This is awesome, but is there a way to add a countdown of the time that is left? I've been trying to do that for hours, but I'm not well versed enough in using JS from Python 😔

@Jorvan758

Never mind, I just made this solution:

RECORD = """
const sleep = time => new Promise(resolve => setTimeout(resolve, time))
const b2text = blob => new Promise(resolve => {
  const reader = new FileReader()
  reader.onloadend = e => resolve(e.srcElement.result)
  reader.readAsDataURL(blob)
})
var espacio = document.querySelector("#output-area")
var record = time => new Promise(async resolve => {
  stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  recorder = new MediaRecorder(stream)
  chunks = []
  recorder.ondataavailable = e => chunks.push(e.data)
  recorder.start()
  // Count down one tick per second for the full requested duration
  // (counting to time/1000 - 1 would cut the recording short by a second)
  var numerillo = Math.round(time/1000)
  for (var i = 0; i < numerillo; i++) {
    espacio.appendChild(document.createTextNode(numerillo-i))
    await sleep(1000)
    espacio.removeChild(espacio.lastChild)
  }
  recorder.onstop = async ()=>{
    blob = new Blob(chunks)
    text = await b2text(blob)
    resolve(text)
  }
  recorder.stop()
})
"""

@the-psychedelic

Hey, I am quite a noob at this. I copied your code into my Google Colab, but speech_recognition says that it is not "audio data".
Please help.
Screenshot (58)

@peune

peune commented Oct 1, 2021

Thanks :)

@hwsonnn

hwsonnn commented Nov 24, 2021

@the-psychedelic Hi, I have the same problem. Did you ever find a solution? I hope you found a way.

@GeorgesBob

Can I get some help? I have had this problem for two days and am still trying to figure out a solution.
I used the same code as yours.
Thank you!
speech recognition erro

@orhanyl

orhanyl commented May 22, 2022

Were you able to find a solution?

@orhanyl

orhanyl commented May 22, 2022

Hey, I am quite a noob in this but I copied your code to my google colab but the speech_recognition says that it is not an "audio data" Please help. Screenshot (58)

Were you able to find a solution?

@diyism

diyism commented Jan 19, 2023

ref: https://github.com/facebookresearch/WavAugment/blob/main/examples/python/WavAugment_walkthrough.ipynb

For the invalid RIFF header error:

!pip install ffmpeg-python
import ffmpeg

def fix_riff_header(binary):
  process = (ffmpeg
    .input('pipe:0')
    .output('pipe:1', format='wav')
    .run_async(pipe_stdin=True, pipe_stdout=True, pipe_stderr=True, quiet=True, overwrite_output=True)
  )
  output, err = process.communicate(input=binary)
  
  riff_chunk_size = len(output) - 8
  # Break up the chunk size into four bytes, held in b.
  q = riff_chunk_size
  b = []
  for i in range(4):
      q, r = divmod(q, 256)
      b.append(r)

  # Replace bytes 4:8 in proc.stdout with the actual size of the RIFF chunk.
  riff = output[:4] + bytes(b) + output[8:]
  return riff
def record(sec=3):
  display(Javascript(RECORD))
  s = output.eval_js('record(%d)' % (sec*1000))
  b = b64decode(s.split(',')[1])
  b = fix_riff_header(b)
  with open('audio.wav','wb') as f:
    f.write(b)
  audio = AudioSegment.from_file(BytesIO(b))
  return audio
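
The `divmod` loop in `fix_riff_header` builds the four bytes of the RIFF chunk size in little-endian order. The same patch can be written with `int.to_bytes`, shown here as a sketch on an in-memory header so that it runs without ffmpeg. `patch_riff_size` is a hypothetical name, and like the original it only rewrites bytes 4:8 and leaves the size field of the `data` sub-chunk untouched:

```python
def patch_riff_size(wav_bytes):
    """Rewrite the RIFF chunk size (bytes 4:8) to len(file) - 8, little-endian."""
    if wav_bytes[:4] != b'RIFF':
        raise ValueError('not a RIFF file')
    size = (len(wav_bytes) - 8).to_bytes(4, 'little')
    return wav_bytes[:4] + size + wav_bytes[8:]

# A fake 20-byte "file" whose size field holds a placeholder (0xFFFFFFFF).
broken = b'RIFF' + b'\xff\xff\xff\xff' + b'WAVE' + b'\x00' * 8
fixed = patch_riff_size(broken)
print(int.from_bytes(fixed[4:8], 'little'))  # 12
```

The size is the total length minus the 8 bytes taken by the `RIFF` tag and the size field itself, which is why a 20-byte input yields 12.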

@Adityaa24

How can we modify seconds in record(sec=3) to record short as well as long clips?

@vivasvan1

vivasvan1 commented Sep 7, 2023

This script is a basic example of how to continuously record audio until the user has finished speaking and then process that speech. You can customize and expand upon it for your specific use case or application.

# all imports
from IPython.display import Javascript
from google.colab import output
from base64 import b64decode
from io import BytesIO
!pip -q install pydub
from pydub import AudioSegment

RECORD = """
const sleep = time => new Promise(resolve => setTimeout(resolve, time))
const b2text = blob => new Promise((resolve, reject) => {
  const reader = new FileReader()
  reader.onloadend = e => resolve(e.target.result)
  reader.onerror = e => reject(new Error("Failed to read blob"))
  reader.readAsDataURL(blob)
})
var recordUntilSilence = time => new Promise(async (resolve, reject) => {
  let stream, recorder, chunks, blob, text, audioContext, analyser, dataArr, silenceStart, threshold = 50, silenceDelay = 2000
  try {
    stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  } catch (err) {
    return reject(new Error("Failed to get media stream"))
  }
  audioContext = new AudioContext()
  const source = audioContext.createMediaStreamSource(stream)
  analyser = audioContext.createAnalyser()
  analyser.fftSize = 512
  dataArr = new Uint8Array(analyser.frequencyBinCount)
  source.connect(analyser)
  recorder = new MediaRecorder(stream)
  chunks = []
  recorder.ondataavailable = e => chunks.push(e.data)
  recorder.onstop = async () => {
    blob = new Blob(chunks)
    try {
      text = await b2text(blob)
      resolve(text)
    } catch (err) {
      reject(new Error("Failed to convert blob to text"))
    }
  }
  recorder.onerror = e => reject(new Error("Recorder error"))
  recorder.start()
  const checkSilence = () => {
    analyser.getByteFrequencyData(dataArr)
    const avg = dataArr.reduce((p, c) => p + c, 0) / dataArr.length

    if (avg < threshold) {
      if (silenceStart === null) silenceStart = new Date().getTime()
      else if (new Date().getTime() - silenceStart > silenceDelay) {
        recorder.stop()
        audioContext.close()
        return
      }
    } else {
      silenceStart = null
    }
    requestAnimationFrame(checkSilence)
  }
  silenceStart = null
  checkSilence()
})
console.log("JavaScript code executed successfully.")
"""

def record_until_silence():
  try:
    display(Javascript(RECORD))
    s = output.eval_js('recordUntilSilence()')
    b = b64decode(s.split(',')[1])
    audio = AudioSegment.from_file(BytesIO(b))
    return audio
  except Exception as e:
    print(f"An error occurred: {e}")
    return None
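
The stop condition inside `checkSilence` is a small state machine: remember when the level first dropped below `threshold`, reset on any louder frame, and stop once the quiet stretch exceeds `silenceDelay`. A Python sketch of just that logic, fed with made-up `(time_ms, level)` samples instead of a live analyser; the function name `find_stop_time` and the sample data are hypothetical:

```python
def find_stop_time(samples, threshold=50, silence_delay=2000):
    """Return the timestamp (ms) at which recording would stop, or None.

    samples: iterable of (time_ms, average_level) pairs in time order.
    """
    silence_start = None
    for t, level in samples:
        if level < threshold:
            if silence_start is None:
                silence_start = t          # quiet stretch begins
            elif t - silence_start > silence_delay:
                return t                   # quiet for long enough: stop
        else:
            silence_start = None           # any sound resets the timer
    return None

# Loud for the first second, then quiet; one sample every 500 ms.
samples = [(0, 80), (500, 90), (1000, 70), (1500, 10), (2000, 5),
           (2500, 8), (3000, 4), (3500, 3), (4000, 2)]
print(find_stop_time(samples))  # 4000
```

In the JavaScript version the level comes from `getByteFrequencyData`, which reports values from 0 to 255, so the default `threshold` of 50 may need tuning for a noisy room or a quiet microphone.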

@LSRAO

LSRAO commented Oct 11, 2023

# from https://gist.github.com/korakot/c21c3476c024ad6d56d5f48b0bca92be

from IPython.display import Javascript
from google.colab import output
from base64 import b64decode

# RECORD = """
# const sleep = time => new Promise(resolve => setTimeout(resolve, time))
# const b2text = blob => new Promise(resolve => {
#   const reader = new FileReader()
#   reader.onloadend = e => resolve(e.srcElement.result)
#   reader.readAsDataURL(blob)
# })
# var record = time => new Promise(async resolve => {
#   stream = await navigator.mediaDevices.getUserMedia({ audio: true })
#   recorder = new MediaRecorder(stream)
#   chunks = []
#   recorder.ondataavailable = e => chunks.push(e.data)
#   recorder.start()
#   await sleep(time)
#   recorder.onstop = async ()=>{
#     blob = new Blob(chunks)
#     text = await b2text(blob)
#     resolve(text)
#   }
#   recorder.stop()
# })
# """
RECORD = """
const sleep = time => new Promise(resolve => {
setTimeout(resolve, time)
}, )
const b2text = blob => new Promise(resolve => {
const reader = new FileReader()
reader.onloadend = e => resolve(e.srcElement.result)
reader.readAsDataURL(blob)
})
var espacio = document.querySelector("#output-area")
var record = time => new Promise(async resolve => {
stream = await navigator.mediaDevices.getUserMedia({ audio: true })
recorder = new MediaRecorder(stream)
chunks = []
recorder.ondataavailable = e => chunks.push(e.data)
recorder.start()
var numerillo = (time/1000)-1
for (var i = 0; i < numerillo; i++) {
espacio.appendChild(document.createTextNode(numerillo-i))
await sleep(1000)
espacio.removeChild(espacio.lastChild)
}
recorder.onstop = async ()=>{
blob = new Blob(chunks)
text = await b2text(blob)
resolve(text)
}
recorder.stop()
})
"""
def record(sec, filename='audio.wav'):
    display(Javascript(RECORD))
    print("before s")
    s = output.eval_js('record(%d)' % (sec))
    print(s)
    b = b64decode(s.split(',')[1])
    with open(filename, 'wb+') as f:
        f.write(b)

audio = 'audio.wav'
second = 5
print(f"Speak to your microphone {second} sec...")
record(1, audio)
print("Done!")


import librosa
import librosa.display
speech, rate = librosa.load(audio)



librosa.display.waveshow(speech, sr=rate)

import matplotlib.pyplot as plt
plt.show()

import pysndfile
pysndfile.sndio.write('audio_ds.wav', speech, rate=rate, format='wav', enc='pcm16')

from IPython.display import display, Audio
display(Audio(speech, rate=rate))

I am executing the above code, but the cell doesn't stop executing. The output so far is as follows:

Speak to your microphone 5 sec...

before s

It doesn't change even if I use the commented section instead.

I am executing this in a Jupyter notebook locally. Is that the problem, or is it something else?

@HoangGhjk

+1
