Created February 17, 2019 19:34
Live mic -> live melspectrogram plot
import cv2
import numpy as np
import pyaudio
import librosa
import librosa.display
import matplotlib.pyplot as plt
import time

rate = 16000            # sample rate in Hz
chunk_size = rate // 4  # 0.25 s of audio per read

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32,
                channels=1,
                rate=rate,
                input=True,
                input_device_index=1,  # machine-specific input device index
                frames_per_buffer=chunk_size)

frames = []
plt.figure(figsize=(10, 4))

do_melspec = librosa.feature.melspectrogram
pwr_to_db = librosa.power_to_db  # top-level name for librosa.core.power_to_db

while True:
    start = time.time()

    # Read one chunk of raw float32 samples from the microphone.
    data = stream.read(chunk_size)
    # np.fromstring is deprecated for binary data; np.frombuffer is the replacement.
    data = np.frombuffer(data, dtype=np.float32)

    # Mel spectrogram of this chunk, in dB relative to the chunk's own peak.
    melspec = do_melspec(y=data, sr=rate, n_mels=128, fmax=4000)
    norm_melspec = pwr_to_db(melspec, ref=np.max)
    frames.append(norm_melspec)

    if len(frames) == 20:
        # Plot the last 20 chunks side by side, then drop the oldest one.
        stack = np.hstack(frames)
        librosa.display.specshow(stack, sr=rate, y_axis='mel', fmax=4000, x_axis='time')
        plt.colorbar(format='%+2.0f dB')
        plt.title('Mel spectrogram')
        plt.draw()
        plt.pause(0.0001)
        plt.clf()
        #break
        frames.pop(0)

    # Print the loop rate in iterations per second.
    t = time.time() - start
    print(1 / t)
This is due to the way librosa.feature.melspectrogram computes the spectrograms. You could try increasing the window size (chunk_size) to increase the consistency of the frames. I've found that recomputing the spectrogram on a list of raw audio (rather than computing it for each chunk) works pretty well for this.
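As a rough illustration of the "recompute on raw audio" idea: in the script above, each chunk is converted to dB with ref=np.max, so every chunk is scaled relative to its own peak and a loud chunk and a quiet chunk both top out at 0 dB. The sketch below keeps the raw chunks in a rolling buffer and normalizes once over the whole window instead. It is only a sketch under assumptions: a plain FFT power spectrum stands in for librosa.feature.melspectrogram so that it runs without a microphone, and the helper names (to_db, frame_power, spectrogram_db) and the n_fft value are hypothetical, not part of the gist.

```python
from collections import deque

import numpy as np

rate = 16000
chunk_size = rate // 4
n_fft = 500  # hypothetical frame length; divides chunk_size evenly


def to_db(power, ref, amin=1e-10):
    """Convert power to decibels relative to `ref` (floor at `amin`)."""
    return 10.0 * np.log10(np.maximum(power, amin)) - 10.0 * np.log10(max(ref, amin))


def frame_power(audio):
    """Stand-in for a mel spectrogram: power spectrum of non-overlapping frames."""
    n_frames = len(audio) // n_fft
    frames = audio[:n_frames * n_fft].reshape(n_frames, n_fft)
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T  # shape: (bins, frames)


def spectrogram_db(raw_chunks):
    """Recompute the spectrogram over all buffered raw audio, one shared reference."""
    power = frame_power(np.concatenate(list(raw_chunks)))
    return to_db(power, ref=power.max())


# Synthetic input: one quiet chunk followed by one loud chunk (440 Hz tone).
t = np.arange(chunk_size) / rate
quiet = (0.01 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)
loud = (1.0 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

# Per-chunk normalization, as in the original loop: each chunk uses its own peak.
per_chunk_peaks = []
for chunk in (quiet, loud):
    power = frame_power(chunk)
    per_chunk_peaks.append(to_db(power, ref=power.max()).max())

# Rolling buffer of raw audio: old chunks fall off automatically at maxlen.
buffer = deque(maxlen=20)
buffer.append(quiet)
buffer.append(loud)
db = spectrogram_db(buffer)

frames_per_chunk = chunk_size // n_fft
quiet_peak = db[:, :frames_per_chunk].max()
loud_peak = db[:, frames_per_chunk:].max()

print("per-chunk peaks (dB):", per_chunk_peaks)
print("shared-reference peaks (dB): quiet", quiet_peak, "loud", loud_peak)
```

With per-chunk normalization both peaks land at 0 dB, so the loudness difference disappears from the plot. With the shared reference, the quiet chunk sits well below the loud one (the amplitude ratio of 100 here corresponds to roughly 40 dB). In the live loop, the same pattern would mean appending the raw `data` arrays to the buffer and calling the mel spectrogram and power_to_db once on the concatenated audio before plotting.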
Excuse me, the real-time plot has inconsistent brightness over time: when the recorded sound is loud, that time segment appears darker, and when the recorded sound is silent, it appears brighter.
I think this is strange, because loud sound should be brighter.