Last active March 12, 2023 06:43
Audio Processing Useful code snippets

1. Librosa

Installation

pip install librosa

Importing and loading audio

import librosa

filename = 'path/to/audio_file.wav'
# Resamples to 22050 Hz mono by default; pass sr=None to keep the native rate
y, sr = librosa.load(filename)

Plotting the waveform

import librosa.display
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))
# waveshow replaces waveplot, which was removed in librosa 0.10
librosa.display.waveshow(y, sr=sr)
plt.show()

Extracting features (spectrogram)

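A spectrogram shows how the audio signal's frequency content changes over time. Here it is computed as the magnitude of the short-time Fourier transform (STFT), converted to decibels relative to the peak.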

import numpy as np

n_fft = 2048
hop_length = 512
stft = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))
spectrogram = librosa.amplitude_to_db(stft, ref=np.max)
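
To make the two calls above less opaque, here is a minimal NumPy-only sketch of what a magnitude spectrogram in decibels involves: slice the signal into overlapping Hann-windowed frames, take the FFT of each frame, and express the magnitudes in dB relative to the peak. It is an illustration under simplifying assumptions (no centre padding, a synthetic 1 kHz test tone, a hypothetical helper named `magnitude_spectrogram_db`), not librosa's implementation.

```python
import numpy as np

def magnitude_spectrogram_db(y, n_fft=2048, hop_length=512, floor_db=-80.0):
    """Return a (1 + n_fft // 2, n_frames) magnitude spectrogram in dB re. peak."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    # Each column is one windowed frame of n_fft samples
    frames = np.stack(
        [y[i * hop_length : i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=0))
    # 20 * log10(magnitude / peak); the small constant avoids log(0)
    db = 20.0 * np.log10(np.maximum(magnitude, 1e-10) / magnitude.max())
    return np.maximum(db, floor_db)

# Synthetic input: one second of a 1 kHz sine at 22050 Hz
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)

spec_db = magnitude_spectrogram_db(tone)
peak_bin = int(spec_db[:, 0].argmax())   # frequency bin with the most energy
peak_hz = peak_bin * sr / 2048           # bin index converted back to Hz
```

Each row of the result is a frequency bin spaced `sr / n_fft` Hz apart and each column is one frame, so the strongest bin of the test tone lands close to 1000 Hz.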

Plotting the spectrogram

plt.figure(figsize=(12, 4))
librosa.display.specshow(spectrogram, sr=sr, hop_length=hop_length, x_axis='time', y_axis='log')

Extracting features (MFCCs)

Mel-Frequency Cepstral Coefficients (MFCCs) are a compact set of coefficients that summarize the spectral envelope of the audio signal.

mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
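
A common follow-up step, sketched here as an assumption about how the coefficients might be used rather than something the snippet above requires, is to collapse the `(n_mfcc, n_frames)` matrix into one fixed-length vector per clip by taking the mean and standard deviation over time. A random array stands in for `mfccs` so the example runs without an audio file, and `summarize_mfccs` is a hypothetical helper name.

```python
import numpy as np

def summarize_mfccs(mfccs):
    """Stack the per-coefficient mean and standard deviation over the time axis."""
    return np.concatenate([mfccs.mean(axis=1), mfccs.std(axis=1)])

rng = np.random.default_rng(0)
fake_mfccs = rng.normal(size=(13, 200))   # placeholder for librosa's (13, n_frames) output
clip_vector = summarize_mfccs(fake_mfccs)  # 13 means followed by 13 standard deviations
```

The resulting vector has the same length for every clip regardless of duration, which is convenient when feeding clips to a classifier.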

Plotting the MFCCs

plt.figure(figsize=(12, 4))
librosa.display.specshow(mfccs, sr=sr, x_axis='time')

Extracting features (chroma)

Chroma features describe the signal's harmonic content by folding spectral energy into 12 bins, one per pitch class (C, C#, ..., B), regardless of octave.

chroma = librosa.feature.chroma_stft(y=y, sr=sr)
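
The octave folding behind chroma features can be illustrated with a small standard-library sketch that maps a frequency to its nearest equal-tempered pitch class. It assumes A4 = 440 Hz tuning; `pitch_class` and `PITCH_CLASSES` are hypothetical names, and this is not how librosa builds its chroma filter bank.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(frequency_hz, a4_hz=440.0):
    """Map a frequency to the name of its nearest equal-tempered pitch class."""
    # MIDI note 69 is A4; each semitone is a factor of 2 ** (1 / 12) in frequency
    midi_note = round(69 + 12 * math.log2(frequency_hz / a4_hz))
    return PITCH_CLASSES[midi_note % 12]

a4 = pitch_class(440.0)        # A4
a5 = pitch_class(880.0)        # one octave higher, same pitch class
middle_c = pitch_class(261.63)  # C4
```

Frequencies an octave apart land in the same bin, which is why a chroma plot has only 12 rows.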

Plotting the chroma

plt.figure(figsize=(12, 4))
librosa.display.specshow(chroma, sr=sr, x_axis='time', y_axis='chroma')

2. PyDub

Installation

pip install pydub

Importing and loading audio

from pydub import AudioSegment

# Formats other than WAV require ffmpeg (or libav) to be installed
audio = AudioSegment.from_file("audio_file.mp3")

Playing the audio
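
A minimal sketch of playback, assuming pydub's `pydub.playback.play` helper and one of its backends (simpleaudio, PyAudio, or ffplay) is available on the machine. `play_file` is a hypothetical wrapper name used only for this example.

```python
def play_file(path):
    """Load an audio file with pydub and play it through the default output."""
    # Imported here so the helper can be defined even on machines where
    # pydub or a playback backend is not installed.
    from pydub import AudioSegment
    from pydub.playback import play

    play(AudioSegment.from_file(path))
```

Calling `play_file("audio_file.mp3")` blocks until playback finishes.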

Converting to wav

audio.export("converted_audio_file.wav", format="wav")

Get the duration of the audio

length_in_ms = len(audio)

Trim the audio

# Slice indices are in milliseconds: keep the part from 5 s to 15 s
trimmed_audio = audio[5000:15000]

Get information about the audio

info =
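
One hedged way to gather basic properties, assuming `audio` is the `AudioSegment` loaded earlier, is to read its attributes directly. `describe` is a hypothetical helper name, and the dictionary keys are chosen for this sketch only.

```python
def describe(segment):
    """Collect basic properties of a pydub AudioSegment into a dict."""
    return {
        "duration_s": segment.duration_seconds,      # length in seconds
        "channels": segment.channels,                # 1 = mono, 2 = stereo
        "frame_rate_hz": segment.frame_rate,         # sampling rate
        "sample_width_bytes": segment.sample_width,  # 2 bytes = 16-bit audio
        "loudness_dbfs": segment.dBFS,               # average loudness in dBFS
    }
```

With the segment from the loading step this would be called as `info = describe(audio)`.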

Increase or decrease volume of audio file

# Increase the volume by 10 dB
increased_volume_audio = audio + 10

# Decrease the volume by 10 dB
decreased_volume_audio = audio - 10
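
Because the `+` and `-` operators work in decibels, it can help to see what a given gain means for the sample amplitudes. The sketch below uses the standard amplitude relation, ratio = 10^(dB / 20); `db_to_amplitude_ratio` is a hypothetical helper name.

```python
def db_to_amplitude_ratio(gain_db):
    """Convert a gain in decibels to the factor applied to sample amplitudes."""
    return 10 ** (gain_db / 20)

boost = db_to_amplitude_ratio(10)    # roughly 3.16 times the amplitude
cut = db_to_amplitude_ratio(-10)     # roughly 0.32 times the amplitude
unity = db_to_amplitude_ratio(0)     # no change
```

A 10 dB boost therefore more than triples the amplitude, so loud material can clip after `audio + 10`.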

3. pyAudioAnalysis

Installation

pip install pyAudioAnalysis

Importing and loading audio

from pyAudioAnalysis import audioBasicIO
# Returns the sampling rate and the signal as a NumPy array
Fs, x = audioBasicIO.read_audio_file("path/to/file.wav")

Feature extraction

from pyAudioAnalysis import ShortTermFeatures
# Window and step are given in samples: 50 ms frames with a 25 ms step
F, f_names = ShortTermFeatures.feature_extraction(x, Fs, 0.050*Fs, 0.025*Fs)
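
The window and step arguments are expressed in samples, which is why both are multiplied by `Fs`. As a rough sketch of the resulting frame layout (the exact count the library produces may differ by one depending on how it handles the final partial frame), assuming a hypothetical helper named `frame_layout`:

```python
def frame_layout(n_samples, sample_rate, window_s=0.050, step_s=0.025):
    """Return (window, step, n_frames) for short-term framing, in samples."""
    window = round(window_s * sample_rate)
    step = round(step_s * sample_rate)
    # Count only frames that fit entirely inside the signal
    n_frames = 1 + (n_samples - window) // step if n_samples >= window else 0
    return window, step, n_frames

# One second of audio at 16 kHz
window, step, n_frames = frame_layout(16000, 16000)
```

For one second at 16 kHz this gives 800-sample windows advancing 400 samples at a time, so consecutive frames overlap by half.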


Training a classifier

from pyAudioAnalysis import audioTrainTest as aT
# One folder of audio files per class; add more folders as needed
aT.extract_features_and_train(["class1_folder", "class2_folder"], 1.0, 1.0, aT.shortTermWindow, aT.shortTermStep, "svm", "svm_model", False)


Speaker diarization

from pyAudioAnalysis import audioSegmentation as aS
# The second argument is the expected number of speakers
diarization = aS.speaker_diarization("path/to/file.wav", 4)

Beat extraction

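from pyAudioAnalysis import audioBasicIO, ShortTermFeatures, MidTermFeatures
Fs, x = audioBasicIO.read_audio_file("path/to/file.wav")
# Beat extraction works on short-term features computed with a 50 ms window and step
F, _ = ShortTermFeatures.feature_extraction(x, Fs, 0.050*Fs, 0.050*Fs)
bpm, ratio = MidTermFeatures.beat_extraction(F, 0.050)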

4. Madmom

Installation

pip install madmom

Importing and loading audio

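import madmom

# Signal loads the file into a NumPy array subclass that also carries the sample rate
audio_file = madmom.audio.signal.Signal('path/to/audio_file.wav')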

Feature extraction

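import madmom

audio_file = madmom.audio.signal.Signal('path/to/audio_file.wav')
# Root mean square of the samples, a simple measure of signal power
rms = madmom.audio.signal.root_mean_square(audio_file)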
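
The root mean square (RMS) of a signal is the square root of the mean of its squared samples, and it is a common proxy for loudness. A NumPy-only sketch on a synthetic sine wave, independent of madmom, shows the calculation; the function and variable names are chosen for this example.

```python
import numpy as np

def root_mean_square(samples):
    """Return sqrt(mean(x ** 2)) for a one-dimensional array of samples."""
    samples = np.asarray(samples, dtype=float)
    return float(np.sqrt(np.mean(samples ** 2)))

# One second of a full-scale 440 Hz sine at 44.1 kHz
sr = 44100
t = np.arange(sr) / sr
sine = np.sin(2 * np.pi * 440.0 * t)
sine_rms = root_mean_square(sine)   # a unit-amplitude sine has RMS 1 / sqrt(2)
```

The value for a full-scale sine, about 0.707, is a handy sanity check when comparing RMS figures from different libraries.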

Audio processing

import madmom

audio_file = madmom.audio.signal.Signal('path/to/audio_file.wav')
filtered_audio =