@rp4ri
Last active March 12, 2023 06:43
Audio Processing Useful code snippets

1. Librosa

Installation

pip install librosa

Importing and loading audio

import librosa

filename = 'path/to/audio_file.wav'
y, sr = librosa.load(filename)  # resamples to 22050 Hz mono by default

Plotting the waveform

import librosa.display
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))
librosa.display.waveshow(y, sr=sr)  # waveplot was removed in librosa 0.10
plt.show()

Extracting features (spectrogram)

A spectrogram represents the audio signal's frequency content over time.

import numpy as np

n_fft = 2048
hop_length = 512
stft = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))
spectrogram = librosa.amplitude_to_db(stft, ref=np.max)
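
To make the resulting shape and the decibel scale concrete, here is a minimal NumPy-only sketch. It assumes librosa's default centered framing (one frame per hop, plus one) and the usual 20*log10 amplitude-to-dB conversion measured relative to the peak. The helper names `stft_shape` and `amplitude_to_db_sketch` are hypothetical and not part of librosa.

```python
import numpy as np


def stft_shape(num_samples, n_fft=2048, hop_length=512):
    """Expected (frequency bins, frames) of a centered STFT magnitude."""
    n_bins = 1 + n_fft // 2                   # only non-negative frequencies are kept
    n_frames = 1 + num_samples // hop_length  # assumes centered (padded) framing
    return n_bins, n_frames


def amplitude_to_db_sketch(magnitude, amin=1e-5, top_db=80.0):
    """Convert magnitudes to dB relative to the peak, floored at -top_db."""
    magnitude = np.asarray(magnitude, dtype=float)
    ref = max(magnitude.max(), amin)
    db = 20.0 * np.log10(np.maximum(magnitude, amin)) - 20.0 * np.log10(ref)
    return np.maximum(db, db.max() - top_db)


print(stft_shape(22050))                    # one second at 22050 Hz
print(amplitude_to_db_sketch([1.0, 0.1]))   # peak and one tenth of the peak
```

With `ref=np.max`, the loudest bin sits at 0 dB and everything else is negative, which is why spectrogram color bars usually run from about -80 dB to 0 dB.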

Plotting the spectrogram

plt.figure(figsize=(12, 4))
librosa.display.specshow(spectrogram, sr=sr, hop_length=hop_length, x_axis='time', y_axis='log')
plt.show()

Extracting features (MFCCs)

Mel-Frequency Cepstral Coefficients (MFCCs) are a set of coefficients that summarize the audio signal's spectral envelope.

mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

Plotting the MFCCs

plt.figure(figsize=(12, 4))
librosa.display.specshow(mfccs, sr=sr, x_axis='time')
plt.show()

Extracting features (chroma)

A chromagram represents the signal's energy across the 12 pitch classes, which summarizes its harmonic content.

chroma = librosa.feature.chroma_stft(y=y, sr=sr)

Plotting the chroma

plt.figure(figsize=(12, 4))
librosa.display.specshow(chroma, sr=sr, x_axis='time', y_axis='chroma')
plt.show()

2. PyDub

Installation

pip install pydub

Importing and loading audio

from pydub import AudioSegment

audio = AudioSegment.from_file("audio_file.mp3")

Playing the audio

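from pydub.playback import play
play(audio)  # AudioSegment has no play() method; playback lives in pydub.playback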

Converting to wav

audio.export("converted_audio_file.wav", format="wav")

Get the duration of the audio

length_in_ms = len(audio)

Trim the audio

trimmed_audio = audio[5000:15000]  # slices are in milliseconds: keeps 5 s to 15 s
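
Because pydub indexes audio in milliseconds, it can help to compute slice bounds from seconds explicitly. The helper below is a hypothetical convenience, not part of pydub, and is only a sketch of that conversion.

```python
def ms_bounds(start_s, end_s):
    """Convert a (start, end) range in seconds to integer milliseconds."""
    if end_s < start_s:
        raise ValueError("end_s must not be smaller than start_s")
    return int(round(start_s * 1000)), int(round(end_s * 1000))


start_ms, end_ms = ms_bounds(5, 15)
print(start_ms, end_ms)
# With a loaded AudioSegment named `audio`, the trim would then be:
# trimmed_audio = audio[start_ms:end_ms]
```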

Get information about the audio

from pydub.utils import mediainfo
info = mediainfo("audio_file.mp3")  # dict of metadata; requires ffprobe or avprobe

Increase or decrease volume of audio file

# Increase the volume by 10 dB
increased_volume_audio = audio + 10

# Decrease the volume by 10 dB
decreased_volume_audio = audio - 10
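
Adding or subtracting a number applies a gain in decibels, so the change in sample amplitude is multiplicative rather than additive. The sketch below shows the standard dB-to-amplitude relationship; `db_to_amplitude_ratio` is a hypothetical name used here for illustration.

```python
def db_to_amplitude_ratio(gain_db):
    """Amplitude scale factor corresponding to a gain in decibels."""
    return 10 ** (gain_db / 20)


for gain in (10, -10, 20):
    print(gain, "dB ->", round(db_to_amplitude_ratio(gain), 4))
```

A gain of +10 dB therefore scales amplitudes by roughly 3.16, and -10 dB by roughly 0.316, so large positive gains can clip loud recordings.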

3. PyAudioAnalysis

Installation

pip install pyAudioAnalysis

Importing and loading audio

from pyAudioAnalysis import audioBasicIO
Fs, x = audioBasicIO.read_audio_file("path/to/file.wav")  # readAudioFile in older releases

Feature extraction

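from pyAudioAnalysis import ShortTermFeatures
# 50 ms window and 25 ms step, both given in samples (older releases: audioFeatureExtraction.stFeatureExtraction)
F, f_names = ShortTermFeatures.feature_extraction(x, Fs, int(0.050 * Fs), int(0.025 * Fs))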
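
The window and step arguments are given in samples, so the number of feature columns depends on the sampling rate and the signal length. The sketch below estimates that count, assuming only frames that fit entirely inside the signal are kept; `short_term_frame_count` is a hypothetical helper, not a pyAudioAnalysis function.

```python
def short_term_frame_count(num_samples, fs, window_s=0.050, step_s=0.025):
    """Return (window, step, n_frames), with window and step in samples."""
    window = int(round(window_s * fs))
    step = int(round(step_s * fs))
    if num_samples < window:
        return window, step, 0
    # Frame k starts at k * step and must end at or before the last sample.
    n_frames = 1 + (num_samples - window) // step
    return window, step, n_frames


print(short_term_frame_count(16000, 16000))  # one second of 16 kHz audio
```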

Classification

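from pyAudioAnalysis import audioTrainTest as aT
# One folder of WAV files per class; arguments are mid-term window/step, short-term window/step (seconds), classifier type, model name
aT.extract_features_and_train(["class1_folder", "class2_folder", ...], 1.0, 1.0, aT.shortTermWindow, aT.shortTermStep, "svm", "svm_model", False)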

Segmentation

from pyAudioAnalysis import audioSegmentation as aS
segments = aS.speaker_diarization("path/to/file.wav", 4)  # second argument is the number of speakers

Beat extraction

from pyAudioAnalysis import MidTermFeatures
bpm, confidence = MidTermFeatures.beat_extraction(F, 0.025)  # F: short-term features from above; 0.025 is their step in seconds

4. Madmom

Installation

pip install madmom

Importing and loading audio

import madmom

audio_file = madmom.audio.signal.Signal('path/to/audio_file.wav')

Feature extraction

import madmom

audio_file = madmom.audio.signal.Signal('path/to/audio_file.wav')
rms = madmom.audio.signal.root_mean_square(audio_file)
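
For reference, the root-mean-square (RMS) level is the square root of the mean of the squared samples. A minimal NumPy sketch of that definition follows; `rms_sketch` is a hypothetical name and does not reproduce madmom's implementation details.

```python
import numpy as np


def rms_sketch(samples):
    """Root-mean-square level: sqrt(mean(samples ** 2))."""
    samples = np.asarray(samples, dtype=float)  # avoid integer overflow on int16 audio
    return float(np.sqrt(np.mean(samples ** 2)))


print(rms_sketch([3, -3, 3, -3]))
```

Converting to float before squaring matters for 16-bit audio, where squaring integer samples can overflow.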

Audio processing

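import madmom
from scipy.signal import butter, sosfilt

audio_file = madmom.audio.signal.Signal('path/to/audio_file.wav')
# madmom does not provide a high-pass helper, so this uses SciPy instead.
# The 4th-order Butterworth design and 100 Hz cutoff are illustrative choices.
sos = butter(4, 100, btype='highpass', fs=audio_file.sample_rate, output='sos')
filtered_audio = sosfilt(sos, audio_file, axis=0)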