@alexlnkp
Last active June 26, 2024 23:46
Explaining F0 computation
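import json

import numpy as np
import scipy.io.wavfile as wav
from matplotlib import pyplot as plt
from scipy.fft import fft, fftfreq

# Note name -> frequency lookup table
with open("notes_map.json", "r") as fh:
    NOTES_MAP = json.load(fh)

WAVE_LOCATION = "rd.wav"
DURATION = 5  # Seconds

# Read the recording (this script assumes a mono file)
SAMPLE_RATE, data = wav.read(WAVE_LOCATION)

# Transform the first DURATION seconds; use the actual segment length so the
# frequency axis still matches if the file is shorter than DURATION
segment = data[: SAMPLE_RATE * DURATION]
yf = fft(segment)
xf = fftfreq(len(segment), 1 / SAMPLE_RATE)

plt.plot(xf, np.abs(yf))
plt.xlim([0, 3e3])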

# Set a threshold for the magnitude (compared against the values in y)
threshold = 0.05  # Try reducing the threshold value

# Map frequencies to magnitude (positive frequencies only)
y = np.abs(yf)
d = {}
for i in range(0, len(y)):
    if xf[i] > 0:
        d[f"{xf[i]}"] = y[i]

# Sort the frequencies so the ones with the highest magnitude come first
d = sorted(d, key=d.get, reverse=True)

# Get the 10 strongest frequencies, rounded to whole Hz and de-duplicated
bucket = []
for i in d:
    if len(bucket) == 10:
        break
    i = round(float(i))
    if i not in bucket:
        bucket.append(i)

# Map to notes
notes = []
for i in bucket:
    for note in NOTES_MAP:
        note_freq = NOTES_MAP[note]
        l_r = i - 4
        h_r = i + 4
        if l_r < note_freq and h_r > note_freq:
            notes.append(note)
            break

# Add labels to the plot
for i in bucket:
    for note in NOTES_MAP:
        note_freq = NOTES_MAP[note]
        l_r = i - 4
        h_r = i + 4
        if l_r < note_freq and h_r > note_freq:
            idx = np.argmin(np.abs(xf - note_freq))
            if y[idx] > threshold:
                plt.scatter(xf[idx], y[idx], c="r")
                plt.annotate(
                    note,
                    (xf[idx], y[idx]),
                    textcoords="offset points",
                    xytext=(0, 10),
                    ha="center",
                )

plt.show()

Overview

Let's take a look at the Fourier transform of a wave using Python: fftp.py

This code computes the Fourier transform with SciPy and plots it with Matplotlib. The output for a recording of me singing a single sustained note is here: Figure_1

The vertical axis (0.0 to 1.0) is the magnitude; the horizontal axis (0 to 3000) is the frequency in Hz. The most prominent notes are labeled at the frequency they map to. The note-to-frequency mapping list
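
For readers without the mapping list at hand, here is a minimal sketch of how a note-to-frequency table could be generated. It assumes 12-tone equal temperament with A4 = 440 Hz and a note-name to frequency dictionary like the one the script reads from notes_map.json. The function name `build_notes_map`, the note spelling (sharps only) and the A0 to C8 range are my own assumptions, so the real list may differ.

```python
# Hypothetical sketch: build a {note_name: frequency_hz} dictionary using
# 12-tone equal temperament with A4 = 440 Hz. The actual notes_map.json may
# use different names, a different range, or rounded values.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def build_notes_map(low_midi=21, high_midi=108, a4_hz=440.0):
    """Return note name -> frequency (Hz) for MIDI notes low_midi..high_midi."""
    notes = {}
    for midi in range(low_midi, high_midi + 1):
        # MIDI note 69 is A4; each semitone multiplies the frequency by 2**(1/12)
        name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
        notes[name] = a4_hz * 2 ** ((midi - 69) / 12)
    return notes


notes_map = build_notes_map()
print(round(notes_map["A4"], 2))
print(round(notes_map["G4"], 2))
```

Dumping `notes_map` with `json.dump` would give a file of the same general shape as the one the script loads.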

The logic is that the dominant frequency (the one with the highest amplitude) IS the F0, a.k.a. the root note. The other notes and frequencies are overtones and noise introduced by many factors. Where exactly overtones come from is a bit of a boring thing to cover, but if you wish, there are plenty of research papers on the matter :)
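
The premise above boils down to "find the strongest peak in the spectrum". A minimal sketch of that step, assuming a mono signal: it uses NumPy's `rfft` instead of the `scipy.fft` calls in the script, and the helper name `dominant_frequency` and the synthetic test tone are made up for illustration.

```python
import numpy as np


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the largest-magnitude FFT bin above 0 Hz."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), 1 / sample_rate)
    spectrum[0] = 0.0  # ignore the DC offset so it can never win
    return freqs[np.argmax(spectrum)]


# Synthetic stand-in for a sung note: a 390 Hz fundamental plus two weaker
# overtones (values chosen only for this example).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # one second of samples
tone = (
    1.0 * np.sin(2 * np.pi * 390 * t)
    + 0.5 * np.sin(2 * np.pi * 780 * t)
    + 0.25 * np.sin(2 * np.pi * 1170 * t)
)

f0 = dominant_frequency(tone, sample_rate)
print(f0)
```

Note that the frequency resolution is `sample_rate / len(samples)`, so a longer segment gives a finer estimate of where the peak sits.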

Please take note of this:

crop

The highest peak sits a bit below the G4 note. Since it's still a valid frequency, it IS considered the F0 for the wave, even though it does not map exactly onto the standard 12-note scale system. The note labels are used for demonstration purposes only.
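
To put a number on "a bit below G4", one option is to measure the offset from the nearest note in cents (hundredths of a semitone). The sketch below assumes equal temperament with A4 = 440 Hz; the helper `nearest_note` and the 388 Hz input are hypothetical and not taken from the recording.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(freq_hz, a4_hz=440.0):
    """Return (note_name, cents_offset) for the closest equal-tempered note."""
    # Fractional MIDI note number: 69 is A4, one unit per semitone
    midi_float = 69 + 12 * math.log2(freq_hz / a4_hz)
    midi = round(midi_float)
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    cents = 100 * (midi_float - midi)  # negative means flat, positive sharp
    return name, cents


# 388 Hz is a made-up value slightly under G4 (about 392 Hz)
name, cents = nearest_note(388.0)
print(name, round(cents, 1))
```

A negative offset means the peak is flat of the named note, which is the situation in the cropped plot: the F0 is still the measured frequency, and the note name is only the closest label.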

Credit

Thanks to Sam Gallagher and their phenomenal article for the code base.
