Skip to content

Instantly share code, notes, and snippets.

Last active October 2, 2022 22:36
Show Gist options
  • Save fasiha/957035272009eb1c9eb370936a6af2eb to your computer and use it in GitHub Desktop.
Save fasiha/957035272009eb1c9eb370936a6af2eb to your computer and use it in GitHub Desktop.
Pitch (fundamental frequency) detection using (1) harmonic product spectrum, (2) Blackman-Tukey spectral estimator, and (3) Welch spectral estimator.

Experimenting with pitch detection and spectral estimators

See the question and discussion on StackOverflow: How to get the fundamental frequency using Harmonic Product Spectrum?.

We’re trying to estimate the fundamental frequency of a voiced A4 note (440 Hz). (See the question for link to audio clip.)

Harmonic product spectrum

Result: full data, 0 to 2 KHz

0 to 2000 Hz, full data

Results on 8192-sample chunks of full data

0 to 1 KHz

0 to 1000 Hz

0 to 500 Hz

0 to 500 Hz

Around 220 Hz

Around 220 Hz

Blackman-Tukey spectral estimate

In their Spectral Analysis of Signals, Stoica and Moses cite the Blackman-Tukey spectral estimator as a generalization of many other spectral estimators, including the Welch estimator and the periodogram. It is parameterized by a window, whose length and shape governs the characteristics of the spectral estimate, specifically, its variance (how up-and-down the plot jumps) and its resolution (how wide the peaks are).

Here’s the Blackman-Tukey spectral estimate of the entire audio clip using an 8K Hamming window.

Blackman-Tukey estimate with 8K-Hamming window

The code to generate this is below. As with the Welch estimate (see below), the Blackman-Tukey estimate does not attenuate the 440 Hz harmonic as much as the harmonic product spectrum method above.

Welch’s power spectral density estimate

Uses scipy.signal.welch to estimate the power spectral density. Again, in contrast to HPS, Welch’s method as well as the short-time Fourier transform below both leave the 440 Hz harmonic at almost the same level as the 220 Hz harmonic.

Welch’s method

Stoica and Moses describe Welch’s method as an approximation to the Blackman-Tukey estimator that can be efficiently evaluated using the FFT. Even though I tried to match the parameters to the Blackman-Tukey example (8192-bin Hamming window), the two spectral estimates are quite different, even qualitatively (compare peak at 880 Hz).

Code: HPS, Blackman-Tukey, and Welch spectral estimators in Python

import numpy as np
import numpy.fft as fft
import as wavfile
from scipy.signal import hamming

## Read in the audio file:
# Convert both raw samples and sample rate to floats, then convert stereo to mono.
(fs, y) ='./a4me.wav')
y = y.astype(float) / 2**16 # normalize to -1 to 1
y = y[:,0] + y[:,1] # stereo to mono
fs = float(fs)

def sliding(x, Nwin, Noverlap=0, f=lambda x: x):
  """Apply a function over overlapping sliding windows with overlap.

  Given an iterator with N elements (a list, a Numpy vector, a range object,
  etc.), subdivide it into Nwin-length chunks, potentially with Noverlap samples
  overlapping between chunks. Optionally apply a function to each such chunk.

  Any chunks at the end of x whose length would be < Nwin are silently ignored.

  x : array_like
      Iterator (list, vector, range object, etc.) to operate on.
  Nwin : int
      Length of each chunk (sliding window).
  Noverlap : int, optional
      Amount of overlap between chunks. Noverlap must be < Nwin. 0 means no
      overlap between chunks. Positive Noverlap < Nwin means the last Noverlap
      samples of a chunk will be the first Noverlap samples of the next chunk.
      Negative Noverlap means |Noverlap| samples of the input will be skipped
      between each successive chunk.
  f : function, optional
      A function to apply on each chunk. The default is the identity function
      and will just return the chunks.

  l : list
      A list of chunks, with the function f applied.
  hop = Nwin - Noverlap
  return [f(x[i : i + Nwin])
          for i in range(0, len(x), hop)
          if i + Nwin <= len(x)]

def hps(x, numProd, Nfft=None, fs=1):
  """Harmonic product spectrum of a vector.

  This algorithm can be used for fundamental frequency detection. It evaluates
  the magnitude of the FFT (fast Fourier transform) of the input signal, keeping
  only the positive frequencies. It then element-wise-multiplies this spectrum
  by the same spectrum downsampled by 2, then 3, ..., finally ending after
  numProd downsample-multiply steps.

  Here, "downsampling a vector by N" means keeping only every N samples:
  downsample(v, N) = v[::N].

  Of course, at each step, a vector of data is multiplied by a vector *smaller*
  than it: the algorithm specifies that the extra elements at the end of the
  longer vector be ignored. This implies that the output will be ceil(len(x) /
  numProd) long, so at each step, we only consider this many elements.

  See Gareth Middleton, Pitch Detection Algorithms (2003) at
  (accessed September 2016).

  x : array_like
      Time-samples of data
  numProd : int
      Number of products to evaluate the harmonic product spectrum over.
  Nfft : int, optional
      The length of the FFT. Almost always this is greater than the length of x,
      with x being zero-padded before the FFT. This is helpful for two reasons:
      more zero-padding means more interpolation in the spectrum (a smoother
      spectrum). Also, FFT lengths with low prime factors (i.e., products of 2,
      3, 5, 7) are usually (much) faster than those with high prime factors. The
      default is Nfft = len(x). By way of example, if len(x) = 4001, this
      default might take much more time to run than Nfft = 4096, since 4001 is
      prime while 4096 is a power of 2.
  fs : float, optional
      The sample rate of x, in samples per second (Hz). Used only to format the
      returned vector of frequencies.

  y : array
      Spectrum vector with ceil(Nfft / (2 * numProd)) elements.
  f : array
      Vector of frequencies corresponding to the spectrum in y. Runs from 0 to
      roughly (fs / (2 * numProd)) Hz.
  Nfft = Nfft or x.size
  # Evaluate FFT. f is the frequencies corresponding to the spectrum xf
  f = np.arange(Nfft) / Nfft
  xf = fft.fft(x, Nfft)
  # Keep magnitude of spectrum at positive frequencies
  xf = np.abs(xf[f < 0.5])
  f = f[f < 0.5]
  N = f.size

  # Downsample-multiply
  smallestLength = int(np.ceil(N / numProd))
  y = xf[:smallestLength].copy()
  for i in range(2, numProd + 1):
    y *= xf[::i][:smallestLength]
  f = f[:smallestLength] * fs
  return (y, f)

# Parameters for HPS and the sliding window over data to apply it over
prod = 5
winlen = 8*1024
overlap = winlen // 2
Nfft = int(4 * 2**np.ceil(np.log2(winlen)))

# Run HPS on winlen-long chunks of the data.
hpsArr = np.array(sliding(y, winlen, Noverlap=overlap,
                          f=lambda x: hps(x * hamming(len(x), False),
                                          prod, Nfft)[0]))
# Extract the frequency and time vectors that this array of spectra corresponds to
hpsF = hps(y[:winlen], prod, Nfft)[1] * fs
hpsT = np.arange(hpsArr.shape[0]) / fs * (winlen - overlap)

# Display
db20 = lambda x: np.log10(np.abs(x)) * 20

def extents(f):
  """Convert evenly-spaced vector of sample locations to extents for `imshow`.

  If you want to use Matplotlib's `imshow` to visualize an array and need to
  specify the location of each pixel, you need to adjust the `extent` kwarg to

  # imshow(data, extent=extents(x) + extents(y))

  See for
  additional details.
  delta = f[1] - f[0]
  return [f[0] - delta/2, f[-1] + delta/2]

import matplotlib
import matplotlib.pyplot as plt['dark_background','ggplot'])

plt.imshow(db20(hpsArr), aspect='auto', interpolation='nearest',
           extent=extents(hpsF) + extents(hpsT), cmap='viridis')
plt.xlabel('frequency (Hz)')
plt.ylabel('time (seconds)')
plt.title('Harmonic product spectrum: # products={}, window length={}'.format(prod, winlen))

plt.xlim([0, 500])



# No chunking
def nextpow2(n):
  return int(2 ** np.ceil(np.log2(n)))

prod = 5
[hpsS, hpsF] = hps(y * hamming(len(y)), prod, nextpow2(y.size), fs)
plt.plot(hpsF, db20(hpsS))
plt.ylim([-100, 300])
plt.xlabel('frequency (Hz)')
plt.ylabel('spectrum, dB')
plt.title('Harmonic product spectrum: # products={}'.format(prod))

plt.gca().set_xticks(np.arange(0, 2000, 55))
plt.xlim([0, 1000])
plt.gca().set_xticks(np.arange(0, 2000, 110))
plt.xlim([0, 2000])

# Welch?
from scipy.signal import welch
(fWelch, pWelch) = welch(y, fs=fs, window='hamming', nperseg=8192, noverlap=4096)

plt.plot(fWelch, db20(pWelch))
plt.xlabel('frequency (Hz)')
plt.ylabel('spectrum estimate')
plt.title('Welch spectral estimate, 8192-Hamming window & 50% overlap')

plt.ylim([-200, -50])
plt.xlim([0, 2000])
plt.gca().set_xticks(np.arange(0, 2000, 110))

# Blackman-Tukey spectral esimator
from scipy.signal import correlate

def acorrBiased(y):
  """Obtain the biased autocorrelation and its lags
  r = correlate(y, y) / len(y)
  l = np.arange(-(len(y)-1), len(y))
  return r,l

# This is a port of the code accompanying Stoica & Moses' "Spectral Analysis of
# Signals" (Pearson, 2005):
def blackmanTukey(y, w, Nfft, fs=1):
  """Evaluate the Blackman-Tukey spectral estimator

  y : array_like
  w : array_like
      Window, of length <= y's
  Nfft : int
      Desired length of the returned power spectral density estimate. Specifies
      the FFT-length.
  fs : number, optional
      Sample rate of y, in samples per second. Used only to scale the returned
      vector of frequencies.

  phi : array
      Power spectral density estimate. Contains ceil(Nfft/2) samples.
  f : array
      Vector of frequencies corresponding to phi.

  P. Stoica and R. Moses, *Spectral Analysis of Signals* (Pearson, 2005),
  section 2.5.1. See for original
  Matlab code. See for book contents.
  M = len(w)
  N = len(y)
  if M>N:
    raise ValueError('Window cannot be longer than data')
  r, lags = acorrBiased(y)
  r = r[np.logical_and(lags >= 0, lags < M)]
  rw = r * w
  phi = 2 * fft.fft(rw, Nfft).real - rw[0];
  f = np.arange(Nfft) / Nfft;
  return (phi[f < 0.5], f[f < 0.5] * fs)

btWin = hamming(1024*8)
(pBT, fBT) = blackmanTukey(y, btWin, 16*1024, fs)

plt.plot(fBT, db20(pBT))
plt.xlabel('frequency (Hz)')
plt.ylabel('spectrum estimate')
plt.title('Blackman-Tukey spectral estimate, {}-Hamming window'.format(btWin.size))

plt.gca().set_xticks(np.arange(0, 2000, 110))
plt.xlim([0, 2000])


Matlab-based short-time Fourier transform analysis

Code prefixed with arf. namespace is freely available, e.g., STFT function.

[y fs] = audioread('a4me.wav');
w = 1024*4*8; [s.s, s.f, s.t]=arf.stft(y, mtl.signal.window.Hamming(w), 'noverlap', w/2, 'fs', fs);
arf.stft(y, mtl.signal.window.Hamming(w), 'noverlap', w/2, 'fs', fs);
arf.rescalec(50) % top 50 db
ylim([0 700])

figure;plot(s.f, db20(s.s))
xlabel('Frequency (Hz)')
title('STFT using 8K chunks and 50% overlap')
xlim([0 700]);ylim([-80 -20])
title('STFT using 8K chunks and 50% overlap')

For results with this algorithm, see the images attached to this gist below.

This file has been truncated, but you can view the full file.
View raw

(Sorry about that, but we can’t show files that are this big right now.)

Copy link

Ungihan commented Sep 2, 2016

Thanks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment