@allquixotic
Last active December 11, 2022 02:33
High Quality Pitch Shifting for Spotify on Windows using MSYS-MinGW64

The goal of this exercise is to get pitch shifting (changing the pitch of music up or down) working with Spotify on Windows 10 using native-compiled open source software. This solution does not require any VMs or Linux at all.

For this solution, the actual sound playback will be occurring on your Windows PC using your soundcard, but there are other components of the solution that do some audio processing. I'll explain each.

Requirements

  1. A Spotify Premium account
  2. The 64-bit build of MSYS2 from https://www.msys2.org/ -- get the default installer and install with default options.

Overview of the Working Solution

Working from the origin of the sound bits to the destination:

  1. Using the official Spotify client on a supported platform (e.g. a smartphone, Windows, Mac, Linux PC or Chromebook, etc.) the user will submit a request to use "Spotify Connect" to have playback occur through your spotifyd (https://github.com/Spotifyd/spotifyd) daemon.
  2. Audio files are streamed over the Internet to an instance of Spotifyd running on your Windows PC. Spotifyd speaks the Spotify client protocol, which is how it is able to play your songs from Spotify. It also speaks the Spotify Connect protocol, which means the playback controls (play, stop, pause, next, previous, and the specific song) can be controlled by a different device with a Spotify GUI client (even the Spotify web app) while the Spotifyd client actually receives the audio data and plays it. For the simplest setup, you can just use the official Spotify client for Windows to control spotifyd.
  3. Spotifyd plays the sound via PulseAudio to a TCP socket speaking the native PulseAudio protocol.
  4. PulseAudio receives the audio into a "null sink", which makes it go nowhere, but the audio can be read back from the null sink using the null sink's monitor -- a capture channel that has the same data as was written to the null sink.
  5. A GStreamer 1.x pipeline is launched using gst-launch-1.0 to read the data from the null sink's monitor, apply pitch shifting using the GStreamer libsoundtouch plugin from gst-plugins-bad, and feed it straight to DirectSound, a native Windows sound API supported by GStreamer.
  6. DirectSound plays the audio using native Windows functionality and drivers.
  7. Note: This is a pretty "clean" audio pipeline with very little digital conversion or loss involved. The files are downloaded in a lossy format from Spotify, then decoded to PCM using spotifyd; the original PCM samples are then copied to PulseAudio, then to GStreamer, through the pipeline, and to the output DirectSound device without any further lossy re-encoding or resampling. A small amount of fidelity loss is possible if the Windows audio stack has to resample the audio under the hood (PA defaults to 44100 Hz 16-bit 2 channels; if your sound card has a different sample format, the audio will have to be converted.)
  8. Note 2: The pitch shifting algorithm of libsoundtouch is very high quality. It uses 32-bit floating point arithmetic for very accurate modification of the input 16-bit data without losing any fidelity. In fact the entire GStreamer pipeline ends up being 32-bit floating point because libsoundtouch requests it, so the 16-bit integer samples from the source are widened to float on the way in (a lossless conversion) and converted back to 16-bit integers for the destination on the way out. This costs a bit of CPU but isn't a quality concern as long as the sample rate remains the same.
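
Step 5 above can be sketched as a single gst-launch-1.0 command. This is a minimal sketch and not necessarily the exact command the author ran: it assumes the null sink is named output (so its monitor is output.monitor, as in the default.pa shown later in this document) and that the pulsesrc element is available in your MinGW64 GStreamer build. The helper name native_pipeline_cmd is hypothetical. The function only prints the command, so you can inspect it before running it.

```shell
# Hypothetical helper: print a gst-launch-1.0 command for the native Windows pipeline.
# $1 = pitch multiplier (default 1.0, i.e. no change)
# $2 = PulseAudio source to read from (assumed name: output.monitor)
native_pipeline_cmd() {
    # pulsesrc reads the null sink's monitor, pitch (SoundTouch) shifts it,
    # audioconvert adapts the sample format, directsoundsink plays it on Windows.
    echo "gst-launch-1.0 pulsesrc device=${2:-output.monitor} ! pitch pitch=${1:-1.0} ! audioconvert ! directsoundsink"
}
```

For example, native_pipeline_cmd 0.95 prints the command for a 5% downward shift; paste the printed line into the "MSYS2 MinGW 64-bit" terminal to run it.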

Build PulseAudio, Spotifyd and gst-plugins-good From Source In Msys for Mingw64 -- Optional

If you want to build the software I built on your own -- either to update it to a newer version, or just so you can say you did it -- follow these steps. If you just want to use my binaries, skip this section.

  1. Fire up your MSYS terminal with the start screen shortcut named "MSYS2 MinGW 64-bit".
  2. Run pacman -Syu
  3. When prompted, click the "X" on your terminal (don't hit Ctrl+C!)
  4. Run MSYS2 MinGW 64-bit again.
  5. Run pacman -Su
  6. Run pacman -S git nano mingw-w64-x86_64-libsndfile mingw-w64-x86_64-gstreamer mingw-w64-x86_64-gst-plugins-bad mingw-w64-x86_64-gst-plugins-good mingw-w64-x86_64-rust mingw-w64-x86_64-pkg-config mingw-w64-x86_64-speex mingw-w64-x86_64-speexdsp mingw-w64-x86_64-fftw mingw-w64-x86_64-orc autoconf make m4 to get all the dependencies you'll need.
  7. Download a release tarball of PulseAudio, then extract it to your home directory (using a tool like 7-zip or tar xvJf pulseaudio-13.0.tar.xz). For example: https://www.freedesktop.org/software/pulseaudio/releases/pulseaudio-13.0.tar.xz
  8. Download a release tarball of gst-plugins-good from here: https://gstreamer.freedesktop.org/src/gst-plugins-good/ It is recommended that you get the same version as the gst-plugins-good currently installed by pacman. Unpack it to your home directory.
  9. In MSYS, in your home directory, clone down spotifyd from Git: git clone https://github.com/spotifyd/spotifyd

Building PulseAudio

You must do this first before the other builds!

  1. Patch the PulseAudio source code to fix the things that break on Windows:
  2. cd into the pulseaudio source directory, then ./configure --disable-tests --disable-gconf --disable-gsettings --disable-gtk3 --disable-bluez5 --disable-esound --disable-alsa --disable-avahi --disable-openssl --disable-udev --disable-default-build-tests --disable-jack --disable-x11 --disable-manpages && make

Building Spotifyd

  1. Patch the Spotifyd source code to fix the things that break on Windows:

  2. Using yum or dnf, install the following packages from the package manager: ack, mingw64-gcc, mingw64-gcc-c++, make, autoconf, m4, cmake, mingw64-gettext, mingw64-speex, mingw64-libltdl, mingw64-orc, mingw64-orc-compiler, mingw64-pkg-config, mingw64-speex-tools, mingw64-winpthreads, mingw64-zlib, mingw64-crt, mingw64-GConf2, mingw64-flac, mingw64-fftw, mingw64-dbus-glib, mingw64-binutils, mingw64-libffi, mingw64-libvorbis, mingw64-opus, mingw64-win-iconv, mingw64-gettext-static, mingw64-gdbm, mingw64-gdbm-static, mingw64-fftw-static, mingw64-libogg, git, wget, nano

  3. Download the latest pulseaudio release from https://pulseaudio.org -- for example I used PulseAudio 13.0.

  4. Download the latest libsndfile from GitHub: git clone https://github.com/erikd/libsndfile

  5. Compile libsndfile in a separate build directory next to the source tree: mkdir libsndfile-build && cd libsndfile-build && mingw64-cmake ../libsndfile && sudo mingw64-make install (this will install libsndfile into your MinGW64 sys-root.)

  6. At least as of PulseAudio 13.0, it doesn't build from source cleanly on MinGW64 without some changes:
     6a. In src/pulsecore/macro.h, replace the macro definition for pa_assert_cc(expr) on L236 with just #define pa_assert_cc(expr). This silently ignores compile-time assertions but the code seems to work fine.
     6b. From the PulseAudio source tree root, run ack getuid src/ and go into each source file where getuid() is called, and simply comment out the checks where this function is called. It's usually in an if statement that does something like goto fail; so just comment out the condition entirely. I think it was in src/daemon/main.c and src/pulse/fork-detect.c but this can change in a different PulseAudio version.
     6c. With the source fixed, run mingw64-configure --disable-gconf --disable-gsettings --disable-manpages && sudo mingw64-make install LDFLAGS='-lintl'. This compiles PulseAudio using the MinGW64 toolchain and installs it into your MinGW sys-root.
     6d. Using the WSL2 connectivity between the host Windows filesystem and the WSL environment (e.g. /mnt/c or /c directory), copy over the files from /usr/x86_64-w64-mingw32/sys-root/mingw to your host. You can omit mingw/include entirely and probably a lot of other stuff in share but you will need the .dll and .exe files especially. You don't need anything related to compiling code, so .a, .o, .dll.a files aren't needed.

Windows PulseAudio From My Binaries

To accomplish the same thing as the above compile from source without the difficult compilation that requires editing C code, just download my binaries from Google Drive: https://drive.google.com/file/d/13otTRdZVgesggP4y6LNp0ElfTiv8MKUE/view?usp=sharing

Unzip it using 7-zip (https://7-zip.org) to somewhere like your Documents folder.

Linux Runtime: Dependencies Installation

  1. On your Linux box, open a terminal, and get rustup from https://rustup.rs/ and install the latest Rust stable. Restart your terminal app.
  2. Install needed packages from your distro's package manager (apt for Debian and derivatives; yum for Red Hat / Fedora derivatives):
     2a. For compilation of spotifyd, you'll need libasound2-dev (the ALSA client development files), libpulse-dev (the PulseAudio client development files), and possibly some other libraries I already had that it didn't error out on. Read the errors if it fails to build. On Red Hat distros the package names will be more like pulseaudio-devel and on Debian they'll follow the convention of libpulse-dev.
     2b. Clone spotifyd from git: git clone https://github.com/Spotifyd/spotifyd && cd spotifyd
     2c. Try (repeatedly if necessary, after installing dependent native libraries and development files) to compile spotifyd with pulseaudio support: cargo build --release --features "pulseaudio_backend" You will need a C compiler and the usual base libraries and headers. Once successful, run sudo cp target/release/spotifyd /usr/local/bin
     2d. Install pulseaudio the daemon. The version is not too important. In 99% of distros the package name is just pulseaudio.
     2e. You'll need GStreamer 1.x "tools" package, "plugins-good", "plugins-bad", and the pulseaudio plugin for gstreamer. On Debian Testing and recent Ubuntu, these packages are: gstreamer1.0-tools, gstreamer1.0-pulseaudio, gstreamer1.0-plugins-good, and gstreamer1.0-plugins-bad.

Linux Runtime Environment: Configuration

  1. For the Linux-side PulseAudio, create the file ~/.config/pulse/default.pa with the following contents:
#!/usr/bin/pulseaudio -nF
.fail
load-module module-switch-on-port-available
load-module module-native-protocol-unix
load-module module-rescue-streams
load-module module-null-sink sink_name=output
load-module module-always-sink
load-module module-intended-roles
load-module module-suspend-on-idle
.ifexists module-console-kit.so
load-module module-console-kit
.endif
load-module module-filter-heuristics
load-module module-filter-apply
set-default-sink output
set-default-source output.monitor
  2. On the same VM, create the file ~/spotifyd.conf, replacing anything in <angle brackets> with the appropriate data:
[global]
username = <your Spotify username, don't use quotes or anything>
password = <your Spotify password, don't use quotes or anything>
backend = pulseaudio
# The name that gets displayed under the connect tab on
# official clients. Spaces are not allowed!
device_name = WSL2
# The audio bitrate. 96, 160 or 320 kbit/s. Only lower if your Internet connection has a small cap or is VERY slow.
bitrate = 320
# The directory used to cache audio data. This setting can save
# a lot of bandwidth when activated, as it will avoid re-downloading
# audio files when replaying them.
# Note: The file path does not get expanded. Environment variables and
# shell placeholders like $HOME or ~ don't work!
cache_path = /home/<your Linux username>/spotifydcache
# If set to true, audio data does NOT get cached.
#no_audio_cache = false
# It sounds better to me with normalization off - no unnecessary attenuation
volume_normalisation = false
  3. Also on Linux, create the file /usr/local/bin/streamify (or any other name you want in a folder that's in your PATH, it doesn't matter, as long as you know where it is) with the following contents:
#!/bin/bash
# Pitch multiplier comes from the first argument; defaults to 1.0 (no change).
export PITCH="${1:-1.0}"
if [ -z "$SERVER" ]; then
    # WSL2 only: use the first nameserver entry as the Windows host address.
    export SERVER=$(awk '/^nameserver/ {print $2; exit}' /etc/resolv.conf)
fi
# Microseconds of buffering for the Windows PulseAudio server.
# A microsecond is one millionth of a second, so 1000000 is 1 second.
export WINLATENCY=1000000
pulseaudio --daemonize=true
sleep 5
spotifyd --config-path ~/spotifyd.conf
sleep 5
screen -mdS pipeline gst-launch-1.0 pulsesrc ! pitch pitch=${PITCH} ! queue ! pulsesink buffer-time=${WINLATENCY} server=tcp:${SERVER}

A few notes about this script:

  3a. If you are not using WSL2, the snippet that reads the nameserver from /etc/resolv.conf is probably not what you want. You need to replace this with either a hard-coded IP address where your guest can directly access your host system (Windows) or write another script to dynamically get the IP address if it varies. If your solution is not WSL2, I have no idea how to do that without better understanding your setup, so I can't advise much here, sorry :(
  3b. The variable WINLATENCY helps the PulseAudio client within GStreamer to figure out how much latency it needs to consistently send data to your Windows PulseAudio server, over TCP, without dropouts. This value could possibly need to be much higher if your system is often heavily loaded on the CPU, particularly old/slow, or has bad drivers that spend a lot of time lagging the kernel. You should try a tool like LatencyMon to diagnose your DPC latency problem if the default value of 1 second is not enough. On the other hand, maybe your system is very well-behaved and you can set this value lower and still get no dropouts. If you are getting dropouts beyond the first 5 seconds of playback, you need to increase this value a lot. Try doubling it for starters.
  3c. If you set the WINLATENCY variable really high and you still have dropouts, it's possible your Linux VM's pulseaudio is also laggy. This shouldn't be the case, since it's just doing everything in-memory with UNIX domain sockets, which are extremely fast, but if for some reason your system is slower, you can add a buffer-time argument to the pulsesrc element of the pipeline as well.
  3d. There is no real advantage of trying to minimize the WINLATENCY value except to save a tiny bit of memory, because this setup is designed for music streaming of pre-recorded audio, not live or real-time audio. Due to the latency inherent in the TCP connection (as well as the poor performance of PulseAudio on native Windows in general), it will not be possible to use this solution for gaming or other real-time audio processes. GStreamer also adds quite a bit of latency, so you'll be lucky if you manage to get a complete end to end latency below 1 second.
  3e. When you invoke this script, you'll need to set the desired pitch of the audio as a decimal value as the first (and only) argument to the script. If you omit it, it defaults to 1.0, which doesn't modify the audio. 0.9 reduces the pitch by 10%; 1.1 increases the pitch by 10%, etc. You can get pretty precise if you want to fine-tune it. If you need to shift by an exact semitone value, there's a logarithmic equation in the source code of my program RBPitch on Launchpad (launchpad.net/rbpitch) to convert a desired semitone value into decimal and back.

  4. Make sure your new script created in Step 3 is executable: chmod +x /usr/local/bin/streamify
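
Note 3e mentions converting a semitone value into the decimal pitch multiplier and back. A minimal sketch of that conversion follows, assuming standard equal temperament, where one semitone is a factor of 2^(1/12). It is not taken from the RBPitch source, and the function names are hypothetical.

```shell
# Hypothetical helper: semitone offset -> pitch multiplier.
# ratio = 2^(semitones / 12), computed as exp(semitones * ln(2) / 12).
semitones_to_ratio() {
    awk -v s="$1" 'BEGIN { printf "%.4f\n", exp(s * log(2) / 12) }'
}

# Hypothetical helper: pitch multiplier -> semitone offset.
# semitones = 12 * log2(ratio), computed as 12 * ln(ratio) / ln(2).
ratio_to_semitones() {
    awk -v r="$1" 'BEGIN { printf "%.2f\n", 12 * log(r) / log(2) }'
}
```

For example, streamify "$(semitones_to_ratio -1)" would start the pipeline one semitone down (a multiplier of about 0.9439).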

Windows-side Configuration

This assumes you have a copy of PulseAudio binaries (pulseaudio.exe and all DLL dependencies) on your Windows box.

  1. On your Windows box, open the file mingw\etc\pulse\default.pa in a text editor.
  2. Make sure default.pa has the following line: load-module module-waveout sink_name=output source_name=input. It might already be there!
  3. Also required line: load-module module-native-protocol-tcp auth-anonymous=true. All the other "default" lines that may be there are probably OK. If you get errors from one of the other modules (like console-kit or whatever), just comment out the load-module line for it. The only required modules are module-waveout and module-native-protocol-tcp. The rest is basically not even going to get used.
  4. In the file mingw\etc\pulse\daemon.conf make sure you have the line exit-idle-time = -1 and that it is not commented. Comments in this file appear as ; or # at the start of a line.
  5. Create a new batch file named run.bat in the directory mingw\bin in your PulseAudio distribution as follows:
RMDIR "%USERPROFILE%\.config\pulse" /S /Q
pulseaudio.exe

Note: I found the first command in this script was necessary because Windows PulseAudio has a bug where it often corrupts its saved state files on exit. Deleting them doesn't actually hurt anything; it just makes PulseAudio take slightly longer to start up.

If you compiled PulseAudio another way that didn't "bake in" the relative path to the configuration directory (which would be ../etc/pulse), you may have to explicitly specify the configuration file on the command line when invoking pulseaudio.exe. My distribution of PulseAudio 13.0 for Windows knows where to find its config files without any arguments.

Starting It Up!

  1. First, just run your run.bat script. You will probably get a Windows Firewall (or other firewall application) pop-up prompt. Tell it to accept the connection from pulseaudio.exe, then terminate the command window running pulseaudio. You shouldn't get the pop-up again.
  2. Start run.bat again. Wait about 10 seconds for PulseAudio to initialize.
  3. In WSL2, run the streamify script you created earlier (or whatever you decided to name it). If you want pitch shifting, now is the time to pass the pitch as an argument, e.g. streamify 0.95.
  4. Start an official Spotify client or navigate to the Spotify web app.
  5. Under the Spotify Connect icon (next to the play button), pick your WSL2 device. The device name is configurable in spotifyd.conf but if you left it at what I set it to, it should be called WSL2. It may take 30 seconds or so for everything to initialize to the point where you can accomplish this step, so be patient.
  6. Try to play a song. You should hear it playing through your Windows PC's speakers or headphones (default Windows sound device). If you pitch shifted in step 3, the audio should be pitch shifted!
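
If step 6 produces silence, it can help to check from the WSL2 side that the Windows PulseAudio server is reachable over TCP. The following is a minimal sketch: it assumes pactl is installed on the Linux side and that, as in note 3a, the Windows host is the first nameserver in /etc/resolv.conf. Both function names are hypothetical.

```shell
# Hypothetical helper: print the first nameserver address from a resolv.conf-style file.
# $1 = file to read (default /etc/resolv.conf). Outside WSL2, pass the host IP yourself instead.
host_ip_from_resolv() {
    awk '/^nameserver/ { print $2; exit }' "${1:-/etc/resolv.conf}"
}

# Hypothetical helper: query the Windows PulseAudio server over TCP.
# $1 = server IP (default: address found by host_ip_from_resolv).
# A non-zero exit status suggests a firewall block, a wrong IP, or that run.bat is not running.
check_windows_pulse() {
    pactl --server="tcp:${1:-$(host_ip_from_resolv)}" info
}
```

Running check_windows_pulse with no arguments should print the server's information if the connection works.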

Bundling It Up - To-Do

Certainly you can create shortcuts to your scripts (run.bat, and even streamify indirectly by invoking wsl.exe) and put them on your start screen by placing shortcuts in the directory %USERPROFILE%\AppData\Roaming\Microsoft\Windows\Start Menu\Programs. You could even create a script that prompts you in a little GUI input box for the pitch, or even build a GStreamer application that dynamically can change the pitch during playback.

You could also decouple the GStreamer pipeline from the other steps on the Linux side, so you can leave the PulseAudio and spotifyd servers running but restart your GStreamer pipeline with different pitch values as you prefer.
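
A minimal sketch of that decoupling follows. It assumes PulseAudio and spotifyd are already running on the Linux side and that the pipeline runs in a screen session named pipeline, as in the streamify script. The function names pipeline_cmd and repitch are hypothetical.

```shell
# Hypothetical helper: print the pipeline command used by the streamify script.
# $1 = pitch multiplier (default 1.0)
# $2 = IP address of the Windows PulseAudio server
# $3 = buffer time in microseconds (default 1000000)
pipeline_cmd() {
    echo "gst-launch-1.0 pulsesrc ! pitch pitch=${1:-1.0} ! queue ! pulsesink buffer-time=${3:-1000000} server=tcp:${2}"
}

# Hypothetical helper: stop the running "pipeline" screen session, if any,
# and start a new one with a different pitch. PulseAudio and spotifyd keep running.
repitch() {
    screen -S pipeline -X quit 2>/dev/null
    # Intentionally unquoted so the printed command is split into separate words.
    screen -mdS pipeline $(pipeline_cmd "$1" "$2" "$3")
}
```

For example, repitch 0.9 "$SERVER" restarts only the GStreamer pipeline with the pitch lowered by 10%.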

There's a lot more work and R&D that can be done around this to make it snazzier. This is just a very DIY solution for getting something I've long wanted on Windows without any major compromises or gotchas. The audio quality is far better and dropouts are less common than when I used to try to do this using a VMware Workstation virtual sound device in a Linux guest - those would drop out whenever you even did a little bit of 3d rendering on your computer. A native PulseAudio server on the Windows side is actually quite good.

Notes

  1. More to be written later, but I just wanted to comment that I do not recommend the build of PulseAudio version 1.1 available on the pulseaudio.org wiki. It's bad. It's old. It's crufty. There's really no reason to use such an old version of PulseAudio, when it's not that painful to compile the latest for Windows. Although Windows is not a first-class platform supported by the PulseAudio developers, the latest version is way better on Windows than version 1.1 was. It's just barely good enough for production use, in my opinion.

  2. Just because you can use this solution with a network of computers doesn't mean that it's a good idea. In my opinion, the likelihood of packet loss (due to network congestion) is the main argument against using PulseAudio over a network, other than a loopback NIC for a virtual machine. For what it's worth, the virtual NIC of Hyper-V (and thus WSL2) is capable of 10 Gigabit, with latencies lower than 1 ms. Most networked computers are not going to get anything that good, and especially if it's a WAN, this is not going to be a great experience for you. So please try this with a local VM first.


@MiruhiMiruhi

MiruhiMiruhi commented Nov 27, 2020

great tutorial thanks, pulse audio released a version 14, how do I build it using mysys2?
