straylark/pulseaudio-systems-programming.md Secret

# bad systems programming in pulseaudio: untagged unions, fs races, edge cases, and the docs that don't mention them

First, some context. I'm poor and have an aging laptop whose sound card died.
I want to keep listening to audio from it, so I figured I'd set up pulseaudio to stream decoded audio to my phone.
I don't use pulse normally ("PULSE_SERVER=localhost" suffices to convince most programs that auto-probe pulse to not launch their own server), but I want to be able to run a pulseaudio daemon as my user and send packets across the network without privileges.
To complicate matters, I can't run a pulseaudio server on my phone, so the native pulse networking is right out.
But eventually I found a working configuration using RTP to stream audio across the LAN, and although it has ~1s latency, it does work:
vlc on the target device can play the stream at rtp://@:47777.

## how to run pulseaudio

My journey looked like this:

1. google for a pre-built solution. There were a few articles and scattered docs/apocrypha, but nothing clear and straightforward.
2. alternate between passing and not passing the -n flag to pulse, which would work or not depending on factors I couldn't determine that changed from time to time. When not passing -n, I needed to load module-native-protocol-unix, but sometimes this would work and other times it wouldn't. Loading this module makes pulse listen for client applications on a unix-domain socket at /run/user/$UID/pulse/native.
3. realize that the pulseaudio module-rtp-send module only accepts IP addresses, not hostnames. I'm conflicted about this. It's easy to resolve hostnames and almost all networking software does it automatically. But on the other hand, in a connectionless context it isn't obvious when to re-resolve hostnames as DNS mappings change. Even so, resolving domains zero times is strictly less useful than resolving them once. The plugin seems oriented toward multicast addresses, but I could never get multicast working, and a regular IP works fine for my use-case. Regardless, for lack of ability to trigger on changes to the hostname->IP mapping (this problem deserves its own article), I may need to generate a new configuration every time I run pulse.
4. try to pipe the script to pulseaudio's standard input with the -C flag, which doesn't work because pulse reopens stdin and doesn't see the piped script.
5. try to pass the script to -F with process substitution (<(echo "$script")). This doesn't work: pulse complains of inability to stat() /dev/fd/N (bash) or /proc/self/fd/N (zsh).
6. create a temporary file and pass its path to -F. This is frustratingly ugly: more moving parts are involved, it's painful to remove tempfiles in a foolproof way in shell scripts, and the whole approach is full of race conditions, but it does basically work.
7. try again to figure out why pulseaudio -n -F <(echo "$script") doesn't work when typing stat <(echo "$script") into the same shell works fine.
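
The temporary-file step can be sketched as follows. This is a minimal illustration, assuming `mktemp` is available; `with_script_file` is a hypothetical helper name, not part of pulseaudio or of the script linked later, and it only narrows the cleanup problem rather than removing the races.

```shell
# Hypothetical helper: write a generated script to a private temporary file,
# run a command with that file's path appended, and remove the file afterwards.
# The function body is a subshell, so the traps do not leak into the caller.
# Usage: with_script_file SCRIPT COMMAND [ARGS...]
with_script_file() (
    script=$1
    shift
    tmp=$(mktemp) || exit 1
    # Remove the file whenever the subshell exits...
    trap 'rm -f "$tmp"' EXIT
    # ...and turn the common fatal signals into a normal exit so the EXIT trap runs.
    trap 'exit 1' HUP INT TERM
    printf '%s\n' "$script" > "$tmp"
    "$@" "$tmp"
    status=$?
    exit "$status"
)
```

With pulseaudio this would be invoked as something like `with_script_file "$script" pulseaudio -n -F`. Nothing here survives a SIGKILL of the shell, which still leaks the file.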

The precise error output from using process substitution with pulseaudio's -F flag is the following:

    W: [pulseaudio] cli-command.c: stat('/proc/self/fd/12'): No such file or directory
    E: [pulseaudio] main.c: Failed to initialize daemon.

So I check out the sources and open cli-command.c.
There's one call to stat() in the file, in pa_cli_command_execute_line_stateful; simple enough.
But this function's name and signature don't make it obvious where the filename to execute would come in:

    int pa_cli_command_execute_line_stateful(pa_core *c, const char *s, pa_strbuf *buf, bool *fail, int *ifstate);

It seems that this function actually does a fair bit of string-processing and then dispatches to pa_cli_command_execute_file to handle opening and reading script files.
In particular, it takes its s argument, strips leading whitespace, checks for emptiness or a beginning-of-line comment, handles lines that are "meta-commands" (conditionals, include directives, and others), and only if none of the other cases apply, tokenizes and interprets the string as a sequence of newline-separated commands.
While reading the whitespace-trimming and other parsing I was worried that the code might also be corrupting filenames that started with whitespace, but since the only case that calls out to pa_cli_command_execute_file is the one for .include directives, it seems something must be prefixing the -F argument with ".include ".
For each .include directive, the code first stat()s the portion of the string after the prefix, and exits if it fails.
Otherwise, it checks if the file is a directory.
If it was (note, past tense; this is a TOCTTOU bug), it does a readdir(), filters for files with the ".pa" extension, sorts them alphabetically (a one-off O(n^2) insertion sort with the venerable "strcmp" as arbiter), then interprets them in order.
If the file wasn't a directory, it passes the path to pa_cli_command_execute_file.
So, in addition to files, the -F flag also has undocumented support for directories! This is noted in the docs for .include in pulse-cli-syntax(5), but a user could never tell .include was related to -F without reading the source code.
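
The .include handling described above can be modelled roughly in shell. This is a sketch of the logic as I read it from the description, not a translation of the C; `include_path` is a hypothetical name, and the shell glob's sort order stands in for the strcmp-based insertion sort.

```shell
# Hypothetical model of how an ".include PATH" line is dispatched.
# Usage: include_path PATH COMMAND [ARGS...]; COMMAND is run once per script file.
include_path() {
    target=$1
    shift
    if [ -d "$target" ]; then
        # Directory: visit every *.pa entry. The glob expands in the locale's
        # collation order; pulseaudio itself sorts with strcmp.
        for f in "$target"/*.pa; do
            [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
            "$@" "$f"
        done
    elif [ -e "$target" ]; then
        # Anything else that exists is handed over as a single script file.
        "$@" "$target"
    else
        printf "stat('%s'): No such file or directory\n" "$target" >&2
        return 1
    fi
}
```

For example, `include_path some-dir cat` would print the directory's .pa files in order. Like the original, the model checks the path first and uses it afterwards, so it inherits the same check-then-use race.
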
In my playing with shell scripting earlier, I stumbled upon another bug/feature: it turns out you can pass scripts as literals to -F, if you give the script a blank first line.
How does this work?
Now that we've read the code, we can figure it out:
We know that pulseaudio first prepends ".include " to the -F argument to generate a temporary script to interpret.
Looking at the stat() call actually performed (with strace, or from the stat warnings printed for nonexistent filenames), we can see that pulseaudio makes the -F argument into an absolute path before prepending the directive, so only the first line of the argument ends up as the .include target.
With a blank first line, that target is a directory (presumably the current working directory). Most directories don't have any .pa files, so this is usually a nop.
To make it a guaranteed nop, we can start our -F argument with /dev/null instead.
Every line after the first then gets interpreted verbatim as a pulse script.
So we don't need temporary files or /dev/fd/N to pass in a generated script on the command-line; we can just use good old argv.
The resulting script lives here.
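
As a rough illustration of the argv trick (not the linked script itself), the -F argument can be assembled like this. `inline_script_arg` is a hypothetical helper, and the module arguments and the 192.0.2.10 address in the example script are assumptions chosen for the sketch.

```shell
# Hypothetical helper: build a -F argument whose first line is /dev/null
# (so the implicit ".include" reads an empty file), followed by the script.
inline_script_arg() {
    printf '/dev/null\n%s\n' "$1"
}

# Illustrative script only: module arguments and address are assumptions.
script='load-module module-native-protocol-unix
load-module module-null-sink sink_name=rtp
load-module module-rtp-send source=rtp.monitor destination_ip=192.0.2.10 port=47777'

arg=$(inline_script_arg "$script")
# On a machine with pulseaudio installed, the daemon would be started with:
#   pulseaudio -n -F "$arg"
```

The whole script travels in a single argv entry, so no file, pipe, or /dev/fd/N path is involved.
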

## takeaways


- it's still a big pain and a very use-case-dependent process to access hardware connected over IP networks. If only Inferno had won. (Don't even get me started on IP and DNS; read up on RINA if you're interested.)
- temporary files are easy to do wrong, the filesystem is a complicated, leaky abstraction (imo, a mistake in its entirety, but that's also another article), and shells are bad at passing inputs to programs, arguably their primary purpose.
- document your fucking software. Just like you consider which tests cover each line of code, you should assess which docs explain each line of code.
- don't make one argument (or function, etc.) do multiple, unrelated things. Being "clever" just means consumers of your interface have to be equally clever to invert your logic and predict which underlying operation will be performed.
- a corollary: when you do multiplex distinct pieces of functionality, use an explicit disambiguator. This rule of thumb shows up repeatedly across computer science: discriminated unions (aka sum types, or the categorical notion of coproduct), avoiding parser ambiguity (and the langsec movement), etc.
- always treat filenames/paths as binary blobs. The handling in pulseaudio is broken for filenames containing newlines. There's no way short of userspace-breaking kernel modifications to prevent "weird" paths from happening, so your software must be able to handle any path. The only delimiter byte available for raw paths is NUL.
- calling stat() is often a mistake. Attempting to use a file for the desired purpose (e.g. open() or readdir()) is generally the right way to find out if that file can be used that way. Most operations should be performed on an already-opened file descriptor, not on a filesystem path, to avoid races. This insight is captured by the notion of capabilities, which modern operating systems (including Linux) recognize as having safer and more useful semantics than names and ACLs/"permissions". (Note, of course, that Linux's capabilities(7) is actually an extended permissions system, not a capability-based one.)
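
The point about NUL being the only safe path delimiter can be sketched in shell. This assumes a `find` with `-print0` and an `xargs` with `-0` (GNU, BSD and busybox all provide them); `for_each_pa_file` is a hypothetical helper name.

```shell
# Hypothetical helper: run COMMAND once per *.pa file under DIR, passing each
# path as a single argument no matter which bytes the name contains.
# Usage: for_each_pa_file DIR COMMAND [ARGS...]
for_each_pa_file() {
    dir=$1
    shift
    # -print0 and -0 delimit paths with NUL, the one byte a path cannot contain.
    find "$dir" -type f -name '*.pa' -print0 | xargs -0 -n 1 "$@"
}
```

A newline-delimited pipeline (`find ... | while read -r path`) would instead split a file named with an embedded newline into two bogus paths.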

## how to use it

    (on audio-playing machine)
    $ git clone https://github.com/straylark/pulse-rtp-to
    $ ./pulse-rtp-to/pulse-rtp-to target-host

    (on target-host)
    $ vlc rtp://@:47777

    (back on audio-playing machine)
    $ PULSE_SERVER=/run/user/$UID/pulse/native mpv song.mp3