@cool-RR / x.md (secret gist), created October 15, 2022
As a regular programmer, I got really into queuing theory thinking I was going to learn secrets of performance tuning, then was slightly disappointed.

But it turns out the simple parts of it go a long way! E.g. at work we have a single deployment queue for a monorepo. To a first approximation this is an M/D/1 queue (the deploy job takes roughly the same amount of time every run, though arrivals are actually way spikier than Poisson), and I realized that the mean wait grows like ρ/(1 − ρ): it is inversely proportional to the *spare* capacity (1 − utilization), so it blows up as utilization approaches 100%.

While the infra team was saying “we do X deploys per day out of 4X and are only at 25% capacity”, I realized even hitting 2X would more than double the already bad wait time: going from 25% to 50% utilization takes the ρ/(1 − ρ) factor from 1/3 to 1, roughly a 3× wait.
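The arithmetic behind that claim comes from the Pollaczek–Khinchine result: for an M/D/1 queue the mean wait is W = ρS / (2(1 − ρ)). A quick sketch, with an illustrative 10-minute deploy job (the service time here is made up for the example):

```python
def md1_wait(service_time, utilization):
    """Mean time spent waiting in an M/D/1 queue:
    W = rho * S / (2 * (1 - rho))  (Pollaczek-Khinchine, deterministic service)."""
    assert 0 <= utilization < 1, "queue is unstable at utilization >= 1"
    return utilization * service_time / (2 * (1 - utilization))

# Deploy job takes ~10 minutes; compare 25% vs 50% utilization.
w25 = md1_wait(10.0, 0.25)   # ~1.67 minutes
w50 = md1_wait(10.0, 0.50)   # 5.0 minutes: tripled, not merely doubled
```

Doubling the load triples the wait, and it only gets worse from there: at 75% utilization the same formula gives 15 minutes.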

What happened is that a few initiatives were under way to increase capacity, but then out of nowhere the queue got logjammed (because of high arrival-rate variability) and we had to switch to GitLab merge trains, which run CI concurrently on the optimistic result of each merge. I wrote about it here: https://engineering.outschool.com/posts/doubling-deploys-git...

I’m planning on writing a blog post about the math of CI/CD deploy queues as G/D/1 queues.
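That arrival-rate variability is exactly what the G in G/D/1 buys you over M/D/1. A toy discrete-event sketch of the effect, using the Lindley recursion; the hyperexponential arrival mix below is my own stand-in for "spiky" deploy arrivals, not data from any real system:

```python
import random

def mean_wait(interarrivals, service=1.0):
    """Mean wait in a single-server FIFO queue with deterministic service,
    via the Lindley recursion: W[n+1] = max(0, W[n] + S - A[n+1])."""
    wait = total = 0.0
    for a in interarrivals:
        wait = max(0.0, wait + service - a)
        total += wait
    return total / len(interarrivals)

random.seed(42)
n = 200_000
# Poisson arrivals at rate 0.5 (so utilization = 50%): this is M/D/1.
poisson = [random.expovariate(0.5) for _ in range(n)]
# Bursty arrivals with the SAME mean rate (mean gap = 2): mostly short gaps,
# occasionally a very long one (a crude hyperexponential). This is G/D/1.
bursty = [random.expovariate(1 / 0.5) if random.random() < 0.9
          else random.expovariate(1 / 15.5) for _ in range(n)]

w_poisson = mean_wait(poisson)  # theory predicts ~0.5 for M/D/1 at rho = 0.5
w_bursty = mean_wait(bursty)    # several times larger, at identical load
```

Same utilization, wildly different waits: burstiness alone is enough to logjam a queue that looks "only 50% busy" on a dashboard.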

For a programmer’s view of queuing theory and great performance-testing foundations, I highly recommend “Analyzing Computer System Performance with Perl::PDQ” (don’t worry about the Perl, the book is very relevant) http://www.perfdynamics.com/iBook/ppa_new.html, which shows examples of queues inside computer systems and how to model them. The author also provides a nice little library (PDQ) for modeling the different computer systems you come across.

I liked realizing that the dreaded “coordinated omission” problem in load testing (your load generator waits for responses before sending the next request, so a slow response silently suppresses exactly the requests that would have measured the slowness, and your latency numbers are bunk) is really a modeling error: you think you are measuring an open system, but an under-resourced generator gives you closed-system behaviour instead. Queueing models can tell you how likely you are to see X seconds of latency at Y load, and knowing that helps you make good decisions.
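The closed-vs-open distinction is easy to see in a toy simulation. Everything below is invented for illustration (the stall pattern, the intervals), but the mechanism is the real one:

```python
# A server that answers in 1 ms, but stalls for 1 s on every 1000th request.
services = [1.0 if i % 1000 == 0 else 0.001 for i in range(10_000)]

# Closed-loop tester: sends the next request only after the previous response,
# so each measured latency is just the service time -- the requests that WOULD
# have queued behind the stall were simply never sent.
closed_lat = services

# Open-loop tester: requests fire on a fixed 4 ms schedule no matter what,
# so measured latency includes time spent queued behind the stall.
interval, finish, open_lat = 0.004, 0.0, []
for i, s in enumerate(services):
    arrival = i * interval
    finish = max(arrival, finish) + s   # FIFO single server
    open_lat.append(finish - arrival)

mean_closed = sum(closed_lat) / len(closed_lat)
mean_open = sum(open_lat) / len(open_lat)
```

The closed-loop mean looks healthy; the open-loop mean, which is what real independent users would experience, is an order of magnitude worse, because hundreds of scheduled requests pile up behind each stall.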

But here's a more interesting question: is it worth modeling things as queues when you already have monitoring that measures the metric you care about and alerts when it crosses a threshold? You might not even need queues, which are generalizations of these systems.

To look at queuing in other domains, read the story of Tacotron 2: https://ai.googleblog.com/2017/12/tacotron-2-generating-human-s....

Tacotron 2 is an end-to-end text-to-speech system that generates natural-sounding speech. Instead of the hand-engineered pipeline stages of earlier TTS systems, it learns the whole mapping from text to audio from paired text/speech examples.

It turns out the model doesn't predict the raw waveform directly. Instead it predicts a mel spectrogram, a time-frequency picture of the audio that reads a bit like sheet music: which frequencies sound when, silences included.

So the model has an encoder that turns the character sequence into a hidden representation, an attention-based decoder that predicts the mel spectrogram frame by frame, and a WaveNet-style vocoder that turns the spectrogram into the final waveform.

The key insight is that spectrogram prediction is a sequence-to-sequence problem: the input is a sequence of characters and the output is a sequence of spectrogram frames. That means the model can be trained end to end with an attention-based recurrent network, the same machinery that powers machine translation.
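The attention step is the heart of that machinery. This is not Tacotron 2's actual attention (which is location-sensitive), just the generic dot-product form with made-up toy vectors, to show the mechanic:

```python
import math

def attend(encoder_states, query):
    """One step of dot-product attention: score each encoder state against the
    decoder's query vector, softmax the scores into weights, and return the
    attention-weighted context vector alongside the weights."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    peak = max(scores)                          # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(encoder_states[0])
    context = [sum(w * st[d] for w, st in zip(weights, encoder_states))
               for d in range(dim)]
    return context, weights

# Toy encoder outputs for three input symbols; the query "points at" the second.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
context, weights = attend(states, [0.0, 5.0])
```

At each decoding step the network re-runs this over all encoder states, so the decoder can focus on whichever part of the input it is currently "reading out".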

A deployed Tacotron is also a lot like a queue: it has an input (text requests) and an output (audio samples). And just like a queue, it gets backed up if requests arrive faster than the model can synthesize, or if the vocoder can't emit samples faster than real time and the playback buffer drains.
