@mmalex
Last active March 23, 2024 03:14
optimising allpass reverbs by using a single shared buffer

TLDR: if you've got a bunch of delays in series, for example all-pass filters in a reverb, put them all in a single big buffer and let them crawl over each other for a perf win!

Recently I was fiddling around with my hobby reverb code, in preparation for porting it onto a smaller/slower CPU. I'd implemented a loop-of-allpass-filters type reverb, just like everybody else, and I had the classic 'OOP'ish abstraction: an 'allpass' struct that was, say, 313 samples long, and... did an allpass, on its own little float buffer[313]. (Well, short integer, not float, but that's not relevant.) I'll write out the code in a moment.

but then I was browsing the internet one night, as you do, and stumbled on this old post by Sean Costello of Valhalla DSP fame - noting the sad passing of Alesis founder and general all-round DSP legend, Keith Barr. https://valhalladsp.com/2010/08/25/rip-keith-barr/

It's worth a read just for his wonderful anecdote about the birth of the midiverb - which spawned the thought process that led to this note.

It also has a neat picture of exactly the sort of structure I'm talking about - a bunch of delays in a ring, with suitable feed forward and back loops to make them diffuse the sound as it travels round the loop. But also, if you read the hardware description, it makes it very clear that this was implemented in hardware as a single large circular buffer, not as lots of small ones. AHA!

Now this is such an obvious idea, it's almost a nothing burger. Except, I'd never thought of it. So I had a quick look at all the reverbs on the internet ^H^H^H^H^H^H^H three random reverbs I know about on the internet - and only one used this trick.

Mutable Instruments gets this right for clouds (of course they do! never doubt Emilie. OH! AND! I just found a comment that this code was developed on the FV-1 chip which was designed by... Keith Barr. So it makes even more sense that this code uses this trick) https://github.com/pichenettes/eurorack/blob/master/clouds/dsp/fx/reverb.h - note that there's a single '_write_ptr' in the context - whereas the teensy reverb (https://github.com/PaulStoffregen/Audio/blob/master/effect_reverb.h) and their copy of freeverb (https://github.com/PaulStoffregen/Audio/blob/master/effect_freeverb.cpp) both keep each allpass filter in its own little box, with separate read/write indices and all the attendant incrementing and wrapping. Which is what I was doing too. I'm not saying the teensy reverb is 'wrong', and this isn't about the particular reverb topologies or the sound - just that they (and I) are missing a trick.

So what am I actually talking about? well what I used to do was something along the lines of... (none of this code has been run, just for illustration)

  template <int N> struct AllPass {
    float buf[N]={};
    int i=0; // private read/write index into buf
    static constexpr float mu=0.5f; // allpass coefficient
    float doit(float x) {
      float delayed=buf[i]; // the sample written N calls ago
      buf[i] = x -= delayed * mu;
      if (++i == N) i=0; // this line is meh! a waste!
      return x * mu + delayed;
    }
  };
  // so lets say we have 3 allpasses. in reality you'll have MANY more. 10s or 100s with randomish lengths.
  AllPass<23> a1;
  AllPass<72> a2;
  AllPass<313> a3;
  float DoReverb(float x) {
    x=a1.doit(x);
    x=a2.doit(x);
    x=a3.doit(x);
    // etc
    return x;
  }

and that all looks neat and wotnot but when the actual DSP-valuable cost of 'doit' is just a couple of multiply-adds, all that index fiddling is just waste. basically you're paying just to keep the allpass looping inside its own little private loop. silly encapsulation. why not do:

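  // buffer size is MASK+1: a power of 2, and bigger than the sum of all the delay lengths (23+72+313=408 here)
  #define MASK 511
  float buf[MASK+1];
  unsigned delaypos; // unsigned, so the endless decrement wraps around cleanly
  const static float mu=0.5f;
  // one allpass: write at i, read N slots further along (j), then hand j on as the next allpass's write slot
  #define DoAllPass(N) { unsigned j=(i+(N))&MASK; float delayed=buf[j]; buf[i]=x-=delayed*mu; x=x*mu+delayed; i=j; }
  float DoReverb(float x) {
    unsigned i=(delaypos--)&MASK; // this is the only index maintenance we need!
    DoAllPass(23);
    DoAllPass(72);
    DoAllPass(313);
    // etc
    return x;
  }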

and basically it's exactly the same DSP code, except that we only need to store & maintain a single 'delaypos' integer instead of one index per allpass; instead of staying within their own little loops, the allpasses store their state in a little window of the single big buffer, and slowly 'scroll' through the buffer on each sample.
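If you want to convince yourself the two versions really are the same DSP, here is a minimal sketch that feeds an impulse through both and compares the outputs. Like the snippets above it is illustration rather than production code, and the names `BoxedAllPass`, `SharedReverb` and `max_abs_difference` are made up for this example. It assumes the same three lengths and mu=0.5 as above, and a 512-slot shared buffer.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

constexpr float kMu = 0.5f;                  // same coefficient as the snippets above
constexpr unsigned kLens[] = {23, 72, 313};  // same three lengths as the snippets above
constexpr unsigned kMask = 511;              // 512 slots, more than 23 + 72 + 313 = 408

// Version 1: each allpass owns a private ring buffer and its own index.
struct BoxedAllPass {
  std::vector<float> buf;
  std::size_t i = 0;
  explicit BoxedAllPass(std::size_t n) : buf(n, 0.0f) {}
  float doit(float x) {
    float delayed = buf[i];
    buf[i] = x -= delayed * kMu;
    if (++i == buf.size()) i = 0;  // per-filter wrap
    return x * kMu + delayed;
  }
};

// Version 2: all allpasses share one buffer and one position counter.
struct SharedReverb {
  float buf[kMask + 1] = {};
  unsigned pos = 0;
  float process(float x) {
    unsigned i = (pos--) & kMask;  // the only index maintenance per sample
    for (unsigned n : kLens) {
      unsigned j = (i + n) & kMask;  // read slot, n samples "older" than i
      float delayed = buf[j];
      buf[i] = x -= delayed * kMu;
      x = x * kMu + delayed;
      i = j;  // the next allpass writes where this one just read
    }
    return x;
  }
};

// Feed a unit impulse through both versions and return the largest
// absolute difference between their outputs.
float max_abs_difference(int num_samples) {
  BoxedAllPass a1(23), a2(72), a3(313);
  SharedReverb shared;
  float worst = 0.0f;
  for (int t = 0; t < num_samples; ++t) {
    float x = (t == 0) ? 1.0f : 0.0f;
    float boxed = a3.doit(a2.doit(a1.doit(x)));
    float diff = std::fabs(boxed - shared.process(x));
    if (diff > worst) worst = diff;
  }
  return worst;
}
```

Calling `max_abs_difference(2000)` should come back as zero, or at worst within floating-point rounding of it, since both versions do the same arithmetic in the same order and only the bookkeeping differs.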

Anyway, it's a small trick, but it was a shift in my way of thinking that should have been obvious, but wasn't - probably because I was mentally putting each allpass in its own little box, and then coding it that way... until I read how a hardware designer viewed it. And in retrospect, it's just bloody obvious, not least from the actual flow diagram picture. doh. gotta love those hardware designers.

PS if, like me, you like a physical/visual analogy for these two approaches: the former approach treats each allpass as a little black box with an input and an output that can be chained together. It's fine, it's neat, it works, and you don't need to understand that inside is a self-contained little loop of tape with a read and write head next to each other, delaying a signal. The latter approach effectively pops open each box, unfolds the tape and lays out each box, left to right, one after the other, just like the flow diagram in the original blog post. I think of each write head on the left, read head a bit to the right, and then the next allpass, and the next, from left to right, with a single long strip of tape running through them all (perhaps it's all the little loops snipped open and joined together). Functionally (musically?) they are completely identical, but now we've opened it all out, we can see that we only need one single tape running through the whole arrangement; and the tape slides to the right (and around in a giant loop) under the fixed heads.

So, the efficiency gain is really just about reducing the book keeping of having lots of bits of tape, down to a single loop. That's how I'm thinking about it, anyway.
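One thing the single loop does need is enough tape. The shared buffer has to hold strictly more slots than the sum of all the delay lengths, otherwise the last allpass's read head wraps round onto the slot the first allpass wrote earlier in the same sample. It also has to be a power of two for the `&MASK` wrap to work. A minimal sketch of working that out at compile time, assuming C++14 or later; `mask_for` and `total_delay` are hypothetical helpers, not part of the code above:

```cpp
#include <cstddef>

// Hypothetical helper: the smallest (2^k - 1) mask whose buffer of
// mask + 1 slots is strictly larger than the total delay length.
constexpr unsigned mask_for(unsigned total) {
  unsigned size = 1;
  while (size <= total) size <<= 1;  // grow until strictly larger than total
  return size - 1;
}

// Hypothetical helper: sum of all the allpass lengths in an array.
template <std::size_t K>
constexpr unsigned total_delay(const unsigned (&lens)[K]) {
  unsigned sum = 0;
  for (unsigned n : lens) sum += n;
  return sum;
}

constexpr unsigned kDelayLens[] = {23, 72, 313};  // the lengths used above
constexpr unsigned kSharedMask = mask_for(total_delay(kDelayLens));

// 23 + 72 + 313 = 408, so 512 slots are enough and the mask is 511.
static_assert(kSharedMask == 511, "408 samples of delay fit in 512 slots");
```

The cost of the trick is that the buffer gets rounded up to the next power of two, so a little memory is traded for dropping the per-filter wrap checks.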

@supersynthesis

this is rad, thank you! I have implemented this idea alongside fractional delays for modulatable delays and all-pass delays. works like a charm. I now see you are the Plinky guy, Plinky is awesome :)

I had my man ChadGPT help me visualize it in python with a matplotlib animation. Have a peep.

https://github.com/supersynthesis/Shared_Delays/tree/main
