Alex Evans mmalex

## allpass.md

      
mmalex / allpass.md
Last active March 23, 2024 03:14

optimising allpass reverbs by using a single shared buffer

TLDR: if you've got a bunch of delays in series, for example all-pass filters in a reverb, put them all in a single big buffer and let them crawl over each other for a perf win!
recently I was fiddling around with my hobby reverb code, in preparation for porting it onto a smaller/slower CPU.
I'd implemented a loop-of-allpass-filters type reverb, just like everybody else, and indeed, I basically had the classic 'OOP'ish abstraction of an 'allpass' struct that was, say, 313 samples long, and... did an allpass, on its own little float buffer[313]. (well, short integer, not float, but that's not relevant.) I'll write out the code in a moment.
but then I was browsing the internet one night, as you do, and stumbled on this old post by Sean Costello of Valhalla DSP fame - noting the sad passing of Alesis founder and general all-round DSP legend, Keith Barr.
https://valhalladsp.com/2010/08/25/rip-keith-barr/
It's worth a read just for his wonderful anecdote about the birth of the midiverb - which spawned the thou
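
To make the TLDR concrete, here is a minimal sketch of the shared-buffer idea next to the classic one-buffer-per-allpass version. It is not the code from the gist: the names `Allpass` and `SharedAllpassChain`, the gain parameter `g`, the float samples and the power-of-two buffer size are all assumptions made for illustration. Every delay line sits back to back in one circular buffer and a single counter `t` steps backwards once per sample, so each allpass reads the oldest sample of its region just before the next allpass overwrites that same slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Classic version: every allpass owns its own circular buffer.
struct Allpass {
    std::vector<float> buf;
    std::size_t pos = 0;
    explicit Allpass(std::size_t len) : buf(len, 0.0f) { assert(len > 0); }
    float process(float x, float g) {
        float d = buf[pos];               // sample written buf.size() steps ago
        float v = x - g * d;              // feedback path
        buf[pos] = v;
        if (++pos == buf.size()) pos = 0;
        return d + g * v;                 // feedforward path
    }
};

// Shared version: all delay lines live back to back in one power-of-two buffer.
struct SharedAllpassChain {
    std::vector<float> buf;
    std::vector<std::uint32_t> offset, length;
    std::uint32_t mask = 0;
    std::uint32_t t = 0;                  // decremented once per sample
    explicit SharedAllpassChain(const std::vector<std::uint32_t>& lens) : length(lens) {
        std::uint32_t total = 0;
        for (std::uint32_t len : lens) {
            assert(len > 0);
            offset.push_back(total);
            total += len;
        }
        std::uint32_t size = 1;
        while (size <= total) size <<= 1; // strictly larger than the summed lengths
        buf.assign(size, 0.0f);
        mask = size - 1;
    }
    float process(float x, float g) {
        for (std::size_t i = 0; i < length.size(); ++i) {
            std::uint32_t w = (t + offset[i]) & mask;             // write slot
            std::uint32_t r = (t + offset[i] + length[i]) & mask; // slot written length[i] samples ago
            float d = buf[r];
            float v = x - g * d;
            buf[w] = v;
            x = d + g * v;                // output feeds the next allpass in series
        }
        --t;                              // the whole layout crawls one slot per sample
        return x;
    }
};
```

Fed the same input, both versions should produce the same output, because the read slot of one allpass is only overwritten by the next allpass after it has been read. The shared version keeps one index update per sample instead of one per filter.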

  
## pngencode.cpp
// by alex evans, 2011.  released into the public domain.
// based on a first ever reading of the png spec, it occurs to me that a minimal png encoder should be quite simple.
// this is a first stab - may be buggy! the only external dependency is zlib and some basic typedefs (u32, u8)
//
// VERSION 0.02! now using zlib's crc rather than my own, and avoiding a memcpy and memory scribbler in the old one
// by passing the zero byte at the start of the scanline to zlib first, then the original scanline in place. WIN!
//
// more context at http://altdevblogaday.org/2011/04/06/a-smaller-jpg-encoder/.
//
// follow me on twitter @mmalex http://twitter.com/mmalex

## range_coder.cpp
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>
#include <vector>
// arithmetic coder/decoder by @mmalex based on the wikipedia page https://en.wikipedia.org/wiki/Range_coding as of July 2022
struct encdec_state {
    typedef uint16_t artype;
    const static int topshift = sizeof(artype)*8-8;
    artype low=0,range=~artype(0);

## bbo4lpzw.txt
bbo  notes
--
this is literally just the first time I've got the module compiling, so it's not in any way 'done', it's just 'a step closer', but at least I can send you a hex file now.
there's gonna be teething trouble coz I've not tried a lot of this before!
but here's what I think:

FLASHING

to program it, you can use stm32cubeprogrammer (or the ide), which you can get from https://www.st.com/en/development-tools/stm32cubeprog.html
you can either use DFU mode, where you disconnect it from euro power and plug a USB A<->USB A cable into it. as it powers up, you may need to short JP1, which forces it into ST's DFU bootrom; windows should then detect the USB device, and stm32cubeprogrammer should let you flash it via DFU mode. (I haven't tried this yet, but will.)

## allpasstest.cpp
#include "stdafx.h"
#include <Windows.h> // for performance counter
// quick test of the theory in https://gist.github.com/mmalex/3a538aaba60f0ca21eac868269525452
// we try running a simple impulse train (click every 4096 samples) through 6 allpasses with random lengths
// we time how long it takes to process 1 million samples, structuring the loop 3 ways:
//  - a sample at a time with self contained allpass structures,
//  - a sample at a time with a single big buffer,
//  - a block at a time using self contained allpass structures, operating in place on a 256 sample buffer.
// using this naive code, on a single core of my AMD threadripper, with default release compile settings on visual studio 2015,
// I see
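
The first and third loop structures described in the comment above can be sketched roughly as follows. This is not the test program itself: the Windows performance-counter timing is left out, and the `Allpass` struct, its gain `g` and the helper names are assumptions for illustration. The point is only that the two loop orders give the same output, so any timing difference comes from how the loops touch memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Self-contained allpass with its own circular buffer (gain g is an assumed parameter).
struct Allpass {
    std::vector<float> buf;
    std::size_t pos = 0;
    float g;
    Allpass(std::size_t len, float gain) : buf(len, 0.0f), g(gain) { assert(len > 0); }
    float process(float x) {
        float d = buf[pos];
        float v = x - g * d;
        buf[pos] = v;
        if (++pos == buf.size()) pos = 0;
        return d + g * v;
    }
    // Run a whole block through this one filter, in place.
    void process_block(float* io, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) io[i] = process(io[i]);
    }
};

// Impulse train: a click every `period` samples, silence elsewhere.
std::vector<float> impulse_train(std::size_t count, std::size_t period) {
    std::vector<float> s(count, 0.0f);
    for (std::size_t i = 0; i < count; i += period) s[i] = 1.0f;
    return s;
}

// Sample at a time: each sample visits every allpass before the next sample starts.
std::vector<float> run_per_sample(std::vector<Allpass> chain, std::vector<float> signal) {
    for (float& s : signal)
        for (Allpass& ap : chain) s = ap.process(s);
    return signal;
}

// Block at a time: each allpass processes a whole block in place before the next one runs.
std::vector<float> run_per_block(std::vector<Allpass> chain, std::vector<float> signal,
                                 std::size_t block = 256) {
    for (std::size_t start = 0; start < signal.size(); start += block) {
        std::size_t n = std::min(block, signal.size() - start);
        for (Allpass& ap : chain) ap.process_block(signal.data() + start, n);
    }
    return signal;
}
```

Both functions take the chain by value so each run starts from silent buffers.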

## midi2osc.cpp
/*
quick midi 2 osc hack by @mmalex, use at your own risk etc
reads midi messages and forwards them as osc messages on your network.

barely tested but works for me in windows 7, compiled with msvc 2015 sp2

stick a midi2osc.ini file in the same folder as the exe to configure it, with lines like these:

IP=43.193.207.78
Port=9000
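
As an illustration of reading a file shaped like the two lines above, here is a small sketch. It is not the parser from midi2osc.cpp: the `Midi2OscConfig` struct, the `parse_config` function and the default values are hypothetical, and only the `IP` and `Port` keys shown in the example are handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical config holder; the defaults are assumptions, not taken from the original tool.
struct Midi2OscConfig {
    std::string ip = "127.0.0.1";
    std::uint16_t port = 8000;
};

// Parse KEY=VALUE lines, ignoring anything that is not IP= or Port=.
Midi2OscConfig parse_config(std::istream& in) {
    Midi2OscConfig cfg;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back(); // tolerate Windows line endings
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;
        std::string key = line.substr(0, eq);
        std::string value = line.substr(eq + 1);
        if (key == "IP") {
            cfg.ip = value;
        } else if (key == "Port") {
            try {
                cfg.port = static_cast<std::uint16_t>(std::stoul(value));
            } catch (const std::exception&) {
                // leave the default in place if the number does not parse
            }
        }
    }
    return cfg;
}
```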

## gist:7917083
@mmalex:
the redis feedback thread has been interesting, and I share many of antirez's 'tastes' when it comes to wanting to create pragmatic software that is 'robust enough' for many use cases. OTOH I do think that redis is in an interesting place regarding how its users view (and potentially misinterpret) its data 'promises', especially in cluster

I just wanted to give an example in case it is a useful datapoint in the spectrum being discussed for redis (pure memory store vs evolving into a disk cluster that might or might not make CP claims...)

TL;DR difference from redis cluster: I basically wanted not to have the distinction of master vs slave, and all the complexity that brings (having to do elections, split brain, etc) and instead build everything around merging between symmetrical peers - ie every node is equal, and they merge their differences on (re)connection. I think it's a really powerful paradigm, and not one that I've seen antirez talk much about with respect to redis. if you're interested in

## gist:7663290
// sillycocg.cpp : disgusting test code. loads a 320x180 rgb image from input.raw
// writes out a (fixed) photoshop-style 3 bytes per index palette file to ycocg.act
// you can load this in photoshop via image/mode/index then image/mode/color-table.../load...
// then it writes out an 8bpp indexed image to output.raw
// it's kinda a 'hindsight is 20/20' thing, the old 'fake truecolor' demos that did RGB stripes/
// patterns - I wondered what it would be like to do something like
// http://www.pmavridis.com/research/fbcompression/
// but 15 years ago in 256 color mode, so you compute 2 channels per pixel instead of 3,
// and those channels are Y and one of Co or Cg; then you alternate Co/Cg in a checkerboard
// and pack 4 bits y, 3 bits chroma, 1 bit checkerboard switch.
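
A rough sketch of the packing step described in the comments above. The YCoCg transform is the standard one; everything else is an assumption made for illustration, in particular the bit layout (Y in the top four bits, chroma in the next three, the Co/Cg switch in bit 0), the quantisation by plain shifts, and the names `rgb_to_ycocg` and `pack_pixel`. The original sillycocg.cpp may well order the bits differently.

```cpp
#include <cassert>
#include <cstdint>

// Y is in 0..255, Co and Cg are roughly in -127..127.
struct YCoCg { int y, co, cg; };

YCoCg rgb_to_ycocg(int r, int g, int b) {
    return YCoCg{ (r + 2 * g + b) / 4, (r - b) / 2, (2 * g - r - b) / 4 };
}

// Pack one pixel at (px, py) into an 8 bit palette index.
// Assumed layout: bits 7..4 = Y, bits 3..1 = chroma, bit 0 = 0 for Co, 1 for Cg.
std::uint8_t pack_pixel(int r, int g, int b, int px, int py) {
    assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
    YCoCg c = rgb_to_ycocg(r, g, b);
    int use_cg = (px ^ py) & 1;           // checkerboard: alternate Co and Cg
    int chroma = use_cg ? c.cg : c.co;
    int y4 = c.y >> 4;                    // 4 bits of luma
    int c3 = (chroma + 128) >> 5;         // 3 bits of chroma, biased so grey lands at 4
    return static_cast<std::uint8_t>((y4 << 4) | (c3 << 1) | use_cg);
}
```

With this layout the checkerboard bit is part of the index, so a fixed 256-entry palette can decode each index without knowing the pixel position.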

## gist:2660905
// symmetric matrix stuff
// not heavily tested but should be about right
// hlsl notation
// thanks to @rygorous @m1k3 and mark adami for help
// mostly just a big geek out :)

// multiply e by the symmetric 3x3 matrix formed from d on the diagonal and u in the upper triangle
float3 mul_sym3x3(float3 d, float3 u, float3 e)
{
	return float3(dot(e,float3(d.x,u.z,u.y)), // u=(yz,xz,xy) in the case of covariance matrix
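
For reference, the same product can be sketched in plain C++. Only the first row appears in the preview above; the second and third rows here are inferred from the layout given in the comment (d = (xx, yy, zz), u = (yz, xz, xy)) and from the matrix being symmetric, so treat them as an assumption. `Vec3` and `dot` are stand-ins for the HLSL built-ins.

```cpp
#include <array>

using Vec3 = std::array<float, 3>;

float dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// d = (xx, yy, zz) on the diagonal, u = (yz, xz, xy) in the upper triangle, so
//     | d.x u.z u.y |
// M = | u.z d.y u.x |
//     | u.y u.x d.z |
Vec3 mul_sym3x3(const Vec3& d, const Vec3& u, const Vec3& e) {
    return Vec3{ dot(e, Vec3{d[0], u[2], u[1]}),
                 dot(e, Vec3{u[2], d[1], u[0]}),
                 dot(e, Vec3{u[1], u[0], d[2]}) };
}
```

Storing six floats instead of nine is the whole trick: each row of the product is one dot product against a swizzle of d and u.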

## gist:1219860
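#include <stdio.h>
#include <arpa/inet.h> // for htonl
// sums the 256 big-endian words of a 1024 byte block with end-around carry,
// then stores the complemented sum in word 1
int main(int argc, char **argv){
	unsigned int c=0,d,i,block[256]={0};
	FILE *in,*out;
	if (argc<3 || !(in=fopen(argv[1],"rb"))) return 1;
	fread(block,1,1024,in); // a short file is left zero padded
	fclose(in);
	for (i=0;i<256;++i)
		if (c > (d=c+htonl(block[i]))) c=d+1; else c=d; // unsigned wrap means a carry: add it back in
	block[1]=htonl(c)^0xffffffff;
	if (!(out=fopen(argv[2],"wb"))) return 1;
	fwrite(block,1,1024,out);
	fclose(out);
	return 0;
}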
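
The checksum loop above can be pulled out into a testable function. This sketch works on words that are already in host byte order (so no `htonl`), and the names `Block`, `carry_sum` and `store_checksum` are made up for illustration. One deliberate difference from the program above: `store_checksum` zeroes word 1 before summing, which the original leaves to the input file.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Block = std::array<std::uint32_t, 256>;

// Sum all words, adding any carry out of bit 31 back into bit 0 (end-around carry).
std::uint32_t carry_sum(const Block& words) {
    std::uint32_t c = 0;
    for (std::uint32_t w : words) {
        std::uint32_t d = c + w;
        c = (d < c) ? d + 1 : d;   // d < c means the 32 bit add wrapped
    }
    return c;
}

// Store the complemented sum in word 1, treating that word as zero while summing.
void store_checksum(Block& words) {
    words[1] = 0;
    words[1] = ~carry_sum(words);
    // With the checksum in place, the end-around sum of the whole block is all ones.
    assert(carry_sum(words) == 0xffffffffu);
}
```

A reader of the block can verify it the same way: sum every word with end-around carry and check for 0xffffffff.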