This document is a spec for a set of new UGens for SuperCollider that will allow for arbitrary sub-sample indexing into audio buffers up to 2139095040 samples long (about 12.4 hours at 48k).
Work-in-progress implementation of this spec can be found here.
BufRd currently accepts a 32-bit float as an index into a buffer, due to limitations of SuperCollider's server architecture. This means that when using BufRd to play an audio buffer at a playback rate of 0.3, I start noticing major artifacts around 2**20 samples in (about 22 seconds at 48k):
~buffer = Buffer.read(s, "path", 0, 2**24);
// evaluate one line at a time; each starts reading further into the buffer
{ BufRd.ar(2, ~buffer, Phasor.ar(0, 0.3, 2**20, 2**24)) }.play
{ BufRd.ar(2, ~buffer, Phasor.ar(0, 0.3, 2**21, 2**24)) }.play
{ BufRd.ar(2, ~buffer, Phasor.ar(0, 0.3, 2**22, 2**24)) }.play
{ BufRd.ar(2, ~buffer, Phasor.ar(0, 0.3, 2**23, 2**24)) }.play
Because UGen inputs and outputs are 32-bit floats, which have only a 24-bit significand, indexing accuracy decreases the further into the buffer the playhead gets. The longer one plays a buffer using BufRd, the more audible the resulting artifacts become.
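To make the precision loss concrete, the following C++ sketch measures the gap between adjacent 32-bit floats at a few buffer positions and shows what a fractional phase becomes once it is rounded to float. The function names are illustrative only and are not part of SuperCollider.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Gap between x and the next larger 32-bit float: the finest step a
// float-valued buffer index can take at position x.
float floatIndexStep(float x) {
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// What a phase computed in double actually becomes once it is rounded to a
// 32-bit float, as happens whenever it travels between UGens.
float phaseAsFloat(double intPart, double frac) {
    return static_cast<float>(intPart + frac);
}
```

At 2**20 the step is 0.125 of a sample, so phaseAsFloat(1048576.0, 0.3) comes out as 1048576.25; at 2**24 the step is 2.0, so phaseAsFloat(16777216.0, 1.0) rounds back to 16777216 and odd-numbered frames cannot be addressed at all.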
PlayBuf and VDiskIn do not suffer from these limitations: you can play artifact-free at any rate from anywhere in the buffer, within reason (eventually, after hours or days, you would also hit the limits of the double data type). However, they offer no way of finding out exactly where in the buffer they are currently playing, and because they take their starting position as a 32-bit float, how sample-accurately you can cue playback is eventually limited as well. (VDiskIn also doesn't allow negative rates.)
To get around the 32-bit float server limit, we have to be able to communicate with UGens using multiple float values that internally combine to make a double-precision index.
The way I have opted to do this is to use a 32-bit int recast as a float: it keeps the same bit pattern but calls itself a float rather than an int, which results in a nonsensical/arbitrary float value that the server can pass around in blissful ignorance that it isn't really a float. The UGens internally convert this back to an int32 value and add to it another float containing the fractional (decimal) portion of the desired number. For example, 45.3992 would be constructed from integer 45 and fraction 0.3992.
This theoretically gives us a range of sample access from 0 to the maximum possible int32; however, if we want to communicate these values with sclang, we are limited by how sclang interprets large floating-point values. In my testing, sclang can correctly interpret every integer in the range [0, 2139095040] recast as float; beyond that you start getting 'nan's which can't be retrieved. The boundary is a property of IEEE 754 floats: 2139095040 is 0x7F800000, the bit pattern of +infinity, and every larger positive int32 bit pattern encodes a NaN. This range is still enough for more than 12 hours of audio at 48k, so it is our working range of sample access.
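The upper bound can be checked directly. This C++ sketch (names are illustrative) reinterprets int32 bit patterns as floats with std::memcpy and confirms that 2139095040 (0x7F800000) is +infinity, that the next pattern up is already a NaN, and how many hours the range covers at 48k.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t), "float must be 32 bits");

// Reinterpret an int32's bit pattern as a float. memcpy is the portable way
// to do this in C++; optimizing compilers typically reduce it to a move.
float int32BitsAsFloat(std::int32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// 0x7F800000 is the IEEE 754 bit pattern of +infinity; every positive int32
// above it (0x7F800001 to 0x7FFFFFFF) is the bit pattern of a NaN.
constexpr std::int32_t kMaxIndex = 0x7F800000;  // 2139095040

// Hours of audio spanned by indices 0..kMaxIndex at a given sample rate.
double maxHours(double sampleRate) {
    assert(sampleRate > 0.0);
    return kMaxIndex / sampleRate / 3600.0;
}
```

maxHours(48000.0) is about 12.38.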
At the maximum of this range, adjacent double values are 2**-22 (about 2.4e-7) apart, so we can comfortably guarantee precision down to at least 1/1000th of a sample across this entire range. For even better accuracy the BufRd algorithm could be adjusted: currently phase is passed around internally as a double, but it could probably be kept as a separate integer index and float fractional value, since this is ultimately what it is turned back into.
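The sub-sample precision at the top of the range can be checked the same way. Since 2139095040 lies between 2**30 and 2**31, and a double has a 53-bit significand, neighboring doubles there are 2**(30 - 52) = 2**-22 apart. A minimal C++ check (the function name is illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Gap between x and the next larger double: the finest sub-sample step a
// double-valued index can represent at position x.
double doubleIndexStep(double x) {
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}
```

doubleIndexStep(2139095040.0) returns 2**-22, several orders of magnitude finer than 1/1000 of a sample.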
Internally, the UGens use the following algorithm to convert between the int/float input/output format and double format:
// to double
double numIntAsDouble = (double)(*reinterpret_cast<int32*>(&numIntAsFloat));
double numDecAsDouble = (double)numDec;
double num = numIntAsDouble + numDecAsDouble;
// from double
int32 numInt = (int32)num;
float numIntAsFloat = *reinterpret_cast<float*>(&numInt);
float numDec = num - numInt;
Thus, the UGens take input and send output in the form of [numIntAsFloat, numDec] pairs. To convert between these pairs on the sclang side, where Floats are already 64-bit doubles, the following algorithm may be used:
// to double
var num = numIntAsFloat.as32Bits + numDec;
// from double
var numIntAsFloat = Float.from32Bits(floor(num));
var numDec = num - floor(num);
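Note that the .from32Bits and .as32Bits primitives use unions rather than reinterpret_cast. If this causes discrepancies on certain machines, we might need to fix the implementation in either the UGens or sclang. It is my impression that reinterpret_cast behaves more consistently than a union here, but strictly speaking neither is well-defined C++: reading a float through an int32 pointer breaks the strict-aliasing rule, and reading a union member other than the one last written is undefined in C++ (although common compilers support it). Copying the bits with std::memcpy, or std::bit_cast in C++20, is the portable option.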
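Copying bits with std::memcpy sidesteps both the union and the reinterpret_cast question. Here is a sketch of the to-double / from-double algorithm above rewritten that way; it is an illustration under that assumption, not the code in the work-in-progress implementation, and IndexPair, pairToDouble and doubleToPair are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t), "float must be 32 bits");

// The [numIntAsFloat, numDec] pair as two UGen-sized floats.
struct IndexPair {
    float intAsFloat;  // bit pattern of an int32 frame index
    float dec;         // fractional part in [0, 1]
};

// "to double": recover the int32 bits with memcpy instead of reinterpret_cast.
double pairToDouble(IndexPair p) {
    std::int32_t i;
    std::memcpy(&i, &p.intAsFloat, sizeof i);
    return static_cast<double>(i) + static_cast<double>(p.dec);
}

// "from double": split into integer bits and fraction. Like the (int32) cast
// in the UGen code, this truncates, so it assumes a non-negative index.
IndexPair doubleToPair(double num) {
    assert(num >= 0.0 && num < 2147483648.0);
    const std::int32_t i = static_cast<std::int32_t>(num);
    IndexPair p;
    std::memcpy(&p.intAsFloat, &i, sizeof i);
    // Rounding to float can turn a fraction just below 1 into 1.0f;
    // the pair still sums to (approximately) the right value.
    p.dec = static_cast<float>(num - i);
    return p;
}
```

Round-tripping 45.3992 recovers integer bits 45 and a value within about 1e-8 of the original; the error comes only from storing the fraction as a float.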
This is a good time to consider usability as well as some other feature requests for buffer playback in general.
- How can we make the [numIntAsFloat, numDec] format as painless for users as possible? Ideally someone would never actually need to write
x.set(\playheadInt, Float.from32Bits(floor(startFrame)), \playheadDec, startFrame - floor(startFrame), \trig, 1)
to cue a playback position, or
var curFrame = playheadInt.as32Bits + playheadDec
to know where the playhead is at.
- How can we provide options for users not to have to know about sample rates or really samples at all when interfacing with buffers? i.e. never need to type BufRateScale, BufFrames, etc.
- It would be nice to know whether playback has currently stopped at the beginning/end of the file and, relatedly, to have the UGen set a Done flag when it does. These two are not quite the same, because playback might still resume when/if the rate changes sign, a different section of the buffer is cued, or looping is turned on.
- How can we provide a simple interface to loop a buffer or portion of a buffer with a crossfade? I think ideally this interface would also crossfade when a new play position is cued.
SuperBufRd.ar(numChannels:1, bufnum:0, phaseIntAsFloat:0, phaseDec:0, loop:1, interpolation:2)
A modification of BufRd to be able to access samples with double precision.
- numChannels: The number of channels the buffer will be (ir)
- bufnum: The index of the buffer to use (kr)
- phaseIntAsFloat: The integer portion of the index into the buffer, recast to float (ar)
- phaseDec: The fractional portion of the index into the buffer (ar)
- loop: Whether to loop at the end of the buffer (kr)
- interpolation: 1 is no interpolation, 2 is linear, 4 is cubic (ir)
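To illustrate why a split index helps, and the alternative mentioned earlier of keeping phase as an integer plus fraction, here is a hypothetical sketch of a SuperBufRd-style mono read with linear interpolation (interpolation: 2). It does not reproduce the actual implementation; readLinear and its edge handling (wrap when looping, clamp otherwise) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mono read with linear interpolation, taking the index already
// split into an integer frame and a fraction, so no large index value is ever
// rounded to float. int64 avoids overflow when computing frame + 1.
float readLinear(const std::vector<float>& buf, std::int64_t frame, float frac, bool loop) {
    assert(frac >= 0.0f && frac <= 1.0f);
    const std::int64_t n = static_cast<std::int64_t>(buf.size());
    if (n == 0) return 0.0f;
    auto sampleAt = [&](std::int64_t i) -> float {
        if (loop) {
            i %= n;
            if (i < 0) i += n;  // wrap negative indices as well
        } else if (i < 0) {
            i = 0;              // this sketch clamps at the edges when not looping
        } else if (i >= n) {
            i = n - 1;
        }
        return buf[static_cast<std::size_t>(i)];
    };
    const float a = sampleAt(frame);
    const float b = sampleAt(frame + 1);
    return a + frac * (b - a);
}
```

With buf = {1, 2, 3, 4}, reading frame 3 at fraction 0.5 gives 2.5 when looping (interpolating toward buf[0]) and 4.0 when clamping.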
# phaseIntAsFloat, phaseDec, isPlaying = SuperPhasor.ar(
trig:0, rate:1, startIntAsFloat:0, startDec:0, endIntAsFloat:(Float.from32Bits(1)), endDec:0,
resetIntAsFloat:0, resetDec:0, loop:1)
# phaseIntAsFloat0, phaseDec0, phaseIntAsFloat1, phaseDec1, pan0,
phaseIntAsFloat2, phaseDec2, phaseIntAsFloat3, phaseDec3, pan1,
pan2, isPlaying = SuperPhasorX.ar(
trig:0, rate:1, startIntAsFloat:0, startDec:0, endIntAsFloat:(Float.from32Bits(1)), endDec:0,
resetIntAsFloat:0, resetDec:0, loop:1, overlap:5)
Phasor UGens to interface with SuperBufRd for playback of long sound buffers. SuperPhasor can drive a single SuperBufRd like so:
SuperPhasorX will output phases and pan for four SuperBufRd’s with three XFade2s, to implement smooth crossfading on looping and seeking, to be connected like so:
The four SuperBufRds are necessary for the edge case in which a user is currently looping playback, is currently within the bounds of the crossfade at the beginning/end of the loop (requires two SuperBufRds), and would like to jump to a different playback position that is also within the bounds of the crossfade (requires another two SuperBufRds). We assume the crossfade is short enough that a user will not attempt to jump to a new position while already in the middle of a jump crossfade; doing so will produce a click. Otherwise, this should produce click-free playback for all looping and jumping.
For the argument list, these keep the basic idea of Phasor’s (trig, rate, start, end, resetPos) ordering, with the functional difference that the playhead starts at resetPos and not start (because one might want to start playback in the middle of a loop or bounded segment).
All ‘intAsFloat’ values take an integer in the range [0, 2139095040] recast to a float.
- trig: On a trigger, jump to resetPos (ar or kr)
- rate: The amount of change per sample, can be positive or negative (ar or kr)
- startIntAsFloat, startDec: Start of the loop / playback range (kr)
- endIntAsFloat, endDec: End of the loop / playback range (kr)
- resetIntAsFloat, resetDec: Where in the range to start, and where to jump to on receiving a trigger (kr)
- loop: whether to loop at the ends of the range or not (kr)
- overlap: number of samples to overlap/crossfade at the beginning and end, will be clipped to max out at half the playback range (kr)
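The core phasor behavior described above (jump to resetPos on trig, then wrap or stop at the range edges) can be sketched per sample as follows. phasorStep and PhasorOut are hypothetical; trigger edge detection, the conversion to [intAsFloat, dec] pairs and SuperPhasorX's crossfade outputs are left out, and resetPos is assumed to lie inside the range.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical per-sample update for a SuperPhasor-like UGen. The phase is
// kept as a double and would be split into an [intAsFloat, dec] pair on output.
struct PhasorOut {
    double phase;
    bool isPlaying;
};

PhasorOut phasorStep(double phase, bool trig, double rate,
                     double start, double end, double reset, bool loop) {
    assert(start < end);
    if (trig) return {reset, true};  // jump to resetPos on a trigger
    const double next = phase + rate;
    if (loop) {
        // Wrap into [start, end) in either playback direction.
        double offset = std::fmod(next - start, end - start);
        if (offset < 0.0) offset += end - start;
        return {start + offset, true};
    }
    // Not looping: hold at the edge that was crossed. Because nothing is
    // latched, playback resumes if the rate changes sign.
    if (next >= end) return {end, false};
    if (next < start) return {start, false};
    return {next, true};
}
```

Not latching isPlaying is one way to keep the "stopped at an edge" state distinct from a Done flag, since playback can resume when the rate reverses.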
SuperIndex could be a sclang wrapper around the [intAsFloat, dec] pairs. A helper method could be added to Buffer so a user wouldn't need to know about second-to-sample conversion.
SuperIndex(sampleNum, sampleRate:(Server.default.sampleRate))
SuperIndex.fromSecs(secs, sampleRate:(Server.default.sampleRate))
buf.atSec(secs)
Usage:
buf = Buffer.read(s, "path");
index = buf.atSec(30);
These indexes can be used directly as inputs to the SuperPlayBuf family below.
sig = SuperPlayBuf.ar(numChannels:1, bufnum:0, rate:1,
startPos:[0, 0], endPos:nil, cuePos:[0, 0], cueTrig:0,
loop:0, interpolation:2)
# sig, playhead, isPlaying = SuperPlayBufDetails.ar(numChannels:1, bufnum:0, rate:1,
startPos:[0, 0], endPos:nil, cuePos:[0, 0], cueTrig:0,
loop:0, interpolation:2)
Pseudo-UGen wrapper around SuperPhasor / SuperBufRd as described above. Functionally very similar to PlayBuf, with the added benefits that you can set a beginning and end of playback range, and with the SuperPlayBufDetails variety you can know exactly where you are in playback at a given moment.
- numChannels: The number of channels the buffer will be (ir)
- bufnum: The index of the buffer to use (kr)
- rate: 1 plays at the buffer’s normal speed, 0.5 half speed, 2 double, etc. (kr or ar)
- startPos: Where to start loop/section (an index pair) (kr)
- endPos: Where to end loop/section (if <= startPos this will be end of the buffer) (an index pair) (kr)
- cuePos: Where to start playback / jump on a cueTrig (if < startPos this will be startPos) (an index pair) (kr)
- cueTrig: Jump to cuePos (kr or ar)
- loop: Whether to loop or stop playback at the start/end (kr)
- interpolation: 1 is no interpolation, 2 is linear, 4 is cubic (ir)
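The defaults listed above, together with the goal of never having to type BufRateScale, amount to a small normalization step before the phasor sees its arguments. A hypothetical sketch follows; normalizeArgs and PlayRange are illustrative names, positions are plain doubles rather than index pairs, and the case of cuePos beyond endPos is left unhandled, as it is unspecified here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical normalization of SuperPlayBuf-style arguments before they reach
// the phasor. Positions are plain doubles (frames) here for brevity.
struct PlayRange {
    double start, end, cue;
    double increment;  // buffer frames to advance per output sample
};

PlayRange normalizeArgs(double rate, double bufSampleRate, double serverSampleRate,
                        double startPos, double endPos, double cuePos,
                        std::int64_t bufFrames) {
    assert(bufSampleRate > 0.0 && serverSampleRate > 0.0);
    PlayRange r;
    // Same scaling BufRateScale provides, so rate 1 is the buffer's own speed.
    r.increment = rate * bufSampleRate / serverSampleRate;
    r.start = startPos;
    // endPos <= startPos means "play to the end of the buffer".
    r.end = (endPos <= startPos) ? static_cast<double>(bufFrames) : endPos;
    // cuePos before startPos is moved up to startPos.
    r.cue = (cuePos < startPos) ? startPos : cuePos;
    return r;
}
```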
sig = SuperPlayBufX.ar(numChannels:1, bufnum:0, rate:1,
startPos:[0, 0], endPos:nil, cuePos:[0, 0], cueTrig:0,
loop:0, interpolation:2, fadeTime:0.01)
# sig, playhead, isPlaying = SuperPlayBufXDetails.ar(numChannels:1, bufnum:0, rate:1,
startPos:[0, 0], endPos:nil, cuePos:[0, 0], cueTrig:0,
loop:0, interpolation:2, fadeTime:0.01)
Same as SuperPlayBuf but using SuperPhasorX to drive four SuperBufRds and three XFade2s as described above. This allows for crossfades at the beginning/end of loop segments as well as upon jumping to a new playback position using cuePos / cueTrig.
- numChannels: The number of channels the buffer will be (ir)
- bufnum: The index of the buffer to use (kr)
- rate: 1 plays at the buffer’s normal speed, 0.5 half speed, 2 double, etc. (kr or ar)
- startPos: Where to start loop/section (an index pair) (kr)
- endPos: Where to end loop/section (if <= startPos this will be end of the buffer) (an index pair) (kr)
- cuePos: Where to start playback / jump on a cueTrig (if < startPos this will be startPos) (an index pair) (kr)
- cueTrig: Jump to cuePos (kr or ar)
- loop: Whether to loop or stop playback at the start/end (kr)
- interpolation: 1 is no interpolation, 2 is linear, 4 is cubic (ir)
- fadeTime: Amount of crossfade at beginning/end (kr)
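fadeTime is given in seconds, while SuperPhasorX's overlap is a number of samples clipped to half the playback range. A hypothetical sketch of that conversion, assuming the fade is counted in buffer frames at the buffer's own sample rate (overlapFrames is an illustrative name):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical conversion from SuperPlayBufX's fadeTime (seconds) to
// SuperPhasorX's overlap (frames), clipped to at most half the playback range.
// Assumes the fade is counted in buffer frames at the buffer's sample rate.
double overlapFrames(double fadeTime, double bufSampleRate, double start, double end) {
    assert(bufSampleRate > 0.0);
    const double wanted = std::max(0.0, fadeTime * bufSampleRate);
    const double halfRange = std::abs(end - start) / 2.0;
    return std::min(wanted, halfRange);
}
```

With the default fadeTime of 0.01 at 48k this gives 480 frames, unless the loop is shorter than 960 frames.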
(
s.waitForBoot {
~buf = Buffer.read(s, "path/to/long/soundfile.wav");
s.sync;
~synth = {
var cuePos = \cuePos.kr([0, 0]);
var cueTrig = \cueTrig.tr(0);
SuperPlayBufX.ar(2, ~buf, 1, ~buf.atSec(2.2), ~buf.atSec(20.1), cuePos, cueTrig);
}.play;
}
)
~synth.set(\cuePos, ~buf.atSec(14.023), \cueTrig, 1)
again, really nice work!
SuperPlayBuf.ar and SuperPlayBufDetails.ar don't need to be separate classes; i'd suggest SuperPlayBuf.arDetails, and similarly for SuperPlayBufX.
i would recommend using the more general argument name "quality" instead of "interpolation", which gives us the opportunity to expand these to support oversampling.
if i'm not mistaken, you could use an unsigned int32 instead of a signed one, which doubles the possible buffer length to ~25 hours at 48k. worst case, the user is working with a high sample rate like 96k (max ~12 hours) or 192k (max ~6 hours). the real limitation is RAM, since regardless of SR a single channel of that length will occupy 17 gigs, and that's more than the vast majority of consumer hardware has these days.
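The arithmetic in this comment can be checked quickly. A small C++ sketch with illustrative names, assuming one 32-bit float per sample:

```cpp
#include <cassert>
#include <cstdint>

// Number of distinct indices in an unsigned 32-bit integer.
const double kUint32Indices = 4294967296.0;  // 2^32

// Hours of audio those indices cover at a given sample rate.
double hoursAt(double sampleRate) {
    assert(sampleRate > 0.0);
    return kUint32Indices / sampleRate / 3600.0;
}

// Bytes needed for one channel of 2^32 32-bit float samples.
std::uint64_t bytesPerChannel() {
    return (std::uint64_t{1} << 32) * sizeof(float);
}
```

hoursAt(48000.0) is about 24.9, hoursAt(96000.0) about 12.4 and hoursAt(192000.0) about 6.2, while bytesPerChannel() is 17179869184 bytes (16 GiB, about 17.2 GB).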
however, the engineer in me suggests that we could theoretically squeeze even more out of this: meld both of the channels together into a single fixed-point unsigned 64-bit integer, using the first 48 for the sample index and the last 16 for the subsample offset. this gives us ludicrous quality specs: 185 years of 48k audio (lol), and 65536× oversampled read pointer. is this hypothetical improvement worth disrupting the elegance of "reinterpreted 32-bit int" + "actual 32-bit float"? probably not. but scaffolding for a 64-bit int format could be potentially reusable for other ugens, so let's not discount it.
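To make the 48.16 idea concrete, here is a sketch of packing and unpacking such a value. The names are hypothetical and nothing here exists in SuperCollider; it only checks the layout and the 185-year figure, and it notes one caveat: a double has only 53 significant bits, so converting positions through double cannot preserve all 64 bits for very large indices.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 48.16 fixed-point read pointer: the upper 48 bits hold the
// sample index, the lower 16 bits the sub-sample offset in 1/65536 steps.
constexpr int kFracBits = 16;
constexpr std::uint64_t kFracOne = std::uint64_t{1} << kFracBits;  // 65536

// Pack a position given in samples, rounding to the nearest 1/65536.
// Lossy for very large indices: a double has 53 significant bits, not 64.
std::uint64_t toFixed4816(double samples) {
    assert(samples >= 0.0 && samples < 281474976710656.0);  // 2^48
    return static_cast<std::uint64_t>(samples * static_cast<double>(kFracOne) + 0.5);
}

std::uint64_t sampleIndex(std::uint64_t fx) { return fx >> kFracBits; }

std::uint32_t subsampleTicks(std::uint64_t fx) {
    return static_cast<std::uint32_t>(fx & (kFracOne - 1));
}

double toSamples(std::uint64_t fx) {
    return static_cast<double>(fx) / static_cast<double>(kFracOne);
}

// Years of audio the 48-bit index covers at a given sample rate.
double yearsAt(double sampleRate) {
    return static_cast<double>(std::uint64_t{1} << 48) / sampleRate / (365.0 * 24.0 * 3600.0);
}
```

toFixed4816(45.5) stores index 45 with 32768 sub-sample ticks, and yearsAt(48000.0) comes out just under 186 years.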
one thing i'm not quite wild about is the way the phasor values are passed around separately. this is a good opportunity to abstract the double-barrel value in the class library so the user passes it around as if it's a single value. just vague ideas for now; i'd have to think about the best exact implementation.