Distortion is about -95 dB THD. The spectrum has fewer frequency components than a linear-interpolated LUT, but the overall THD level is similar.
It can only go up to fs/6, though, because the signed coefficient a can only reach 1.0 in Q31, not 2.0.
Found in http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.84.1650 but it didn't work until I combined it with http://www.musicdsp.org/showArchiveComment.php?ArchiveID=10
This is even faster, but distorted at low levels: https://gist.github.com/endolith/14bbb3217f9f58248722
Crude timing results (should have measured with many instances in parallel instead):
- Muted: 0.57%
- arm_sin_q31: 5.77%
- AudioSynthWaveformSine: 1.85%
- Resonant sine: 1.67%
- Resonant with SMMLAR: 1.63%
- Resonant with SMMLAR/SMMLSR: 1.58% (this)
- Resonant with SMMLAR/SMMLSR and `__attribute__((optimize("unroll-loops")))`: 1.53%
- Resonant with SMMLAR/SMMLSR manually unrolled 8x: 1.41% (this)
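For reference, SMMLAR and SMMLSR are ARM DSP-extension instructions that multiply two signed 32-bit values, add the product to (or subtract it from) an accumulator placed in the top half of a 64-bit value, round, and keep the top 32 bits. A portable C sketch of that arithmetic follows. The function names `smmlar_emul` and `smmlsr_emul` are hypothetical, and this emulation is only meant to show what the instructions compute, not how the timed code invokes them.

```c
#include <stdint.h>

/* Emulates SMMLAR: top 32 bits of ((acc << 32) + a*b + 0x80000000).
 * Unsigned 64-bit arithmetic is used so wraparound is well defined. */
int32_t smmlar_emul(int32_t acc, int32_t a, int32_t b)
{
    uint64_t wide = ((uint64_t)(uint32_t)acc << 32)
                  + (uint64_t)((int64_t)a * b)
                  + 0x80000000u;            /* rounding constant */
    return (int32_t)(uint32_t)(wide >> 32);
}

/* Emulates SMMLSR: top 32 bits of ((acc << 32) - a*b + 0x80000000). */
int32_t smmlsr_emul(int32_t acc, int32_t a, int32_t b)
{
    uint64_t wide = ((uint64_t)(uint32_t)acc << 32)
                  - (uint64_t)((int64_t)a * b)
                  + 0x80000000u;            /* rounding constant */
    return (int32_t)(uint32_t)(wide >> 32);
}
```

Because only the top 32 bits of the 64-bit product are kept, a Q31 by Q31 product comes out in Q30, so code built on these needs a compensating shift or a pre-scaled coefficient.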
Then compared compilers and magnitude-scaled version:
- Resonant SMMLSR 3 mult in Arduino 26,836 bytes: 1.58% (this)
- Resonant 2 mult in Arduino 28,340 bytes: 1.4%
- Resonant SMMLSR 3 mult in UECIDE 21,924 bytes: 1.37%
- Resonant 2 mult in UECIDE 23,432 bytes: 1.24%
- Resonant 2 mult in UECIDE but unrolled 8x: 23,504 bytes, 0.94%
- Quadratic sine first attempt: 1.98%
So the best case is between 1.85/0.94 ≈ 2x (raw) and (1.85-0.57)/(0.94-0.57) ≈ 3.5x (with the muted baseline subtracted) improvement over the linear-interpolated LUT? But it doesn't have the same frequency range.