Distortion is about 95 dB THD. The spectrum has fewer frequency components than a linear-interpolated LUT, but the overall THD level is similar.
It can only go up to fs/6, though, because the signed coefficient a can only reach 1.0 in Q31, not 2.0.
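A rough sketch of where the fs/6 limit comes from, assuming the oscillator is the two-state recurrence from the musicdsp entry linked in these notes, with tuning coefficient a = 2*sin(pi*f/fs). That form is an assumption, not confirmed by the notes. With it, a reaches 1.0 exactly at f = fs/6, which is the first value a signed Q31 word cannot hold. The function names below are hypothetical, and the oscillator runs in floating point only to illustrate the recurrence.

```python
import math

Q31_ONE = 1 << 31       # 1.0 in Q31; does not fit in a signed 32-bit word
Q31_MAX = Q31_ONE - 1   # largest signed Q31 value, just below 1.0


def coefficient(freq, fs):
    """Tuning coefficient a = 2*sin(pi*f/fs) (assumed form of the recurrence)."""
    return 2.0 * math.sin(math.pi * freq / fs)


def fits_q31(freq, fs):
    """True if the coefficient for this frequency fits in a signed Q31 word."""
    return round(coefficient(freq, fs) * Q31_ONE) <= Q31_MAX


def oscillate(freq, fs, n, amplitude=0.5):
    """Floating-point illustration of the two-state recurrence.

    Returns n samples of the second state variable, which traces a sinusoid
    at freq with a peak of amplitude / cos(pi*freq/fs).
    """
    a = coefficient(freq, fs)
    s0, s1 = amplitude, 0.0
    out = []
    for _ in range(n):
        s0 -= a * s1   # update the first state from the old second state
        s1 += a * s0   # update the second state from the new first state
        out.append(s1)
    return out
```

At fs = 48000 the coefficient fits for 7999 Hz but not for 8001 Hz, which is the fs/6 boundary. Storing a/2 and doubling after the multiply would be one way around the limit, at the cost of one bit of coefficient resolution; that is a general fixed-point trick, not something these notes try.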
Found in http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.84.1650 but it didn't work until I combined it with http://www.musicdsp.org/showArchiveComment.php?ArchiveID=10
This is even faster, but distorted at low levels: https://gist.github.com/endolith/14bbb3217f9f58248722
Crude timing results (should have measured with many instances running in parallel instead):