@ShaderManager
Last active July 22, 2020 12:14
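#include <smmintrin.h> // SSE4.1, needed for _mm_blendv_ps

/*
Sort 4 floats in an SSE vector in ascending order with a sorting network and return,
for each output lane, the index that value had in the input.
The indices are carried in the low 2 mantissa bits of each float, so those bits of the
input are overwritten. NaN and infinity inputs are not handled: setting index bits on
an infinity turns it into a NaN.
*/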
inline __m128i v4_sort(__m128& v)
{
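    // Lane indices 0..3 (lane 0 is the lowest element), reinterpreted as float bit patterns
    const auto i = _mm_castsi128_ps(_mm_set_epi32(3, 2, 1, 0));
    // Mask that clears the low 2 mantissa bits (~3 == 0xFFFFFFFC, without an unsigned-to-int conversion)
    const auto mask = _mm_castsi128_ps(_mm_set1_epi32(~3));
    // Place indices in the low 2 bits of the mantissa
    v = _mm_or_ps(_mm_and_ps(v, mask), i);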
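    // Simple sorting network for n=4 (three passes, five compare-exchanges)
    // First pass: lanes 0,1 take min(v0,v2), min(v1,v3); lanes 2,3 take the maxima
    auto temp = _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 0, 3, 2));  // [v2, v3, v0, v1]
    auto cmp = _mm_cmplt_ps(v, temp);
    cmp = _mm_shuffle_ps(cmp, cmp, _MM_SHUFFLE(1, 0, 1, 0));    // [c0, c1, c0, c1]
    auto temp2 = _mm_blendv_ps(temp, v, cmp);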
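    // Second pass: compare-exchange lanes (0,1) and (2,3)
    temp = _mm_shuffle_ps(temp2, temp2, _MM_SHUFFLE(2, 3, 0, 1));  // [t1, t0, t3, t2]
    cmp = _mm_cmplt_ps(temp2, temp);
    // Both lanes of a pair must use the same comparison result: [c0, c0, c2, c2].
    // _MM_SHUFFLE(2, 0, 2, 0) would give [c0, c2, c0, c2] and duplicate values, e.g. for {1, 2, 4, 3}.
    cmp = _mm_shuffle_ps(cmp, cmp, _MM_SHUFFLE(2, 2, 0, 0));
    temp2 = _mm_blendv_ps(temp, temp2, cmp);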
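    // Third pass: compare-exchange the middle lanes (1,2); lanes 0 and 3 stay put
    temp = _mm_shuffle_ps(temp2, temp2, _MM_SHUFFLE(3, 1, 2, 0));  // [u0, u2, u1, u3]
    cmp = _mm_cmplt_ps(temp2, temp);
    cmp = _mm_shuffle_ps(cmp, cmp, _MM_SHUFFLE(3, 1, 1, 0));       // [c0, c1, c1, c3]
    v = _mm_blendv_ps(temp, temp2, cmp);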
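    // Extract the indices from the low 2 mantissa bits and return them.
    // Note: the sorted values in v keep their index bits; nothing clears them here.
    auto ret = _mm_castps_si128(_mm_andnot_ps(mask, v));
    return ret;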
}
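
When testing v4_sort it can help to compare against a plain scalar version of the same idea. The following is a minimal sketch, not part of the original gist: `v4_sort_scalar` is a hypothetical name, it assumes finite inputs, and it clears the index bits from the values before returning, which the SSE version above does not do.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <utility>

// Hypothetical scalar reference: sorts v ascending and returns, for each
// output slot, the index that value had in the input.
inline std::array<int, 4> v4_sort_scalar(std::array<float, 4>& v)
{
    for (int k = 0; k < 4; ++k) {
        // Assumption: finite inputs only. An infinity with index bits set
        // becomes a NaN, and NaNs do not order consistently.
        assert(std::isfinite(v[k]));
        std::uint32_t bits;
        std::memcpy(&bits, &v[k], sizeof bits);
        bits = (bits & ~3u) | static_cast<std::uint32_t>(k);  // pack index into low 2 mantissa bits
        std::memcpy(&v[k], &bits, sizeof bits);
    }

    // Compare-exchange on the float values; the index bits travel with them.
    auto cswap = [&v](int a, int b) {
        if (v[b] < v[a]) std::swap(v[a], v[b]);
    };
    cswap(0, 2); cswap(1, 3);  // first pass
    cswap(0, 1); cswap(2, 3);  // second pass
    cswap(1, 2);               // third pass

    std::array<int, 4> idx{};
    for (int k = 0; k < 4; ++k) {
        std::uint32_t bits;
        std::memcpy(&bits, &v[k], sizeof bits);
        idx[k] = static_cast<int>(bits & 3u);  // recover the original index
        bits &= ~3u;                           // clear the index bits from the value
        std::memcpy(&v[k], &bits, sizeof bits);
    }
    return idx;
}
```

For the input {3, 1, 4, 2} this leaves v as {1, 2, 3, 4} and returns the indices {1, 3, 0, 2}. Because the low 2 mantissa bits are sacrificed, values that differ only in those bits are ordered by their input position rather than by their original value.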