@unknownbrackets
Created May 7, 2014 07:27
bgra/rgba interleaved loads
// Probably too many registers?
// Two vectors (eight pixels) per iteration; sseChunks is assumed to be even.
for (u32 i = 0; i < sseChunks; i += 2) {
    __m128i c = _mm_load_si128(&srcp[i + 0]);
    __m128i c2 = _mm_load_si128(&srcp[i + 1]);
    // Split each pixel into the R/B bytes (to be swapped) and the G/A bytes (kept in place).
    __m128i rb = _mm_andnot_si128(maskGA, c);
    c = _mm_and_si128(c, maskGA);
    __m128i rb2 = _mm_andnot_si128(maskGA, c2);
    c2 = _mm_and_si128(c2, maskGA);
    // Shifting each 32-bit lane by 16 moves one color byte into the other's slot.
    __m128i b = _mm_srli_epi32(rb, 16);
    __m128i r = _mm_slli_epi32(rb, 16);
    c = _mm_or_si128(_mm_or_si128(c, r), b);
    // The second vector must shift rb2, not rb.
    __m128i b2 = _mm_srli_epi32(rb2, 16);
    __m128i r2 = _mm_slli_epi32(rb2, 16);
    c2 = _mm_or_si128(_mm_or_si128(c2, r2), b2);
    _mm_store_si128(&dstp[i + 0], c);
    _mm_store_si128(&dstp[i + 1], c2);
}
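
For checking the loop above, a scalar reference can be handy. The following is only a sketch: it assumes `maskGA` holds `0xFF00FF00` in every 32-bit lane (the gist does not show its definition), and the names `swap_rb_pixel` and `swap_rb_buffer` are made up for illustration. It applies the same mask-and-shift-by-16 steps one pixel at a time, so its output could be compared against what the SSE2 loop writes to `dstp`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: swap bytes 0 and 2 of one 32-bit pixel,
 * leaving bytes 1 and 3 (G and A) where they are.
 * Assumes the G/A mask is 0xFF00FF00, which the gist does not state. */
static uint32_t swap_rb_pixel(uint32_t c) {
    const uint32_t ga = c & 0xFF00FF00u;  /* G and A stay in place */
    const uint32_t rb = c & 0x00FF00FFu;  /* isolate the two bytes to swap */
    /* Each shift moves one byte into the other's slot; the other
     * byte falls off the end of the 32-bit value. */
    return ga | (rb << 16) | (rb >> 16);
}

/* Hypothetical helper: convert `count` pixels from src into dst.
 * In-place use (dst == src) is fine; partial overlap is not handled. */
static void swap_rb_buffer(uint32_t *dst, const uint32_t *src, size_t count) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; ++i) {
        dst[i] = swap_rb_pixel(src[i]);
    }
}
```

Applying the swap twice returns the original pixel, which gives a cheap sanity check for either version.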