A32 NEON 8x8 U16 transpose
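; input: rows 0..7 of the 8x8 u16 matrix in q0..q7, one row per register
; output: the transposed matrix in q0..q7 (in place)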
; swap antidiagonal elements within 2x2 blocks
vtrn.16 q0, q1
vtrn.16 q2, q3
vtrn.16 q4, q5
vtrn.16 q6, q7
; swap antidiagonal 2x2 blocks within 4x4 blocks
vtrn.32 q0, q2
vtrn.32 q1, q3
vtrn.32 q4, q6
vtrn.32 q5, q7
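; swap antidiagonal 4x4 blocks within 8x8 matrix
; (qN aliases d(2N) as its low half and d(2N+1) as its high half,
;  so d1 is the high half of q0 and d8 is the low half of q4)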
vswp d1, d8
vswp d3, d10
vswp d5, d12
vswp d7, d14
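
The three stages above can be checked off-target with a scalar model. The following is a minimal sketch in C, not part of the original gist: it assumes each q register is modelled as an array of eight `uint16_t` lanes with lane 0 in the low half, and the helper names (`vtrn16`, `vtrn32`, `vswp_hi_lo`, `transpose8x8_model`, `transpose8x8_model_ok`) are hypothetical. It replays the same twelve operations in the same order and compares the result against a straightforward transpose.

```c
#include <stdint.h>

/* Swap two 16-bit lanes. */
static void swap_u16(uint16_t *a, uint16_t *b) {
    uint16_t t = *a;
    *a = *b;
    *b = t;
}

/* Models "vtrn.16 qa, qb": swaps the odd 16-bit lanes of a with the
 * even 16-bit lanes of b, i.e. transposes each 2x2 block. */
static void vtrn16(uint16_t *a, uint16_t *b) {
    for (int i = 0; i < 8; i += 2)
        swap_u16(&a[i + 1], &b[i]);
}

/* Models "vtrn.32 qa, qb": the same operation on 32-bit lanes, each
 * of which covers two adjacent 16-bit lanes. */
static void vtrn32(uint16_t *a, uint16_t *b) {
    for (int i = 0; i < 8; i += 4) {
        swap_u16(&a[i + 2], &b[i]);
        swap_u16(&a[i + 3], &b[i + 1]);
    }
}

/* Models "vswp d(2a+1), d(2b)": swaps the high half of qa with the
 * low half of qb. */
static void vswp_hi_lo(uint16_t *a, uint16_t *b) {
    for (int i = 0; i < 4; i++)
        swap_u16(&a[i + 4], &b[i]);
}

/* Replays the listing; q[n] stands in for register qn. */
void transpose8x8_model(uint16_t q[8][8]) {
    vtrn16(q[0], q[1]); vtrn16(q[2], q[3]);
    vtrn16(q[4], q[5]); vtrn16(q[6], q[7]);

    vtrn32(q[0], q[2]); vtrn32(q[1], q[3]);
    vtrn32(q[4], q[6]); vtrn32(q[5], q[7]);

    vswp_hi_lo(q[0], q[4]); vswp_hi_lo(q[1], q[5]);
    vswp_hi_lo(q[2], q[6]); vswp_hi_lo(q[3], q[7]);
}

/* Returns 1 if the model transposes a matrix of distinct values. */
int transpose8x8_model_ok(void) {
    uint16_t m[8][8], ref[8][8];
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++) {
            m[r][c] = (uint16_t)(r * 8 + c);
            ref[c][r] = m[r][c];
        }
    transpose8x8_model(m);
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            if (m[r][c] != ref[r][c])
                return 0;
    return 1;
}
```

Calling `transpose8x8_model_ok()` from a small test harness is enough to confirm the ordering of the stages. The model says nothing about how the instructions perform on real hardware.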