@rygorous
Created March 24, 2019 06:21
A32 NEON 8x8 U16 transpose
; input: matrix rows 0..7 in Q0..Q7 (eight u16 lanes each)
; output: transposed matrix, in place in Q0..Q7
; swap antidiagonal elements within 2x2 blocks
vtrn.16 q0, q1
vtrn.16 q2, q3
vtrn.16 q4, q5
vtrn.16 q6, q7
; swap antidiagonal 2x2 blocks within 4x4 blocks
vtrn.32 q0, q2
vtrn.32 q1, q3
vtrn.32 q4, q6
vtrn.32 q5, q7
; swap antidiagonal 4x4 blocks within 8x8 matrix
vswp d1, d8
vswp d3, d10
vswp d5, d12
vswp d7, d14
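
To check the lane movement by hand, the three stages can be modeled on plain lists. This is a minimal sketch and not part of the original gist: the helper names (`vtrn16`, `vtrn32`, `vswp_hi_lo`, `transpose8x8`) are hypothetical, and each "register" is assumed to be a Python list of eight u16 lanes with lane 0 first.

```python
# Scalar model of the NEON sequence above. Each Q register is a list of
# eight u16 lanes; all helper names are illustrative, not a real API.

def vtrn16(a, b):
    """Model vtrn.16 qA, qB: swap odd 16-bit lanes of a with even lanes of b."""
    for i in range(0, 8, 2):
        a[i + 1], b[i] = b[i], a[i + 1]

def vtrn32(a, b):
    """Model vtrn.32 qA, qB: swap odd 32-bit lanes of a with even 32-bit lanes of b."""
    for i in range(0, 8, 4):
        # One 32-bit lane is a pair of adjacent u16 lanes.
        a[i + 2:i + 4], b[i:i + 2] = b[i:i + 2], a[i + 2:i + 4]

def vswp_hi_lo(a, b):
    """Model vswp dHi, dLo: swap the high half of a with the low half of b."""
    a[4:8], b[0:4] = b[0:4], a[4:8]

def transpose8x8(rows):
    """Apply the three stages to a copy of rows and return the result."""
    q = [list(r) for r in rows]
    # Stage 1: antidiagonal elements within 2x2 blocks.
    for i in (0, 2, 4, 6):
        vtrn16(q[i], q[i + 1])
    # Stage 2: antidiagonal 2x2 blocks within 4x4 blocks.
    for i in (0, 1, 4, 5):
        vtrn32(q[i], q[i + 2])
    # Stage 3: antidiagonal 4x4 blocks within the 8x8 matrix
    # (d1<->d8, d3<->d10, d5<->d12, d7<->d14).
    for i in range(4):
        vswp_hi_lo(q[i], q[i + 4])
    return q

if __name__ == "__main__":
    m = [[8 * r + c for c in range(8)] for r in range(8)]
    for row in transpose8x8(m):
        print(row)
```

Each stage swaps the two off-diagonal blocks of a 2x2 block structure, doubling the block size each time, which is why three stages are enough for an 8x8 matrix.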