@suhr
Last active August 16, 2021 09:46
# SSE to BQN dictionary
# This is an implementation of the SSE SIMD intrinsics in BQN. Its purpose
# is understanding: it's a look at SIMD instructions from an array programmer's POV.
# There are limits to what you can represent in a programming language without
# losing clarity, and bit hackery is one of them. So instead of converting
# from floats to bits and back, we just assume that the data *already* is
# represented the way we need.
# As they say, all models are wrong but some are useful.
# For a reference on all kinds of SIMD for x86_64, see the Intel® C++ Compiler Classic
# Developer Guide and Reference.
# For a quick reference, see the Intel® Intrinsics Guide.
# The "true" comparison mask. Its actual value is 0xFFFFFFFF (all bits set), but we model it as NaN.
mf ← 0÷0
# Two floating-point numbers are called *ordered* if neither of them is NaN
Ord ← ∧´·=˜∾
# Round to integer (ties even) and round to zero (truncate)
Rti ← ⌊⊢+2÷˜0.5≠2|⊢ ⋄ Trn ← ××⌊∘|
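# A quick check of the two rounding helpers. This is only a sketch: the inputs
# are illustrative and the results were worked out by hand.
#   Rti 0.5‿1.5‿2.5‿¯1.5   # 0‿2‿2‿¯2: ties go to the even neighbour
#   Trn 1.7‿¯1.7           # 1‿¯1: the fraction is dropped toward zero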
## Arithmetics
Add_ps ← + ⋄ Sub_ps ← - ⋄ Mul_ps ← × ⋄ Div_ps ← ÷
Sqrt_ps ← √ ⋄ Rcp_ps ← ÷ ⋄ Rsqrt_ps ← ÷∘√ ⋄
Min_ps ← ⌊ ⋄ Max_ps ← ⌈ ⋄
Add_ss ← +⌾⊏˜ ⋄ Sub_ss ← -˜⌾⊏˜ ⋄ Mul_ss ← ×⌾⊏˜ ⋄ Div_ss ← ÷˜⌾⊏˜
Sqrt_ss ← √⌾⊏ ⋄ Rcp_ss ← ÷⌾⊏ ⋄ Rsqrt_ss ← ÷∘√⌾⊏ ⋄
Min_ss ← ⌊⌾⊏˜ ⋄ Max_ss ← ⌈⌾⊏˜ ⋄
# Most of the other _ss intrinsics follow the same `F⌾⊏` pattern, so we will skip them
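# A small illustration of the pattern, assuming a ← 1‿2‿3‿4 and b ← 10‿20‿30‿40
# (made-up values). Dyadic ⌾ applies ⊏ to both arguments, and the result is
# written back into the first lane of the left argument:
#   a Add_ss b   # 11‿2‿3‿4: only the first lane is summed, the rest comes from a
#   a Sub_ss b   # ¯9‿2‿3‿4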
## Logic
# Here values are 4‿32 arrays of bits
# andnot computes (¬a)∧b: only the left argument is negated
And_ps ← ∧ ⋄ Andnot_ps ← ¬⊸∧
Or_ps ← ∨ ⋄ Xor_ps ← ≠
## Comparisons
Cmpeq_ps ← 0‿mf⊏˜= ⋄ Cmple_ps ← 0‿mf⊏˜≤ ⋄ Cmpge_ps ← 0‿mf⊏˜≥
Cmpneq_ps ← 0‿mf⊏˜≠ ⋄ Cmplt_ps ← 0‿mf⊏˜< ⋄ Cmpgt_ps ← 0‿mf⊏˜>
Cmpnlt_ps ← 0‿mf⊏˜¬∘< ⋄ Cmpnle_ps ← 0‿mf⊏˜¬∘≤
Cmpngt_ps ← 0‿mf⊏˜¬∘> ⋄ Cmpnge_ps ← 0‿mf⊏˜¬∘≥
Cmpord_ps ← 0‿mf⊏˜Ord¨ ⋄ Cmpunord_ps ← 0‿mf⊏˜¬∘Ord¨
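# For instance (illustrative values, result worked out by hand), each lane is
# compared and the 0/1 outcome selects from 0‿mf:
#   1‿5‿3‿0 Cmplt_ps 2‿2‿3‿1   # mf‿0‿0‿mf, since only 1<2 and 0<1 hold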
# These return an i32
Comieq_ss ← =○⊑ ⋄ Comilt_ss ← <○⊑ ⋄ Comigt_ss ← >○⊑
Comineq_ss ← ≠○⊑ ⋄ Comile_ss ← ≤○⊑ ⋄ Comige_ss ← ≥○⊑
# ucomi* instructions are the same as comi*, except they do not signal
# an exception for QNaNs
## Conversions
# Conversions are most poorly modeled, because BQN has only a single number type
# These return a single or packed i32
Cvtss_si32 ← Rti ⊑ ⋄ Cvtps_pi32 ← Rti 2⊸↑
Cvttss_si32 ← Trn ⊑ ⋄ Cvttps_pi32 ← Trn 2⊸↑
Cvtps_pi16 ← Trn ⋄ Cvtps_pi8 ← Trn
# These convert integers to floats
Cvtsi32_ss ← ⊣⌾⊑˜ ⋄ Cvtpi32_ps ← ⊣⌾(2⊸↑)˜ ⋄
Cvtpi16_ps ← ⊢ ⋄ Cvtpu16_ps ← ⊢ ⋄
Cvtpi8_ps ← ⊢ ⋄ Cvtpu8_ps ← ⊢ ⋄ Cvtpi32x2_ps ← ⊢∾
# This one returns an f32
Cvtss_f32 ← ⊑
## Load intrinsics
# Load intrinsics take a pointer as an argument, but we assume it's an array or a value
Loadh_pi ← (2↑⊣)∾⊢ ⋄ Loadl_pi ← ⊢∾(2↓⊣) ⋄
Load_ss ← 4↑⊢ ⋄ Load1_ps ← 4⥊⊢ ⋄
Load_ps ← ⊢ ⋄ Loadu_ps ← ⊢ ⋄ Loadr_ps ← ⌽
## Set intrinsics
# Set intrinsics are like load intrinsics, except they take values instead of a pointer
Set_ss ← 4↑⊢ ⋄ Set1_ps ← 4⥊⊢ ⋄
Set_ps ← ⊢ ⋄ Setr_ps ← ⌽ ⋄ setzero_ps ← 4⥊0
## Store intrinsics
# Store intrinsics are essentially the inverse of load intrinsics
Storeh_pi ← 2⊸↓ ⋄ Storel_pi ← 2⊸↑ ⋄
Store_ss ← ⊑ ⋄ Store1_ps ← 4⥊⊑ ⋄
Store_ps ← ⊢ ⋄ Storeu_ps ← ⊢ ⋄ Storer_ps ← ⌽
## Integer Intrinsics
# Integer intrinsics see __m64 as an array of integers
# Integer out, integer in
Extract_pi16 ← ⊑˜ ⋄ Insert_pi16 ← {d‿n←𝕩 ⋄ d⌾(n⊑⊢)𝕨}
# ⌈ and ⌊
Max_pi16 ← ⌈ ⋄ Max_pu8 ← ⌈ ⋄ Min_pi16 ← ⌊ ⋄ Min_pu8 ← ⌊
# Interpret the input as 8‿i8. Result is an 8‿u1 bitmask
Movemask_pi8 ← <⟜0
# Take the high halves of the 32-bit products
Mulhi_pu16 ← (2⋆16) ⌊∘÷˜ ×
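# A hand-worked example with illustrative inputs: 65535×65535 is 0xFFFE0001,
# whose high 16 bits are 0xFFFE, while 3×5 fits entirely in the low half:
#   65535‿3 Mulhi_pu16 65535‿5   # 65534‿0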
# Indices are 4‿u2 as int
Shuffle_pi16 ← ⊏˜
# The intrinsic actually writes to d
Maskmove_si64 ← {m‿d←𝕩 ⋄ (m/𝕨)⌾(m/⊢)d}
# Averages. The instructions round up: (a+b+1)>>1
Avg_pu8 ← 2 ⌈∘÷˜ + ⋄ Avg_pu16 ← 2 ⌈∘÷˜ +
# Input is 8‿u8, output is 4‿u16 with the sum of absolute differences in the first element
Sad_pu8 ← 4↑·+´|∘- # pu8 is sad :(
## Miscellaneous
# Indices are 4‿u2 as uint
Shuffle_ps ← {a‿b←𝕨 ⋄ (a⊏˜2↑𝕩)∾b⊏˜2↓𝕩}
Unpackhi_ps ← ⥊∘⍉ ≍○(2↓⊢) ⋄ Unpacklo_ps ← ⥊∘⍉ ≍○(2↑⊢)
# Dyadic ⌾ applies its right operand to both arguments, so the two-lane moves are plain joins
Move_ss ← ⊣⌾⊑˜ ⋄ Movehl_ps ← (2↓⊢)∾(2↓⊣)
Movelh_ps ← (2↑⊣)∾(2↑⊢) ⋄ Movemask_ps ← <⟜0
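# A few hand-worked examples, assuming a ← 1‿2‿3‿4 and b ← 5‿6‿7‿8 (made-up values).
# Note that Shuffle_ps, as defined here, takes the pair a‿b on the left and
# the four lane indices on the right:
#   a Unpacklo_ps b         # 1‿5‿2‿6: the low halves, interleaved
#   a Unpackhi_ps b         # 3‿7‿4‿8: the high halves, interleaved
#   a‿b Shuffle_ps 3‿2‿1‿0  # 4‿3‿6‿5: two lanes picked from a, then two from b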
# `Undefined_ps` is, well, undefined
# That's all of the first SSE. But there are also SSE2, SSE3, SSSE3 and even SSE4...