# SSE to BQN dictionary
# This is an implementation of SSE SIMD intrinsics in BQN. Its purpose is
# understanding: it's a look at SIMD instructions from an array programmer's POV.
# There are limits to what you can represent in a programming language without
# losing clarity. Bit hackery is one such thing. So instead of converting
# from floats to bits and back, we just assume that the data *already* is
# represented the way we need.
# As they say, all models are wrong but some are useful.
# For a full reference on all kinds of SIMD for x86_64, see the Intel® C++ Compiler Classic
# Developer Guide and Reference
# For a quick reference, see the Intel® Intrinsics Guide
# The actual value of this is 0xFFFFFFFF, but we model it as NaN.
mf ← 0÷0
# Two floating-point numbers are called *ordered* if neither of them is NaN
Ord ← ∧´·=˜∾
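# A small sketch of the NaN model, assuming `mf` and `Ord` exactly as defined above.
# NaN is the only value that differs from itself, which is what `=˜` tests for.
•Show 1 Ord 2     # both operands are ordinary numbers
•Show 1 Ord mf    # one operand is NaN, so the pair is unordered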
# Round to integer (ties even) and round to zero (truncate)
Rti ← ⌊⊢+2÷˜0.5≠2|⊢ ⋄ Trn ← ××⌊∘|
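# To see the difference between the two modes, here is a short check (a sketch that
# assumes `Rti` and `Trn` as defined above). Ties go to the even neighbour under
# `Rti`, while `Trn` simply drops the fraction.
•Show Rti 0.5‿1.5‿2.5‿¯1.5
•Show Trn 1.7‿¯1.7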
## Arithmetic
Add_ps ← + ⋄ Sub_ps ← - ⋄ Mul_ps ← × ⋄ Div_ps ← ÷
Sqrt_ps ← √ ⋄ Rcp_ps ← ÷ ⋄ Rsqrt_ps ← ÷∘√ ⋄
Min_ps ← ⌊ ⋄ Max_ps ← ⌈ ⋄
Add_ss ← +⌾⊏˜ ⋄ Sub_ss ← -˜⌾⊏˜ ⋄ Mul_ss ← ×⌾⊏˜ ⋄ Div_ss ← ÷˜⌾⊏˜
Sqrt_ss ← √⌾⊏ ⋄ Rcp_ss ← ÷⌾⊏ ⋄ Rsqrt_ss ← ÷∘√⌾⊏ ⋄
Min_ss ← ⌊⌾⊏˜ ⋄ Max_ss ← ⌈⌾⊏˜ ⋄
# Most other _ss intrinsics follow the same `F⌾⊏` pattern, so we will skip them
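# The `F⌾⊏` pattern in action, as a sketch assuming `Add_ss` and `Sqrt_ss` from above:
# only the first lane is computed, the remaining lanes pass through unchanged.
•Show 1‿2‿3‿4 Add_ss 10‿20‿30‿40
•Show Sqrt_ss 16‿2‿3‿4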
## Logic
# Here values are 4‿32 arrays of bits
# Andnot negates the first operand only: (¬a)∧b
And_ps ← ∧ ⋄ Andnot_ps ← ¬⊸∧
Or_ps ← ∨ ⋄ Xor_ps ← ≠
## Comparisons
Cmpeq_ps ← 0‿mf⊏˜= ⋄ Cmple_ps ← 0‿mf⊏˜≤ ⋄ Cmpge_ps ← 0‿mf⊏˜≥
Cmpneq_ps ← 0‿mf⊏˜≠ ⋄ Cmplt_ps ← 0‿mf⊏˜< ⋄ Cmpgt_ps ← 0‿mf⊏˜>
Cmpnlt_ps ← 0‿mf⊏˜¬∘< ⋄ Cmpnle_ps ← 0‿mf⊏˜¬∘≤
Cmpngt_ps ← 0‿mf⊏˜¬∘> ⋄ Cmpnge_ps ← 0‿mf⊏˜¬∘≥
Cmpord_ps ← 0‿mf⊏˜Ord¨ ⋄ Cmpunord_ps ← 0‿mf⊏˜¬∘Ord¨
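# A sketch of what a comparison mask looks like in this model, assuming `Cmplt_ps`
# and `mf` from above. True lanes come back as NaN (standing in for 0xFFFFFFFF) and
# false lanes as 0, so `≠˜` recovers the plain boolean mask.
•Show 1‿2‿3‿4 Cmplt_ps 4‿3‿2‿1
•Show ≠˜ 1‿2‿3‿4 Cmplt_ps 4‿3‿2‿1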
# These return an i32
Comieq_ss ← =○⊑ ⋄ Comilt_ss ← <○⊑ ⋄ Comigt_ss ← >○⊑
Comineq_ss ← ≠○⊑ ⋄ Comile_ss ← ≤○⊑ ⋄ Comige_ss ← ≥○⊑
# ucomi* instructions are the same as comi*, except they do not signal
# an exception for QNaNs
## Conversions
# Conversions are the most poorly modeled part, because BQN has only a single number type
# These return a single or packed i32
Cvtss_si32 ← Rti ⊑ ⋄ Cvtps_pi32 ← Rti 2⊸↑
Cvttss_si32 ← Trn ⊑ ⋄ Cvttps_pi32 ← Trn 2⊸↑
Cvtps_pi16 ← Trn ⋄ Cvtps_pi8 ← Trn
# These convert integers to floats
Cvtsi32_ss ← ⊣⌾⊑˜ ⋄ Cvtpi32_ps ← ⊣⌾(2⊸↑)˜ ⋄
Cvtpi16_ps ← ⊢ ⋄ Cvtpu16_ps ← ⊢ ⋄
Cvtpi8_ps ← ⊢ ⋄ Cvtpu8_ps ← ⊢ ⋄ Cvtpi32x2_ps ← ⊢∾
# This one returns an f32
Cvtss_f32 ← ⊑
## Load intrinsics
# Load intrinsics take a pointer as an argument, but we assume it's an array or a value
Loadh_pi ← (2↑⊣)∾⊢ ⋄ Loadl_pi ← ⊢∾(2↓⊣) ⋄
Load_ss ← 4↑⊢ ⋄ Load1_ps ← 4⥊⊢ ⋄
Load_ps ← ⊢ ⋄ Loadu_ps ← ⊢ ⋄ Loadr_ps ← ⌽
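# A few loads, sketched under the same assumption that a "pointer" is just a value
# or a list. Uses `Loadh_pi`, `Load1_ps` and `Loadr_ps` as defined above.
•Show 1‿2‿3‿4 Loadh_pi 8‿9   # keeps the low half of 𝕨, loads the high half
•Show Load1_ps 7             # broadcasts one value to all four lanes
•Show Loadr_ps 1‿2‿3‿4       # loads in reverse order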
## Set intrinsics
# Set intrinsics are like load intrinsics, except they take values instead of a pointer
Set_ss ← 4↑⊢ ⋄ Set1_ps ← 4⥊⊢ ⋄
Set_ps ← ⊢ ⋄ Setr_ps ← ⌽ ⋄ setzero_ps ← 4⥊0
## Store intrinsics
# Store intrinsics are essentially the inverse of load intrinsics
Storeh_pi ← 2⊸↓ ⋄ Storel_pi ← 2⊸↑ ⋄
Store_ss ← ⊑ ⋄ Store1_ps ← 4⥊⊑ ⋄
Store_ps ← ⊢ ⋄ Storeu_ps ← ⊢ ⋄ Storer_ps ← ⌽
## Integer intrinsics
# Integer intrinsics see __m64 as an array of integers
# Integer out, integer in
Extract_pi16 ← ⊑˜ ⋄ Insert_pi16 ← {d‿n←𝕩 ⋄ d⌾(n⊑⊢)𝕨}
# ⌈ and ⌊
Max_pi16 ← ⌈ ⋄ Max_pu8 ← ⌈ ⋄ Min_pi16 ← ⌊ ⋄ Min_pu8 ← ⌊
# Interpret the input as 8‿i8. Result is an 8‿u1 bitmask
Movemask_pi8 ← <⟜0
# Take the higher halves of the products
Mulhi_pu16 ← (2⋆16) ⌊∘÷˜ ×
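# A worked case for the high-half multiply, as a sketch assuming `Mulhi_pu16` from
# above: the product of two 16-bit values is up to 32 bits wide, and dividing by
# 2⋆16 (then flooring) keeps its top half.
•Show 65535‿256 Mulhi_pu16 65535‿256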
# Indices are 4‿u2 as int
Shuffle_pi16 ← ⊏˜
# The intrinsic actually writes to d
Maskmove_si64 ← {m‿d←𝕩 ⋄ (m/𝕨)⌾(m/⊢)d}
# Averages, rounded up as the hardware does: (a+b+1)>>1
Avg_pu8 ← 2 ⌈∘÷˜ + ⋄ Avg_pu16 ← 2 ⌈∘÷˜ +
# Input is 8‿u8, output is 4‿u16: the sum of absolute differences, then zeros
Sad_pu8 ← 4↑·+´|∘- # pu8 is sad :(
## Miscellaneous
# Indices are 4‿u2 as uint
Shuffle_ps ← {a‿b←𝕨 ⋄ (a⊏˜2↑𝕩)∾b⊏˜2↓𝕩}
Unpackhi_ps ← ⥊∘⍉ ≍○(2↓⊢) ⋄ Unpacklo_ps ← ⥊∘⍉ ≍○(2↑⊢)
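# An illustration of the lane movement, assuming `Shuffle_ps`, `Unpacklo_ps` and
# `Unpackhi_ps` from above. Unpacking interleaves one half of each argument;
# shuffling picks two lanes from the first source and two from the second.
•Show 1‿2‿3‿4 Unpacklo_ps 5‿6‿7‿8
•Show 1‿2‿3‿4 Unpackhi_ps 5‿6‿7‿8
•Show ⟨1‿2‿3‿4, 5‿6‿7‿8⟩ Shuffle_ps 0‿1‿2‿3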
# Under applies its right operand to both arguments, so the halves are joined directly
Move_ss ← ⊣⌾⊑˜ ⋄ Movehl_ps ← ∾˜○(2⊸↓)
Movelh_ps ← ∾○(2⊸↑) ⋄ Movemask_ps ← <⟜0
# `Undefined_ps` is, well, undefined
# That's all of the first SSE. But there are also SSE2, SSE3, SSSE3 and even SSE4...