@MathGeniusJodie
MathGeniusJodie / nibble-dot.rs
Created January 14, 2026 20:09
AVX2 dot product of packed nibbles. The nibbles are u4 with 0 mapped to 8, which avoids 2 instructions in the loop. Do not use this without first evaluating product quantization; it is better.
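For checking results, a scalar reference can be handy. The sketch below is not part of the gist: the name `dot_nibbles_scalar` is hypothetical, and it assumes that each byte packs two nibbles (low nibble first) and that a stored nibble `s` in 0..=15 stands for the signed value `s - 8`, which is one reading of "0 mapped to 8".

```rust
/// Scalar reference for a dot product over packed 4-bit values.
/// Assumption (not confirmed by the gist): each byte holds two nibbles,
/// and a stored nibble `s` (0..=15) represents the signed value `s - 8`.
pub fn dot_nibbles_scalar(a: &[u8], b: &[u8]) -> i32 {
    assert_eq!(a.len(), b.len(), "inputs must have the same length");
    let mut acc: i32 = 0;
    for (&x, &y) in a.iter().zip(b.iter()) {
        // Split each byte into its low and high nibble, then remove the bias.
        let a_lo = (x & 0x0F) as i32 - 8;
        let a_hi = (x >> 4) as i32 - 8;
        let b_lo = (y & 0x0F) as i32 - 8;
        let b_hi = (y >> 4) as i32 - 8;
        acc += a_lo * b_lo + a_hi * b_hi;
    }
    acc
}
```

Under those assumptions, a SIMD version could be compared against this function on random inputs before it is trusted.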
use std::arch::x86_64::*;
#[target_feature(enable = "avx2")]
pub unsafe fn dot_i4_256_nibbles_unrolled(a_ptr: *const u8, b_ptr: *const u8) -> i32 {
    // 1. Constants
    let mask = _mm256_set1_epi8(0x0F);
    let sub_bias = _mm256_set1_epi8(0x08);
    // 2. Accumulators
    // acc_dot stores the main product sums (16 x i16)
struct ReqWrapper {
    req: OperationRequest,
    tx: std::sync::mpsc::Sender<Resp>,
}
pub struct DB {
    send: std::sync::mpsc::Sender<ReqWrapper>,
}
impl DB {