Urmzd Mukhammadnaim urmzd

urmzd / llmem_bench_main.rs
Created April 17, 2026 00:13
LongMemEval retrieval comparison: MemPalace's 96.6% is ChromaDB's add+query. mnemonist gets 96.0% with a from-scratch Rust HNSW (3 days earlier). One script (reproduce_benchmark.sh) clones everything, builds, and runs end-to-end. All flaws documented.
//! LongMemEval retrieval benchmark — mnemonist (Apr 1, 2026, commit 3ca2c39)
//!
//! Uses the exact same methodology as MemPalace's raw-mode benchmark:
//!
//! 1. For each of 500 questions, build a per-question haystack (~48 sessions)
//! 2. Index only user turns, concatenated per session
//! 3. Embed haystack + query (fastembed/ONNX, all-MiniLM-L6-v2)
//! 4. Retrieve top-5 by cosine similarity via native HNSW index
//! 5. Score: is the gold session in the top-5?
//!