
Kompas / gist-README.md
Last active April 15, 2026 20:49
SGLang + DFlash on DGX Spark (Qwen3-Coder-Next NVFP4) — 150 tok/s

SGLang + DFlash on DGX Spark (Qwen3-Coder-Next NVFP4)

Running Qwen3-Coder-Next-NVFP4-GB10 with DFlash speculative decoding on SGLang. Tested on a Lenovo ThinkStation PGX (NVIDIA GB10 Grace Blackwell, 128 GB unified memory).
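DFlash builds on speculative decoding. In that scheme a small draft model proposes several tokens, and the target model (here Qwen3-Coder-Next) verifies all of them in one forward pass. The sketch below shows only the generic greedy verification rule that most speculative decoders share. It is an illustrative assumption, not DFlash's or SGLang's actual code, and `acceptDraft` is a hypothetical name.

```javascript
// Generic greedy speculative-decoding verification (illustrative sketch,
// not DFlash's implementation).
// draftTokens:  tokens proposed by the draft model for the next positions.
// targetTokens: the target model's greedy pick at each of those positions,
//               plus one extra position after the last draft token,
//               so targetTokens.length === draftTokens.length + 1.
function acceptDraft(draftTokens, targetTokens) {
  if (targetTokens.length !== draftTokens.length + 1) {
    throw new Error("targetTokens must have exactly one more entry than draftTokens");
  }
  const accepted = [];
  for (let i = 0; i < draftTokens.length; i++) {
    if (draftTokens[i] !== targetTokens[i]) {
      // First disagreement: keep the target's token instead and stop.
      accepted.push(targetTokens[i]);
      return accepted;
    }
    accepted.push(draftTokens[i]);
  }
  // Every draft token matched, so the target's extra prediction comes for free.
  accepted.push(targetTokens[draftTokens.length]);
  return accepted;
}
```

Each target pass therefore yields between 1 and `draftTokens.length + 1` tokens. That is why speculative-decoding speedups generally vary from prompt to prompt: they depend on how often the drafter guesses right.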

What you get

| Test | SGLang DFlash | vLLM DFlash+Marlin | Delta |
|---|---|---|---|
| Short code (307 tok) | 150 tok/s | 108 tok/s | +38% |
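The write-up does not say how the tok/s figures were measured. One common approach, sketched here as an assumption, is to time a non-streaming request against the server's OpenAI-compatible `/v1/chat/completions` endpoint (both SGLang and vLLM provide one) and divide `usage.completion_tokens` by the elapsed time. `baseUrl`, `model`, and both function names are placeholders.

```javascript
// Throughput = generated tokens / wall-clock seconds.
function tokensPerSecond(completionTokens, elapsedMs) {
  if (elapsedMs <= 0) throw new RangeError("elapsedMs must be positive");
  return completionTokens / (elapsedMs / 1000);
}

// Times one non-streaming chat request against an OpenAI-compatible server.
// baseUrl and model are hypothetical placeholders. Requires Node 18+ for
// global fetch and performance.
async function measureThroughput(baseUrl, model, prompt) {
  const start = performance.now();
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
      temperature: 0, // greedy decoding keeps runs comparable
    }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = await res.json();
  const elapsedMs = performance.now() - start;
  return tokensPerSecond(data.usage.completion_tokens, elapsedMs);
}
```

This wall-clock figure includes prefill, so it reads slightly lower than pure decode speed. Streaming the response and timing from the first token would separate the two.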
Kompas / dflash-coder-next-forum-post.md
Last active April 14, 2026 12:53
DFlash speculative decoding for Qwen3-Coder-Next on DGX Spark — 2-line vLLM patch, 88-108 tok/s

Qwen3 Coder Next + DFlash on DGX Spark: 108 tok/s with a 2-line vLLM patch

I got DFlash speculative decoding working with Qwen3 Coder Next on my DGX Spark (GB10, 128 GB unified memory). Result: 88-108 tok/s depending on task complexity, up from 62 tok/s without DFlash.
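Expressed as relative gains, those figures work out as follows. This is a small arithmetic check that uses only the numbers quoted in the post; `speedupPercent` is a hypothetical helper.

```javascript
// Relative gain of a new throughput over a baseline, in percent.
function speedupPercent(baselineTokS, newTokS) {
  return ((newTokS - baselineTokS) / baselineTokS) * 100;
}

// 62 tok/s without DFlash vs. 88 and 108 tok/s with it.
const low = speedupPercent(62, 88);
const high = speedupPercent(62, 108);
console.log(`DFlash speedup: +${low.toFixed(0)}% to +${high.toFixed(0)}%`);
```

This prints `DFlash speedup: +42% to +74%`, i.e. roughly 1.42x to 1.74x the 62 tok/s baseline.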

Tool calling works too (--enable-auto-tool-choice --tool-call-parser qwen3_coder), tested at 89 tok/s with DFlash active. Useful if you're running coding agents that need function calls.
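With those two flags set, a client passes tool definitions through the standard OpenAI-style `tools` field and lets the server decide with `tool_choice: "auto"`. The `qwen3_coder` parser then turns the model's output into structured `tool_calls`. Below is a minimal sketch of the request body and of reading the result, assuming an OpenAI-compatible response shape. The `read_file` tool, its schema, and the helper names are hypothetical.

```javascript
// Hypothetical tool definition; "read_file" and its schema are illustrative only.
const readFileTool = {
  type: "function",
  function: {
    name: "read_file",
    description: "Read a UTF-8 text file from the workspace",
    parameters: {
      type: "object",
      properties: { path: { type: "string", description: "Relative file path" } },
      required: ["path"],
    },
  },
};

// Builds an OpenAI-style chat completions body that lets the server decide
// whether to call a tool ("auto" needs --enable-auto-tool-choice on vLLM).
function buildToolRequest(model, userPrompt, tools) {
  return {
    model,
    messages: [{ role: "user", content: userPrompt }],
    tools,
    tool_choice: "auto",
  };
}

// Extracts parsed tool calls from a chat completions response.
// Returns [] when the model answered with plain text instead.
function extractToolCalls(response) {
  const calls = response.choices?.[0]?.message?.tool_calls ?? [];
  return calls.map((c) => ({
    name: c.function.name,
    args: JSON.parse(c.function.arguments), // arguments arrive as a JSON string
  }));
}
```

POST the object from `buildToolRequest` as JSON to `/v1/chat/completions`, then pass the parsed response to `extractToolCalls` before dispatching to your own tool implementations.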

The fix turned out to be surprisingly simple: two lines of Python.

The Problem

Kompas / chatgpt-conversation-exporter.js
Created January 7, 2026 19:41 — forked from LukasMFR/chatgpt-conversation-exporter.js
JavaScript snippet to export a ChatGPT conversation from the web UI to a clean Markdown file, with correct user/assistant attribution, code block preservation, and basic media placeholders. Designed to be run directly in the browser console (Safari/Chrome/Firefox).
(() => {
  function formatDate(date = new Date()) {
    return date.toISOString().split("T")[0];
  }

  function escapeMarkdown(text) {
    return text
      .replace(/\\/g, "\\\\")
      .replace(/\*/g, "\\*")
      .replace(/_/g, "\\_")