Qwen3 Coder Next + DFlash on DGX Spark: 108 tok/s with a two-line vLLM patch
I got DFlash speculative decoding working with Qwen3 Coder Next on my DGX Spark (GB10, 128 GB unified memory). Result: 88-108 tok/s depending on task complexity, up from 62 tok/s without DFlash.
Tool calling works too (--enable-auto-tool-choice --tool-call-parser qwen3_coder), tested at 89 tok/s with DFlash active. Useful if you're running coding agents that need function calls.
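For context, a launch command of roughly this shape is what such a setup looks like. The two tool-calling flags are the ones mentioned above; everything else is a sketch that assumes DFlash is wired into vLLM's generic --speculative-config mechanism, so the method name, model identifiers, and token count below are illustrative assumptions, not my exact configuration:

```shell
# Hedged sketch of a vLLM launch with tool calling and speculative decoding.
# The two tool-calling flags are confirmed by the text; the model name,
# speculative method, draft model, and token count are illustrative only.
vllm serve Qwen/Qwen3-Coder-Next \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --speculative-config '{"method": "dflash", "model": "<draft-model>", "num_speculative_tokens": 4}'
```

With a config like this, OpenAI-compatible clients can send tools in their chat requests and the server parses the model's function calls, while the draft model proposes tokens that the target model verifies in parallel.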
The fix turned out to be surprisingly simple: two lines of Python.