@rockinyp
rockinyp / mlx_vlm_thinking_format_patch.py
Created February 26, 2026 03:29
Patches for thinking models (Qwen3, DeepSeek R1) with MLX-VLM and Open WebUI RAG
"""
MLX-VLM Thinking Format Patch
=============================
Problem: Qwen3.5 (and similar thinking models) emit "Thinking Process:\n...\n</think>"
instead of proper <think>...</think> tags when served via mlx_vlm.server.
This breaks UIs like Open WebUI that expect the standard <think> tag format.
Fix: Add a transform function and apply it to both the streaming and non-streaming
response paths.
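The transform described above can be sketched as a small normalization helper. This is a minimal sketch, not the gist's actual implementation: the function name `normalize_think_tags` and the exact prefix pattern are assumptions based on the problem statement (a "Thinking Process:" prefix with a bare closing `</think>` tag).

```python
import re

def normalize_think_tags(text: str) -> str:
    """Rewrite the non-standard 'Thinking Process:' prefix into a standard
    opening <think> tag so UIs like Open WebUI can fold the reasoning block.

    Assumption: the model emits 'Thinking Process:' at the start of its
    reasoning and closes it with </think>, but never emits <think> itself.
    """
    # Only patch when a closing tag exists without a matching opening tag,
    # so already-correct output passes through unchanged.
    if "</think>" in text and "<think>" not in text:
        text = re.sub(r"^\s*Thinking Process:\s*", "<think>\n", text, count=1)
    return text
```

For the non-streaming path this can be applied to the finished completion text directly. The streaming path is trickier: the "Thinking Process:" prefix may arrive split across chunks, so a practical patch would buffer the first few tokens until the prefix can be matched or ruled out before forwarding chunks to the client.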