Chad Gibson ccgibson

## reasoning_preservation_compare_v2.ts
// Reasoning preservation test — Neuralwatt vs Moonshot AI (via OpenRouter).
// Now covers BOTH Kimi (preserve_thinking) and GLM (clear_thinking) families.
//
// Minimal modifications from the customer's original script:
//   1. Tests both K2.6 and GLM-5.1 (same probe, family-correct kwarg name).
//   2. Sends chat_template_kwargs.<kwarg> so we actually exercise the opt-in.
//   3. Runs N trials per variant (default 5) so a single sample landing on the
//      model's "refuse to repeat secret" mode doesn't cause a false negative.
//   4. Scores content AND reasoning separately so we can distinguish
//      "template didn't preserve it" (neither has 'arnold') from

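The header notes above describe sending a family-specific `chat_template_kwargs` entry and scoring `content` and `reasoning` separately across N trials. A minimal sketch of those two pieces might look like the following. The names `templateKwargs`, `scoreTrials`, `Trial`, and `Score` are hypothetical and not taken from the script, and the boolean values chosen for `preserve_thinking` and `clear_thinking` are an assumption about which setting keeps prior-turn reasoning.

```typescript
// Hypothetical helper names; only `preserve_thinking`, `clear_thinking`,
// the `reasoning` field, and the 'arnold' marker come from the notes above.

type Family = "kimi" | "glm";

interface Trial {
  content: string;   // final answer text of one sampled response
  reasoning: string; // reasoning text of the same response
}

interface Score {
  trials: number;
  inContent: number;   // trials whose content mentions the marker
  inReasoning: number; // trials whose reasoning mentions the marker
  inNeither: number;   // trials where neither field mentions it
}

// Pick the family-correct kwarg to send under chat_template_kwargs.
// ASSUMPTION: `true` / `false` below are the values that keep prior reasoning.
function templateKwargs(family: Family): Record<string, boolean> {
  return family === "kimi"
    ? { preserve_thinking: true }
    : { clear_thinking: false };
}

// Count marker hits in content and reasoning separately over all trials,
// so one unlucky sample does not decide the result on its own.
function scoreTrials(trials: Trial[], marker: string = "arnold"): Score {
  const needle = marker.toLowerCase();
  const has = (s: string): boolean => s.toLowerCase().includes(needle);
  const score: Score = { trials: trials.length, inContent: 0, inReasoning: 0, inNeither: 0 };
  for (const t of trials) {
    const c = has(t.content);
    const r = has(t.reasoning);
    if (c) score.inContent++;
    if (r) score.inReasoning++;
    if (!c && !r) score.inNeither++;
  }
  return score;
}

// Made-up sample data for illustration only, not real model output.
const sample: Trial[] = [
  { content: "The secret is Arnold.", reasoning: "" },
  { content: "I can't repeat that.", reasoning: "The user told me arnold earlier." },
  { content: "I don't know.", reasoning: "Nothing in my context mentions a secret." },
];
console.log(templateKwargs("kimi"), scoreTrials(sample));
```

In this sketch, a run where `inNeither` equals `trials` corresponds to the "template didn't preserve it" case named in item 4. Keeping the content and reasoning counters separate leaves the remaining cases distinguishable.
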
## reasoning_preservation_compare.py
#!/usr/bin/env python3
"""Compare reasoning preservation: Neuralwatt vs Moonshot AI (via OpenRouter).

Sends the same chat-completion request to two endpoints back-to-back and
reports how each handles a prior-turn assistant ``reasoning`` field on the
"no tool calls" + trailing-user shape (Variant B in our docs / your tests).

The two endpoints are configured to be as apples-to-apples as possible:

  Neuralwatt:  POST https://api.neuralwatt.com/v1/chat/completions