feat: add thinking mode toggle to suppress reasoning-only response loops

Adds `llm.thinking` config option (default: true) that when disabled:
- Injects /no_think into the last user message for Qwen 3.x compatibility
- Sends chat_template_kwargs in API payload for backends that support it
- Silently and immediately nudges on reasoning-only responses instead of
  showing warnings and wasting retry iterations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-03-11 19:34:36 -05:00
parent 25fa7dc82b
commit f0d8ef8f0a
4 changed files with 46 additions and 11 deletions

View File

@@ -151,7 +151,11 @@ class LLMClient:
if tools:
payload["tools"] = tools
# Merge model-specific extra parameters (e.g., enable_thinking, reasoning_effort)
# When thinking is disabled, inject chat_template_kwargs for backends that support it
if not self._config.thinking:
payload.setdefault("chat_template_kwargs", {})["enable_thinking"] = False
# Merge model-specific extra parameters (e.g., reasoning_effort)
if self._config.extra_body:
payload.update(self._config.extra_body)