feat: add thinking mode toggle to suppress reasoning-only response loops

Adds `llm.thinking` config option (default: true) that, when disabled:

- Injects /no_think into the last user message for Qwen 3.x compatibility
- Sends chat_template_kwargs in API payload for backends that support it
- Silently and immediately nudges on reasoning-only responses instead of
  showing warnings and wasting retry iterations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 19:34:36 -05:00
parent 25fa7dc82b
commit f0d8ef8f0a
4 changed files with 46 additions and 11 deletions
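
For readers skimming the commit, the first two bullets of the message can be sketched roughly as follows. This is a minimal illustration and not the code in this commit: the helper name `apply_thinking_toggle` is hypothetical, and appending `/no_think` to the end of the message is an assumption, since the message only says the switch is injected into the last user message. The `setdefault` line mirrors the one added to `app/services/llm.py`.

```python
from copy import deepcopy
from typing import Any


def apply_thinking_toggle(
    messages: list[dict[str, Any]],
    payload: dict[str, Any],
    thinking: bool,
) -> tuple[list[dict[str, Any]], dict[str, Any]]:
    """Hypothetical helper: return adjusted copies of messages and payload."""
    messages = deepcopy(messages)
    payload = deepcopy(payload)
    if thinking:
        # Thinking enabled: leave everything untouched.
        return messages, payload

    # Find the last user message and append the /no_think switch to it.
    # Appending (rather than prepending) is an assumption of this sketch.
    for message in reversed(messages):
        if message.get("role") != "user":
            continue
        content = message.get("content")
        if isinstance(content, str) and "/no_think" not in content:
            message["content"] = content.rstrip() + " /no_think"
        break

    # Same pattern as the diff: keep existing kwargs, only set enable_thinking.
    payload.setdefault("chat_template_kwargs", {})["enable_thinking"] = False
    return messages, payload
```

One design point worth keeping in mind when reading the diff: `payload.update(self._config.extra_body)` runs after the `setdefault` line, so an `extra_body` that carries its own `chat_template_kwargs` key replaces the whole dict, including `enable_thinking`.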
--- a/app/services/llm.py
+++ b/app/services/llm.py
@@ -151,7 +151,11 @@ class LLMClient:
        if tools:
            payload["tools"] = tools

-        # Merge model-specific extra parameters (e.g., enable_thinking, reasoning_effort)
+        # When thinking is disabled, inject chat_template_kwargs for backends that support it
+        if not self._config.thinking:
+            payload.setdefault("chat_template_kwargs", {})["enable_thinking"] = False
+
+        # Merge model-specific extra parameters (e.g., reasoning_effort)
        if self._config.extra_body:
            payload.update(self._config.extra_body)