feat: add thinking mode toggle to suppress reasoning-only response loops

Adds `llm.thinking` config option (default: true) that when disabled:

- Injects /no_think into the last user message for Qwen 3.x compatibility
- Sends chat_template_kwargs in API payload for backends that support it
- Silently and immediately nudges on reasoning-only responses instead of showing warnings and wasting retry iterations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
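
For illustration only, here is a minimal sketch of how the first two behaviours might look when `llm.thinking` is disabled. The helper name `build_payload` is hypothetical and is not taken from this commit, and the `enable_thinking` key inside `chat_template_kwargs` is an assumption based on the Qwen 3 chat template convention; the actual implementation in the changed files may differ.

```python
from copy import deepcopy
from typing import Any


def build_payload(messages: list[dict[str, Any]], thinking: bool = True) -> dict[str, Any]:
    """Hypothetical helper: build a chat payload honouring a `thinking` flag."""
    msgs = deepcopy(messages)  # never mutate the caller's conversation history
    payload: dict[str, Any] = {"messages": msgs}
    if thinking:
        return payload

    # Append the Qwen 3.x soft switch to the most recent user message only.
    for msg in reversed(msgs):
        if msg.get("role") == "user" and isinstance(msg.get("content"), str):
            if "/no_think" not in msg["content"]:
                msg["content"] = msg["content"].rstrip() + " /no_think"
            break

    # Assumed key name; intended for backends that pass kwargs to the chat template.
    payload["chat_template_kwargs"] = {"enable_thinking": False}
    return payload
```

Copying the messages before editing keeps the stored conversation free of the `/no_think` marker, so re-enabling thinking later would not require cleaning up history.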
2026-03-11 19:34:36 -05:00
parent 25fa7dc82b
commit f0d8ef8f0a
4 changed files with 46 additions and 11 deletions
--- a/app/models/config.py
+++ b/app/models/config.py
@@ -20,6 +20,10 @@ class LLMConfig(BaseModel):
    max_retries: int = Field(default=3, description="Max retry attempts on transient errors")
    retry_backoff_base: float = Field(default=1.0, description="Base seconds for exponential backoff")
    retry_backoff_max: float = Field(default=30.0, description="Maximum backoff seconds")
+    thinking: bool = Field(
+        default=True,
+        description="Enable model thinking/reasoning mode (disable to reduce reasoning-only loops)",
+    )
    extra_body: dict[str, Any] = Field(
        default_factory=dict,
        description="Extra parameters merged into the API request body (model-specific)",