feat: add thinking mode toggle to suppress reasoning-only response loops

Adds `llm.thinking` config option (default: true) that when disabled:
- Injects /no_think into the last user message for Qwen 3.x compatibility
- Sends chat_template_kwargs in API payload for backends that support it
- Silently and immediately nudges on reasoning-only responses instead of
  showing warnings and wasting retry iterations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-03-11 19:34:36 -05:00
parent 25fa7dc82b
commit f0d8ef8f0a
4 changed files with 46 additions and 11 deletions

View File

@@ -20,6 +20,10 @@ class LLMConfig(BaseModel):
max_retries: int = Field(default=3, description="Max retry attempts on transient errors")
retry_backoff_base: float = Field(default=1.0, description="Base seconds for exponential backoff")
retry_backoff_max: float = Field(default=30.0, description="Maximum backoff seconds")
thinking: bool = Field(
default=True,
description="Enable model thinking/reasoning mode (disable to reduce reasoning-only loops)",
)
extra_body: dict[str, Any] = Field(
default_factory=dict,
description="Extra parameters merged into the API request body (model-specific)",