feat: add thinking mode toggle to suppress reasoning-only response loops

Adds `llm.thinking` config option (default: true) that, when disabled:
- Injects /no_think into the last user message for Qwen 3.x compatibility
- Sends chat_template_kwargs in API payload for backends that support it
- Silently and immediately nudges on reasoning-only responses instead of
  showing warnings and wasting retry iterations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@@ -20,6 +20,10 @@ class LLMConfig(BaseModel):
     max_retries: int = Field(default=3, description="Max retry attempts on transient errors")
     retry_backoff_base: float = Field(default=1.0, description="Base seconds for exponential backoff")
     retry_backoff_max: float = Field(default=30.0, description="Maximum backoff seconds")
+    thinking: bool = Field(
+        default=True,
+        description="Enable model thinking/reasoning mode (disable to reduce reasoning-only loops)",
+    )
     extra_body: dict[str, Any] = Field(
         default_factory=dict,
         description="Extra parameters merged into the API request body (model-specific)",
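
The hunk above only adds the config field; the code that consumes it is not shown here. As a rough sketch of the first two behaviors listed in the commit message, the request payload could be adjusted along these lines. The helper name `apply_thinking_toggle` and the payload shape (an OpenAI-style `messages` list) are assumptions for illustration, and the `enable_thinking` key inside `chat_template_kwargs` is likewise an assumption based on how Qwen 3 chat templates are commonly configured, not something this commit states.

```python
from copy import deepcopy
from typing import Any


def apply_thinking_toggle(payload: dict[str, Any], thinking: bool) -> dict[str, Any]:
    """Return a copy of `payload` adjusted for the `thinking` setting.

    Hypothetical helper: the real project may structure this differently.
    """
    out = deepcopy(payload)  # never mutate the caller's payload
    if thinking:
        return out

    # Soft switch for Qwen 3.x: tag the last user message with /no_think.
    for message in reversed(out.get("messages", [])):
        if message.get("role") != "user":
            continue
        content = message.get("content")
        if isinstance(content, str) and "/no_think" not in content:
            message["content"] = content.rstrip() + " /no_think"
        break  # only the most recent user message is considered

    # Hint for backends that forward chat-template arguments.
    # The key name `enable_thinking` is an assumption, not taken from the commit.
    out.setdefault("chat_template_kwargs", {})["enable_thinking"] = False
    return out
```

Returning a copy keeps retries simple: the original payload can be reused unchanged if the setting is flipped between attempts.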