fix: empty response handling, /no_think model gating, per-model profiles
- Detect empty LLM responses (no content, no tool calls) instead of silently treating them as task completion. Retries once without tools before warning the user. - Gate /no_think system message and chat_template_kwargs to Qwen/QwQ models only — sending /no_think to llama3.x caused empty responses. - Add model_profiles config section for per-model overrides (token budget, thinking, temperature, max_tokens) matched by name prefix. Applied at startup and on /model switch. - Update SessionManager on /model switch so session files record the correct model. - Add NDJSON fallback in SSE stream parser for Ollama compatibility. - Improve read_file error to suggest find_files on FileNotFoundError. - Add diagnostic logging for empty streams and empty results. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -36,6 +36,11 @@ class TokenCounter:
|
||||
"""The configured token budget."""
|
||||
return self._budget
|
||||
|
||||
@budget.setter
|
||||
def budget(self, value: int) -> None:
|
||||
"""Update the token budget (e.g., when switching models)."""
|
||||
self._budget = value
|
||||
|
||||
@property
|
||||
def cumulative_usage(self) -> TokenUsage:
|
||||
"""Cumulative token usage across all tracked calls."""
|
||||
|
||||
Reference in New Issue
Block a user