fix: empty response handling, /no_think model gating, per-model profiles

- Detect empty LLM responses (no content, no tool calls) instead of
  silently treating them as task completion. Retry once without tools
  before warning the user.
- Gate /no_think system message and chat_template_kwargs to Qwen/QwQ
  models only — sending /no_think to llama3.x caused empty responses.
- Add model_profiles config section for per-model overrides (token
  budget, thinking, temperature, max_tokens) matched by name prefix.
  Applied at startup and on /model switch.
- Update SessionManager on /model switch so session files record the
  correct model.
- Add NDJSON fallback in SSE stream parser for Ollama compatibility.
- Improve read_file error to suggest find_files on FileNotFoundError.
- Add diagnostic logging for empty streams and empty results.
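
The empty-response handling in the first bullet can be sketched roughly as
follows. This is a minimal illustration, not the code from this commit:
LLMResponse, is_empty, and complete_with_retry are hypothetical names, and
the call/warn callables stand in for whatever the client actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LLMResponse:
    """Hypothetical response container; the real type may differ."""
    content: str = ""
    tool_calls: List[dict] = field(default_factory=list)


def is_empty(response: LLMResponse) -> bool:
    """A response is empty when it has neither text content nor tool calls."""
    return not response.content.strip() and not response.tool_calls


def complete_with_retry(
    call: Callable[[bool], LLMResponse],
    warn: Callable[[str], None],
) -> Optional[LLMResponse]:
    """Call the model with tools; on an empty reply, retry once without tools.

    `call(use_tools)` is an assumed callable that performs one request.
    Returns None (after warning) if both attempts come back empty, so the
    caller does not mistake an empty reply for task completion.
    """
    response = call(True)
    if not is_empty(response):
        return response
    response = call(False)  # single retry, tools disabled
    if is_empty(response):
        warn("Model returned an empty response, even after retrying without tools.")
        return None
    return response
```

Returning None rather than an empty response keeps "the model said nothing"
distinct from "the model finished the task".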

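The prefix-matched model_profiles lookup and the Qwen/QwQ gating could look
something like the sketch below. The field names mirror the overrides listed
in the message, but ModelProfile, match_profile, and supports_no_think are
hypothetical, and the longest-prefix and case-insensitive rules are
assumptions rather than confirmed behavior.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model overrides; None means 'keep the global default'."""
    token_budget: Optional[int] = None
    thinking: Optional[bool] = None
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None


def match_profile(
    model_name: str, profiles: Dict[str, ModelProfile]
) -> Optional[ModelProfile]:
    """Return the profile whose key is a prefix of model_name.

    Assumption: when several prefixes match, the longest one wins, and
    matching ignores case.
    """
    name = model_name.lower()
    best: Optional[str] = None
    for prefix in profiles:
        if name.startswith(prefix.lower()):
            if best is None or len(prefix) > len(best):
                best = prefix
    return profiles[best] if best is not None else None


def supports_no_think(model_name: str) -> bool:
    """Only Qwen/QwQ-family names get the /no_think system message."""
    return model_name.lower().startswith(("qwen", "qwq"))
```

With profiles keyed "qwen" and "qwen3-coder", a model named
"qwen3-coder:30b" would pick the more specific "qwen3-coder" entry under
this rule.
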
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
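
The NDJSON fallback mentioned in the message can be illustrated with a
per-line parser like the one below. It is a sketch under assumptions:
parse_stream_line is a hypothetical helper, and treating "[DONE]" as an
end-of-stream sentinel follows the OpenAI-style streaming convention rather
than anything stated in this commit.

```python
import json
from typing import Optional


def parse_stream_line(line: str) -> Optional[dict]:
    """Parse one line of a streaming response into a JSON object.

    Accepts SSE lines ("data: {...}") and, as a fallback, bare NDJSON
    lines ("{...}"). Returns None for anything that carries no JSON
    payload.
    """
    line = line.strip()
    if not line or line.startswith(":"):
        return None  # blank event separator or SSE comment
    if line.startswith("data:"):
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return None  # assumed end-of-stream sentinel
    elif line.startswith("{"):
        payload = line  # NDJSON fallback: the line itself is the JSON document
    else:
        return None  # other SSE fields such as event:, id:, retry:
    try:
        parsed = json.loads(payload)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```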
2026-03-11 23:09:04 -05:00
parent 1ee721ac10
commit 16d79df421
10 changed files with 191 additions and 33 deletions


@@ -36,6 +36,11 @@ class TokenCounter:
         """The configured token budget."""
         return self._budget
 
+    @budget.setter
+    def budget(self, value: int) -> None:
+        """Update the token budget (e.g., when switching models)."""
+        self._budget = value
+
     @property
     def cumulative_usage(self) -> TokenUsage:
         """Cumulative token usage across all tracked calls."""
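
To show how the new setter might be used on a /model switch, here is a
self-contained sketch. The TokenCounter below is a stand-in that contains
only the budget property from the hunk; switch_model and its parameters are
hypothetical and not taken from this commit.

```python
from typing import Optional


class TokenCounter:
    """Stand-in with only the budget property shown in the hunk."""

    def __init__(self, budget: int) -> None:
        self._budget = budget

    @property
    def budget(self) -> int:
        """The configured token budget."""
        return self._budget

    @budget.setter
    def budget(self, value: int) -> None:
        """Update the token budget (e.g., when switching models)."""
        self._budget = value


def switch_model(
    counter: TokenCounter,
    profile_budget: Optional[int],
    default_budget: int,
) -> None:
    """Apply the new model's budget, falling back to the global default.

    Hypothetical helper: a profile without a token budget override leaves
    the counter at default_budget.
    """
    counter.budget = profile_budget if profile_budget is not None else default_budget
```

Because the setter mutates the existing counter, cumulative usage tracked so
far would be preserved across the switch instead of being reset by creating
a new TokenCounter.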