fix: empty response handling, /no_think model gating, per-model profiles

- Detect empty LLM responses (no content, no tool calls) instead of silently treating them as task completion. Retry once without tools before warning the user.
- Gate the /no_think system message and chat_template_kwargs to Qwen/QwQ models only; sending /no_think to llama3.x caused empty responses.
- Add a model_profiles config section for per-model overrides (token budget, thinking, temperature, max_tokens) matched by name prefix. Applied at startup and on /model switch.
- Update SessionManager on /model switch so session files record the correct model.
- Add an NDJSON fallback in the SSE stream parser for Ollama compatibility.
- Improve the read_file error to suggest find_files on FileNotFoundError.
- Add diagnostic logging for empty streams and empty results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
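
The NDJSON fallback mentioned in the message could look roughly like the sketch below. It is not the code from this commit: the function name `iter_stream_payloads` is hypothetical, and the `[DONE]` sentinel is assumed to follow the OpenAI-style streaming convention. The idea is that a line carrying a `data:` prefix is treated as SSE, while a bare line that starts with `{` is treated as one NDJSON record.

```python
import json
from typing import Iterable, Iterator


def iter_stream_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield JSON objects from SSE "data:" lines, falling back to NDJSON.

    Hypothetical helper for illustration; not taken from the commit.
    """
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(":"):
            continue  # blank event separator or SSE comment line
        if line.startswith("data:"):
            line = line[len("data:"):].strip()
            if line == "[DONE]":  # assumed OpenAI-style end-of-stream marker
                return
        elif not line.startswith("{"):
            continue  # other SSE fields such as "event:" or "id:"
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed chunks instead of aborting the stream
        if isinstance(payload, dict):
            yield payload
```

Skipping malformed chunks rather than raising is a design choice of this sketch; a real parser might log them, in line with the diagnostic logging the message describes.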
2026-03-11 23:09:04 -05:00
parent 1ee721ac10
commit 16d79df421
10 changed files with 191 additions and 33 deletions
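
For readers skimming the diff, here is a minimal sketch of how prefix-matched model profiles and the /no_think gate might work. Everything in it is an assumption made for illustration: the names `MODEL_PROFILES`, `resolve_profile`, and `supports_no_think`, the example profile values, and the longest-prefix tie-break are not taken from this commit.

```python
from typing import Any

# Hypothetical profile table; keys are model-name prefixes.
MODEL_PROFILES: dict[str, dict[str, Any]] = {
    "qwen": {"thinking": True, "temperature": 0.6},
    "qwen3-coder": {"thinking": False, "max_tokens": 8192},
    "llama3": {"thinking": False, "temperature": 0.2},
}

# Assumed prefixes for models that understand /no_think.
NO_THINK_PREFIXES = ("qwen", "qwq")


def resolve_profile(model: str, profiles: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Return a copy of the overrides for the longest matching prefix."""
    name = model.lower()
    matches = [prefix for prefix in profiles if name.startswith(prefix.lower())]
    if not matches:
        return {}  # no profile: the caller keeps its global defaults
    return dict(profiles[max(matches, key=len)])


def supports_no_think(model: str) -> bool:
    """Gate /no_think to Qwen/QwQ-style model names only."""
    return model.lower().startswith(NO_THINK_PREFIXES)
```

Under these assumptions, `resolve_profile("qwen3-coder:30b", MODEL_PROFILES)` picks the more specific `qwen3-coder` entry over `qwen`, and `supports_no_think("llama3.2:3b")` is false, so the /no_think message would not be sent to that model.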
--- a/app/services/session.py
+++ b/app/services/session.py
@@ -52,6 +52,10 @@ class SessionManager:
        self._session_dir = workspace_root / config.session_dir
        self._session_id = f"{self._workspace_hash}_{datetime.now(UTC).strftime('%Y%m%d_%H%M%S')}"

+    def update_model(self, model: str) -> None:
+        """Update the model name for session metadata."""
+        self._model = model
+
    def save(self, ctx: "SessionContext") -> Path:
        """Save session state to a JSON file via atomic write.