Adds `llm.thinking` config option (default: true) that when disabled:
- Injects /no_think into the last user message for Qwen 3.x compatibility
- Sends chat_template_kwargs in API payload for backends that support it
- Silently and immediately nudges on reasoning-only responses instead of
showing warnings and wasting retry iterations
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows passing arbitrary parameters (e.g., enable_thinking, reasoning_effort)
to the LLM API request body via config.yaml, solving reasoning-only response
loops with models like Qwen 3.x without requiring code changes per model.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement the core autonomy layer — AgentLoop streams LLM responses,
parses tool calls, executes them with permission checks, feeds results
back, and repeats until the task completes or finish is called.
- Add FinishTool for explicit loop termination
- Add tools parameter to LLMClient.stream_chat() for function calling
- Add compact tool result display (status line, not full output)
- Refactor REPL to delegate to AgentLoop.run_turn()
- Fix Ollama null content rejection (always send content as string)
- Add finish to auto_approve permissions
- 9 unit tests for agent loop (34 total, zero regressions)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire the REPL to a local Ollama instance via streaming HTTP (SSE).
LLMClient handles async streaming chat, StreamHandler renders live
Markdown via Rich and accumulates tool call fragments. Startup now
runs a preflight check that verifies Ollama is reachable and the
configured model is pulled, exiting with a clear message on failure.
Also adds .gitignore and updates config to use qwen3.5.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>