SneakyCode/docs/ROADMAP.md

# SneakyCode Implementation Roadmap

A phased plan progressing from bare-bones foundation to full autonomous coding agent.

---

## Phase 1 — Foundation: Models, Config, and Utilities

Establish the data layer and shared infrastructure everything else builds on.

| File | Description |
|------|-------------|
| `app/models/config.py` | Pydantic v2 config model — load and validate `config/config.yaml` |
| `app/models/message.py` | Message schema (role, content, tool_calls) |
| `app/models/tool_call.py` | ToolCall and ToolResult schemas |
| `app/utils/logging.py` | Centralized logger with Rich handler |
| `app/utils/display.py` | Rich console output helpers (stub — expanded in Phase 2) |
| `app/utils/file_helpers.py` | Safe path resolution, binary detection, size guards |
| `app/utils/token_counter.py` | Approximate token usage tracking (character-based heuristic for v1) |
| `app/main.py` | Entrypoint stub — arg parsing, config load, Rich console setup |

**Exit criteria:** `python -m app.main --help` runs, config loads and validates, models can be instantiated and serialized.

---

## Phase 2 — TUI and Interactive Shell

Get a working interactive terminal before wiring up the LLM.

| File | Description |
|------|-------------|
| `app/main.py` | Rich-based interactive REPL loop — prompt for user input, display responses |
| `app/utils/display.py` | Formatted output for agent messages, tool calls, errors, token usage |
| `app/agent/context.py` | Session state and conversation history management |

**Exit criteria:** User can type messages into a styled REPL, see them echoed back with formatting, and conversation history is tracked in memory.

---

## Phase 3 — LLM Integration (Ollama)

Connect to the local LLM and stream responses into the TUI.

| File | Description |
|------|-------------|
| `app/services/llm.py` | Async httpx client wrapping Ollama's OpenAI-compatible `/v1/chat/completions` endpoint |
| `app/services/streaming.py` | SSE parsing, Rich live display, tool call extraction from accumulated stream |

**Integration:** Wire LLM into the REPL — user message goes to LLM, streamed response displays in real time.

**Exit criteria:** User can chat with the local model through the TUI with streamed output. Tool call JSON is parsed from the stream but not yet executed.

---

## Phase 4 — Tool Framework and Core Tools

Build the tool abstraction and implement safe, read-only tools first.

| File | Description |
|------|-------------|
| `app/tools/base.py` | `BaseTool` ABC and `ToolResult` dataclass |
| `app/tools/registry.py` | Tool registration, discovery, and JSON schema export for LLM system prompt |
| `app/services/permissions.py` | Two-tier approval gating (auto-approve reads; prompt for writes/deletes/shell) |
| `app/tools/filesystem.py` | `read_file`, `list_dir` |
| `app/tools/search.py` | `grep_files`, `find_files` |

**Exit criteria:** Tools register themselves, schemas export correctly for inclusion in the system prompt, read-only tools execute and return `ToolResult` objects. Permissions service gates execution.

---

## Phase 5 — Agent Loop (ReAct)

The core autonomy layer — reason, act, observe, repeat.

| File | Description |
|------|-------------|
| `app/agent/loop.py` | ReAct cycle: send conversation to LLM, parse tool calls, execute, feed results back, repeat |

**Key behaviors:**
- System prompt constructed with tool schemas from registry
- Permissions checks before each tool execution
- Loop termination on: plain-text response (no tool calls), explicit `finish` tool call, or `max_iterations` exceeded

**Exit criteria:** Agent can autonomously answer questions about the codebase by chaining `read_file`, `list_dir`, `grep_files`, and `find_files` tool calls in a multi-turn loop.

---

## Phase 6 — Write Tools and Shell

Unlock the agent's ability to modify code and run commands.

| File | Description |
|------|-------------|
| `app/tools/filesystem.py` | `write_file`, `make_dir`, `delete_file` (additions to existing module) |
| `app/tools/edit.py` | `str_replace` (unique-match required), `patch_apply` |
| `app/tools/shell.py` | `run_command` with command allow/deny lists and output truncation |

**All write/shell operations gated through permissions service.**

**Exit criteria:** Agent can autonomously create files, edit code via string replacement, and run shell commands — all with user approval for destructive operations.

---

## Phase 7 — Polish and Hardening

Production-readiness: error handling, resource limits, and documentation.

| Area | Description |
|------|-------------|
| Error handling | Recovery from malformed tool calls, LLM errors, network timeouts in agent loop |
| Token budget | Conversation truncation or summarization when approaching context limit |
| Graceful shutdown | Clean Ctrl+C handling, session state preservation |
| Testing | End-to-end integration tests (`tests/integration/`), unit tests (`tests/unit/`) |
| Documentation | `README.md` with setup and usage instructions, `docs/tools.md` tool reference |

**Exit criteria:** Agent handles edge cases gracefully, tests pass, and a new user can set up and use the project from the README alone.

---

## File Coverage

Every file from the project structure in CLAUDE.md is accounted for:

| File | Phase |
|------|-------|
| `app/main.py` | 1, 2 |
| `app/models/config.py` | 1 |
| `app/models/message.py` | 1 |
| `app/models/tool_call.py` | 1 |
| `app/utils/logging.py` | 1 |
| `app/utils/display.py` | 1, 2 |
| `app/utils/file_helpers.py` | 1 |
| `app/utils/token_counter.py` | 1 |
| `app/agent/context.py` | 2 |
| `app/services/llm.py` | 3 |
| `app/services/streaming.py` | 3 |
| `app/tools/base.py` | 4 |
| `app/tools/registry.py` | 4 |
| `app/services/permissions.py` | 4 |
| `app/tools/filesystem.py` | 4, 6 |
| `app/tools/search.py` | 4 |
| `app/agent/loop.py` | 5 |
| `app/tools/edit.py` | 6 |
| `app/tools/shell.py` | 6 |