SneakyCode Implementation Roadmap

A phased plan progressing from bare-bones foundation to full autonomous coding agent.

Phase 1 — Foundation: Models, Config, and Utilities

Establish the data layer and shared infrastructure everything else builds on.

File	Description
`app/models/config.py`	Pydantic v2 config model — load and validate `config/config.yaml`
`app/models/message.py`	Message schema (role, content, tool_calls)
`app/models/tool_call.py`	ToolCall and ToolResult schemas
`app/utils/logging.py`	Centralized logger with Rich handler
`app/utils/display.py`	Rich console output helpers (stub — expanded in Phase 2)
`app/utils/file_helpers.py`	Safe path resolution, binary detection, size guards
`app/utils/token_counter.py`	Approximate token usage tracking (character-based heuristic for v1)
`app/main.py`	Entrypoint stub — arg parsing, config load, Rich console setup

Exit criteria: python -m app.main --help runs, config loads and validates, models can be instantiated and serialized.

Phase 2 — TUI and Interactive Shell

Get a working interactive terminal before wiring up the LLM.

File	Description
`app/main.py`	Rich-based interactive REPL loop — prompt for user input, display responses
`app/utils/display.py`	Formatted output for agent messages, tool calls, errors, token usage
`app/agent/context.py`	Session state and conversation history management

Exit criteria: User can type messages into a styled REPL, see them echoed back with formatting, and conversation history is tracked in memory.

Phase 3 — LLM Integration (Ollama)

Connect to the local LLM and stream responses into the TUI.

File	Description
`app/services/llm.py`	Async httpx client wrapping Ollama's OpenAI-compatible `/v1/chat/completions` endpoint
`app/services/streaming.py`	SSE parsing, Rich live display, tool call extraction from accumulated stream

Integration: Wire LLM into the REPL — user message goes to LLM, streamed response displays in real time.

Exit criteria: User can chat with the local model through the TUI with streamed output. Tool call JSON is parsed from the stream but not yet executed.

Phase 4 — Tool Framework and Core Tools

Build the tool abstraction and implement safe, read-only tools first.

File	Description
`app/tools/base.py`	`BaseTool` ABC and `ToolResult` dataclass
`app/tools/registry.py`	Tool registration, discovery, and JSON schema export for LLM system prompt
`app/services/permissions.py`	Two-tier approval gating (auto-approve reads; prompt for writes/deletes/shell)
`app/tools/filesystem.py`	`read_file`, `list_dir`
`app/tools/search.py`	`grep_files`, `find_files`

Exit criteria: Tools register themselves, schemas export correctly for inclusion in the system prompt, read-only tools execute and return ToolResult objects. Permissions service gates execution.

Phase 5 — Agent Loop (ReAct)

The core autonomy layer — reason, act, observe, repeat.

File	Description
`app/agent/loop.py`	ReAct cycle: send conversation to LLM, parse tool calls, execute, feed results back, repeat

Key behaviors:

System prompt constructed with tool schemas from registry
Permissions checks before each tool execution
Loop termination on: plain-text response (no tool calls), explicit finish tool call, or max_iterations exceeded

Exit criteria: Agent can autonomously answer questions about the codebase by chaining read_file, list_dir, grep_files, and find_files tool calls in a multi-turn loop.

Phase 6 — Write Tools and Shell

Unlock the agent's ability to modify code and run commands.

File	Description
`app/tools/filesystem.py`	`write_file`, `make_dir`, `delete_file` (additions to existing module)
`app/tools/edit.py`	`str_replace` (unique-match required), `patch_apply`
`app/tools/shell.py`	`run_command` with command allow/deny lists and output truncation

All write/shell operations gated through permissions service.

Exit criteria: Agent can autonomously create files, edit code via string replacement, and run shell commands — all with user approval for destructive operations.

Phase 7 — Polish and Hardening

Production-readiness: error handling, resource limits, and documentation.

Area	Description
Error handling	Recovery from malformed tool calls, LLM errors, network timeouts in agent loop
Token budget	Conversation truncation or summarization when approaching context limit
Graceful shutdown	Clean Ctrl+C handling, session state preservation
Testing	End-to-end integration tests (`tests/integration/`), unit tests (`tests/unit/`)
Documentation	`README.md` with setup and usage instructions, `docs/tools.md` tool reference

Exit criteria: Agent handles edge cases gracefully, tests pass, and a new user can set up and use the project from the README alone.

File Coverage

Every file from the project structure in CLAUDE.md is accounted for:

File	Phase
`app/main.py`	1, 2
`app/models/config.py`	1
`app/models/message.py`	1
`app/models/tool_call.py`	1
`app/utils/logging.py`	1
`app/utils/display.py`	1, 2
`app/utils/file_helpers.py`	1
`app/utils/token_counter.py`	1
`app/agent/context.py`	2
`app/services/llm.py`	3
`app/services/streaming.py`	3
`app/tools/base.py`	4
`app/tools/registry.py`	4
`app/services/permissions.py`	4
`app/tools/filesystem.py`	4, 6
`app/tools/search.py`	4
`app/agent/loop.py`	5
`app/tools/edit.py`	6
`app/tools/shell.py`	6

5.8 KiB Raw Blame History