# High-level view

```
[ Browser / Web GUI ]
        │  HTTPS (HTTP/3) via Cloudflare (DNS/WAF/CDN)
        ▼
[ Caddy API Gateway ]: routing, TLS, real client IP, SSE/WebSocket pass-through
        │
        ├── /auth/* → [ Auth Service (Flask) ]
        ├── /api/*  → [ Game API (Flask) ]
        │               ├── calls → [ AI-DM Service (Flask) → Replicate ]
        │               └── calls → [ Embeddings Service (Flask) ]
        │                               └── KNN over pgvector
        │
        ├── presign → direct upload/download ↔ [ Garage ]
        │
        └── infra cache / rate limits ↔ [ Redis ]

┌─────────────────────────────────────────────┐
│                                             │
│   [ Postgres 16 + pgvector ]                │
│   (auth, game OLTP + semantic vectors)      │
└─────────────────────────────────────────────┘
```

---

## Services & responsibilities

* **Web GUI**
  * Player UX for auth, character management, sessions, chat.
  * Uses REST for CRUD; SSE/WebSocket for live DM replies and typing indicators.
* **Caddy API Gateway**
  * Edge routing for `/auth`, `/api`, `/ai`, `/vec`.
  * TLS termination behind Cloudflare; preserves the real client IP; gzip/Brotli compression.
  * Pass-through for SSE/WebSocket; access logging.
* **Auth Service (Flask)**
  * Registration, login, refresh; JWT issuance/validation.
  * Owns player identity and credentials.
  * Simple rate limits via Redis.
* **Game API (Flask)**
  * Core game domain (characters, sessions, inventory, rules orchestration).
  * Persists messages; orchestrates retrieval and AI calls.
  * Streams DM replies to clients (SSE/WebSocket).
  * Generates pre-signed URLs for Garage uploads/downloads.
* **AI-DM Service (Flask)**
  * Thin, deterministic wrapper around **Replicate** models (prompt shaping, retries, timeouts).
  * Optional async path via a job queue if responses are slow.
* **Embeddings Service (Flask)**
  * Text → vector embedding (chosen model) and vector writes.
  * KNN search API (top-K over `pgvector`) for context retrieval.
  * Manages embedding version/dimension; supports re-embed workflows.
* **Postgres 16 + pgvector**
  * Single source of truth for auth & game schemas.
  * Stores messages with a `vector` column; an IVFFlat or HNSW index supports similarity search.
* **Garage (S3-compatible)**
  * Object storage for player assets (character sheets, images, exports).
  * Access via pre-signed URLs (private buckets by default).
* **Redis**
  * Caches hot reads (recent messages/session state).
  * Holds rate-limiting tokens; optionally acts as the Dramatiq broker for long jobs.

---

## Data boundaries

* **Auth schema (Postgres)**
  * `players(id, email, password_hash, created_at, …)`
  * Service: **Auth** exclusively reads/writes; others read via Auth or JWT claims.
* **Game schema (Postgres)**
  * `characters(id, player_id, name, clazz, level, sheet_json, …)`
  * `sessions(id, player_id, title, created_at, …)`
  * `messages(id, session_id, role, content, embedding vector(…)=NULL, created_at, …)`
  * Indices:
    * `messages(session_id, created_at)`
    * `messages USING hnsw|ivfflat (embedding vector_cosine_ops)`
* **Objects (Garage)**
  * Buckets: `player-assets`, `exports`, etc.
  * Keys include tenant/player and content hashes; metadata is stored in the DB.
* **Cache/queues (Redis)**
  * Keys for rate limits, short-lived session state, optional job queues.

---

## Core request flows

### A) Player message → DM reply (sync POC)

1. Web GUI → `POST /api/sessions/{id}/messages` (JWT).
2. **Game API** writes the player message (content only).
3. **Embeddings Service** returns a vector → **Game API** updates `messages.embedding`.
4. **Embeddings Service** (or direct SQL) performs KNN to fetch the top-K prior messages.
5. **Game API** calls **AI-DM Service** with `{prompt, context, system}`.
6. **AI-DM** calls **Replicate** and returns text.
7. **Game API** writes the DM message (+ embedding) and emits an SSE/WebSocket event to the client.

### B) Asset upload (character sheet/map)

1. Web GUI → `POST /api/assets/presign {bucket, key, contentType}` (JWT).
2. **Game API** validates ACLs → returns a pre-signed PUT URL for **Garage**.
3. Browser uploads directly to **Garage**.
4. **Game API** records/updates the asset metadata row (owner, key, checksum, type).

### C) Authentication

1. Web GUI → **Auth** `POST /auth/register` / `POST /auth/login`.
2. **Auth** returns `{access, refresh}` JWTs.
3. Subsequent API calls include the access token (Caddy passes it through).

### D) Retrieval-augmented turn (refine/search only)

1. **Game API** (server-side) computes the query embedding for the player prompt.
2. KNN over `messages.embedding` returns the top-K context.
3. Context is trimmed/serialized and sent to **AI-DM Service**.
4. The reply is streamed back to the client; transcripts are persisted.

### E) Long/slow generations (async job queue)

1. **Game API** enqueues a job (Redis/Dramatiq) for **AI-DM**.
2. Returns `{job_id}`; Web GUI subscribes via SSE.
3. Worker completes → **Game API** writes the DM message and emits an event.

This keeps each service small and focused, leans on Flask everywhere, uses **Caddy + Cloudflare** at the edge, **Postgres + pgvector** for state and search, and **Garage** for durable assets, with clean seams to swap pieces as you scale.
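
As an illustration of flow A above, the sketch below walks steps 2 to 7 in plain Python. It is not the project's implementation: the `Message` dataclass, the `handle_player_message` function, the `"player"`/`"dm"` role values, and the `embed`/`generate` callables are all hypothetical stand-ins for the Game API, Embeddings Service, and AI-DM Service, and an in-memory list with a brute-force cosine ranking stands in for Postgres and the pgvector index.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Message:
    # Mirrors a subset of the `messages` columns; an in-memory stand-in for a row.
    session_id: int
    role: str  # "player" / "dm" are assumed values, not taken from the schema
    content: str
    embedding: Optional[List[float]] = None


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: Sequence[float], candidates: List[Message], k: int) -> List[Message]:
    """Brute-force KNN; a real deployment would let the pgvector index do this."""
    embedded = [m for m in candidates if m.embedding is not None]
    embedded.sort(key=lambda m: cosine_distance(query, m.embedding))
    return embedded[:k]


def handle_player_message(
    store: List[Message],
    session_id: int,
    text: str,
    embed: Callable[[str], Sequence[float]],  # stand-in for the Embeddings Service
    generate: Callable[[dict], str],          # stand-in for the AI-DM Service
    k: int = 3,
) -> Message:
    # Step 2: persist the player message with content only.
    player_msg = Message(session_id, "player", text)
    store.append(player_msg)
    # Step 3: attach the embedding once it is available.
    player_msg.embedding = list(embed(text))
    # Step 4: retrieve the top-K prior messages from the same session.
    prior = [m for m in store if m.session_id == session_id and m is not player_msg]
    context = top_k(player_msg.embedding, prior, k)
    # Steps 5-6: call the AI-DM wrapper with {prompt, context, system}.
    reply = generate({
        "prompt": text,
        "context": [m.content for m in context],
        "system": "You are the dungeon master.",  # placeholder system prompt
    })
    # Step 7: persist the DM reply with its embedding; the caller emits the event.
    dm_msg = Message(session_id, "dm", reply, list(embed(reply)))
    store.append(dm_msg)
    return dm_msg
```

With the `vector_cosine_ops` index from the data-boundaries section, the `top_k` step would typically become a SQL query that orders by pgvector's cosine-distance operator (`embedding <=> query`) with a `LIMIT`.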
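
For flow B above, the ACL check in step 2 and the key convention from the data-boundaries section ("keys include tenant/player and content hashes") might look roughly like the following. This is a sketch under assumptions: the `players/{player_id}/{sha256}.{ext}` layout, the content-type allowlist, and the function names `build_asset_key` and `validate_presign_request` are hypothetical; only the bucket names come from the document. Generating the actual pre-signed URL is left to whichever S3 client is used against Garage.

```python
import hashlib

# Bucket names come from the document; the content-type allowlist is an assumption.
ALLOWED_BUCKETS = {"player-assets", "exports"}
ALLOWED_CONTENT_TYPES = {"application/pdf", "image/png", "image/jpeg", "application/json"}


def build_asset_key(player_id: str, data: bytes, filename: str) -> str:
    """Build an object key scoped to the player and addressed by content hash."""
    digest = hashlib.sha256(data).hexdigest()
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else "bin"
    return f"players/{player_id}/{digest}.{ext}"


def validate_presign_request(player_id: str, bucket: str, key: str, content_type: str) -> None:
    """Raise if the authenticated player may not write this bucket/key."""
    if bucket not in ALLOWED_BUCKETS:
        raise PermissionError(f"bucket not allowed: {bucket}")
    if not key.startswith(f"players/{player_id}/"):
        raise PermissionError("key is outside the caller's prefix")
    if ".." in key.split("/"):
        raise PermissionError("key contains a parent-directory segment")
    if content_type not in ALLOWED_CONTENT_TYPES:
        raise ValueError(f"unsupported content type: {content_type}")
```

Hashing the content into the key means re-uploading an identical file maps to the same object, and the digest can double as the `checksum` recorded in the asset metadata row.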
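
Flow E above can be pictured with a small in-memory model. The sketch uses Python's standard `queue.Queue` in place of Redis/Dramatiq and a per-job event list in place of the SSE stream; the `JobBoard` class, its method names, and the event names (`queued`, `dm_message`, `failed`) are hypothetical and chosen only to make the three steps concrete.

```python
import queue
import uuid
from typing import Callable, Dict, List


class JobBoard:
    """In-memory stand-in for the Redis/Dramatiq queue plus the SSE event feed."""

    def __init__(self) -> None:
        self.jobs: "queue.Queue[dict]" = queue.Queue()
        self.events: Dict[str, List[dict]] = {}

    def enqueue(self, session_id: int, prompt: str) -> str:
        # Step 1: the Game API enqueues the job; step 2: it returns the job_id.
        job_id = uuid.uuid4().hex
        self.events[job_id] = [{"event": "queued"}]
        self.jobs.put({"job_id": job_id, "session_id": session_id, "prompt": prompt})
        return job_id

    def run_worker_once(
        self,
        generate: Callable[[str], str],       # stand-in for the AI-DM call
        persist: Callable[[int, str], None],  # stand-in for writing the DM message
    ) -> None:
        # Step 3: a worker takes one job, generates, persists, then emits an event.
        job = self.jobs.get_nowait()
        try:
            text = generate(job["prompt"])
        except Exception as exc:
            # Surface failures to the subscriber instead of dropping them silently.
            self.events[job["job_id"]].append({"event": "failed", "error": str(exc)})
            return
        persist(job["session_id"], text)
        self.events[job["job_id"]].append({"event": "dm_message", "content": text})
```

A client subscribed to a given `job_id` would see `queued` followed by either `dm_message` or `failed`, which keeps the request that enqueued the job short regardless of how long generation takes.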