High-level view
[ Browser / Web GUI ]
│ HTTPS (HTTP/3) via Cloudflare (DNS/WAF/CDN)
▼
[ Caddy API Gateway ] — routing, TLS, real client IP, SSE/WebSocket pass-through
│
├── /auth/* → [ Auth Service (Appwrite) ]
├── /api/* → [ Game API (Flask) ]
│ ├── calls → [ AI-DM Service (Flask) → Replicate ]
│ └── calls → [ Embeddings Service (Flask) ]
│ └── KNN over pgvector
│
├── presign → direct upload/download ↔ [ Appwrite ]
│
└── infra cache / rate limits ↔ [ Redis ]
┌─────────────────────────────────────────────┐
│ │
│ [ Postgres 16 + pgvector ] │
│ (auth, game OLTP + semantic vectors) │
└─────────────────────────────────────────────┘
Front end / Back end auth
[Frontend] ── login(email,pass) ─▶ [API Gateway] ─▶ [Auth Service]
│ │
│ <── Set-Cookie: access_token=JWT ───┘
│
├─▶ call /game/start (cookie auto-attached)
│
└─▶ logout → clear cookie
Services & responsibilities
- Web GUI
  - Player UX for auth, character management, sessions, chat.
  - Uses REST for CRUD; SSE/WebSocket for live DM replies/typing.
- Caddy API Gateway
  - Edge routing for /auth, /api, /ai, /vec.
  - TLS termination behind Cloudflare; preserves real client IP; gzip/br.
  - Pass-through for SSE/WebSocket; access logging.
- Auth Service (Flask)
  - Registration, login, refresh; JWT issuance/validation.
  - Owns player identity and credentials.
  - Simple rate limits via Redis.
- Game API (Flask)
  - Core game domain (characters, sessions, inventory, rules orchestration).
  - Persists messages; orchestrates retrieval and AI calls.
  - Streams DM replies to clients (SSE/WebSocket).
  - Generates pre-signed URLs for Garage uploads/downloads.
- AI-DM Service (Flask)
  - Thin, deterministic wrapper around Replicate models (prompt shaping, retries, timeouts).
  - Optional async path via job queue if responses are slow.
- Embeddings Service (Flask)
  - Text → vector embedding (chosen model) and vector writes.
  - KNN search API (top-K over pgvector) for context retrieval.
  - Manages embedding version/dimension; supports re-embed workflows.
- Postgres 16 + pgvector
  - Single source of truth for auth & game schemas.
  - Stores messages with a vector column; IVFFlat or HNSW index for similarity search.
- Garage (S3-compatible)
  - Object storage for player assets (character sheets, images, exports).
  - Access via pre-signed URLs (private buckets by default).
- Redis
  - Caching hot reads (recent messages/session state).
  - Rate limiting tokens; optional Dramatiq broker for long jobs.
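The "simple rate limits via Redis" item could take several shapes. A minimal sketch of one of them, a fixed-window counter, is shown here with an in-memory dict standing in for Redis. The class name, parameters, and the choice of a fixed window are assumptions for illustration, not something this design specifies.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Fixed-window counter. With Redis, the same idea is typically
    an INCR on a per-window key plus an EXPIRE so old windows vanish."""

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        # Stand-in for Redis: (client id, window index) -> hit count.
        # Unlike Redis with EXPIRE, this dict never evicts old windows.
        self._counts: Dict[Tuple[str, int], int] = {}

    def allow(self, client_id: str) -> bool:
        """Count one request and report whether it is within the limit."""
        window = int(self.clock() // self.window_seconds)
        key = (client_id, window)
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key] <= self.limit
```

For example, `FixedWindowLimiter(limit=5, window_seconds=60)` would allow five login attempts per client per minute. A fixed window is easy to reason about but allows short bursts across a window boundary.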
Data boundaries
- Auth schema (Postgres)
  - players(id, email, password_hash, created_at, …)
  - Service: Auth exclusively reads/writes; others read via Auth or JWT claims.
- Game schema (Postgres)
  - characters(id, player_id, name, clazz, level, sheet_json, …)
  - sessions(id, player_id, title, created_at, …)
  - messages(id, session_id, role, content, embedding vector(…)=NULL, created_at, …)
  - Indices:
    - messages(session_id, created_at)
    - messages USING hnsw|ivfflat (embedding vector_cosine_ops)
Objects (Garage)
- Buckets:
player-assets,exports, etc. - Keys include tenant/player and content hashes; metadata stored in DB.
- Buckets:
- Cache/queues (Redis)
  - Keys for rate limits, short-lived session state, optional job queues.
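To make the object-key convention concrete, here is a minimal sketch of a key builder. The exact layout (`tenant/player/sha256.ext`) and the function name `asset_key` are assumptions; the design above only says that keys include tenant/player and a content hash.

```python
import hashlib


def asset_key(tenant: str, player_id: str, content: bytes, filename: str) -> str:
    """Build an object key from the owner and a hash of the content.

    Hypothetical layout: <tenant>/<player_id>/<sha256 hex>.<extension>
    """
    digest = hashlib.sha256(content).hexdigest()
    # Keep only a lowercased extension; fall back to "bin" when there is none.
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else "bin"
    return f"{tenant}/{player_id}/{digest}.{ext}"
```

Hashing the content means that re-uploading identical bytes maps to the same key, which makes uploads idempotent and gives the metadata row a natural checksum.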
Core request flows
A) Player message → DM reply (sync POC)
- Web GUI → POST /api/sessions/{id}/messages (JWT).
- Game API writes player message (content only).
- Embeddings Service returns vector → Game API updates message.embedding.
- Embeddings Service (or direct SQL) performs KNN to fetch top-K prior messages.
- Game API calls AI-DM Service with {prompt, context, system}.
- AI-DM calls Replicate, returns text.
- Game API writes DM message (+ embedding), emits SSE/WebSocket event to client.
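The steps of flow A can be sketched as a single orchestration function. This is an illustration under assumptions, not the project's code: the `Message` and `GameApi` names, the injected callables standing in for the Embeddings and AI-DM services, the in-memory `store` list standing in for Postgres, and the system prompt string are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class Message:
    session_id: str
    role: str  # "player" or "dm"
    content: str
    embedding: Optional[List[float]] = None


@dataclass
class GameApi:
    # Stand-ins for the Embeddings Service, KNN search, AI-DM Service,
    # and the SSE/WebSocket emitter.
    embed: Callable[[str], List[float]]
    knn: Callable[[str, Sequence[float], int], List[Message]]
    generate: Callable[[dict], str]
    emit: Callable[[str, Message], None]
    store: List[Message] = field(default_factory=list)  # stand-in for Postgres

    def post_message(self, session_id: str, text: str, top_k: int = 5) -> Message:
        # Write the player message with content only.
        player_msg = Message(session_id, "player", text)
        self.store.append(player_msg)
        # Fill in the embedding once the Embeddings Service returns it.
        player_msg.embedding = self.embed(text)
        # Fetch the top-K prior messages as context.
        context = self.knn(session_id, player_msg.embedding, top_k)
        # Call the AI-DM Service with {prompt, context, system}.
        reply = self.generate({
            "prompt": text,
            "context": [m.content for m in context],
            "system": "You are the dungeon master.",  # hypothetical
        })
        # Write the DM message with its embedding, then notify the client.
        dm_msg = Message(session_id, "dm", reply, self.embed(reply))
        self.store.append(dm_msg)
        self.emit(session_id, dm_msg)
        return dm_msg
```

Injecting the service calls keeps the orchestration testable without network access, and leaves room to swap the synchronous path for the async one later.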
B) Asset upload (character sheet/map)
- Web GUI → POST /api/assets/presign {bucket, key, contentType} (JWT).
- Game API validates ACLs → returns pre-signed PUT URL for Garage.
- Browser uploads directly to Garage.
- Game API records/updates asset metadata row (owner, key, checksum, type).
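The ACL check in flow B is not specified in detail. One plausible shape, sketched here under assumptions, is to allow only known buckets, require the key to sit under the caller's own prefix, and restrict content types. The function name, the `tenant/player_id/` prefix layout, and the allowed content types are hypothetical; the bucket names come from the data boundaries listed earlier.

```python
from typing import Optional

ALLOWED_BUCKETS = {"player-assets", "exports"}
# Hypothetical allow-list; the real set of content types is not specified.
ALLOWED_TYPES = {"application/pdf", "image/png", "image/jpeg"}


def check_presign_request(tenant: str, player_id: str, bucket: str,
                          key: str, content_type: str) -> Optional[str]:
    """Return None if the request is acceptable, else a short reason."""
    if bucket not in ALLOWED_BUCKETS:
        return "unknown bucket"
    segments = key.split("/")
    # Reject empty and dot segments so the key cannot escape its prefix.
    if any(seg in ("", ".", "..") for seg in segments):
        return "malformed key"
    # Assumed layout: keys live under <tenant>/<player_id>/.
    if len(segments) < 3 or segments[0] != tenant or segments[1] != player_id:
        return "key outside caller's prefix"
    if content_type not in ALLOWED_TYPES:
        return "content type not allowed"
    return None
```

Only when this returns `None` would the Game API go on to ask Garage for a pre-signed PUT URL.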
C) Authentication
- Web GUI → Auth: POST /auth/register or POST /auth/login.
- Auth returns {access, refresh} JWTs.
- Subsequent API calls include access token (Caddy passes through).
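To show what issuing and validating an access token involves, here is a standard-library sketch of HS256 signing. It is for illustration only: a real service would more likely use a maintained library such as PyJWT, and the function names, the claim set (`sub`, `iat`, `exp`), and the choice of HS256 are assumptions rather than part of this design.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(secret: bytes, subject: str, ttl_seconds: int,
                now: Optional[float] = None) -> str:
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": issued, "exp": issued + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_token(secret: bytes, token: str,
                 now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature and expiry check out, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                            hashlib.sha256).digest()
        # Constant-time comparison to avoid leaking signature bytes.
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        header = json.loads(_b64url_decode(header_b64))
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:  # wrong segment count, bad base64, bad JSON
        return None
    if header.get("alg") != "HS256":
        return None
    current = time.time() if now is None else now
    if payload.get("exp", 0) <= current:
        return None
    return payload
```

An access/refresh pair would then be two calls to `issue_token` with different lifetimes, for example a short-lived access token and a longer-lived refresh token.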
D) Retrieval-augmented turn (refine/search only)
- Game API (server-side) computes query embedding for player prompt.
- KNN over messages.embedding returns top-K context.
- Context trimmed/serialized and sent to AI-DM Service.
- Reply streamed back to client; transcripts persisted.
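The retrieval and trimming steps can be illustrated in plain Python. In the real system the ranking would run inside Postgres, where pgvector's `<=>` operator gives cosine distance and matches the `vector_cosine_ops` index; this sketch does the same ranking in memory. The function names and the character-based trimming budget are assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Roughly equivalent SQL with pgvector (cosine distance operator <=>):
#   SELECT content FROM messages
#   WHERE session_id = %s
#   ORDER BY embedding <=> %s
#   LIMIT %s;


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as maximally distant rather than dividing by zero.
    return 1.0 - dot / norm if norm else 1.0


def top_k(query: Sequence[float],
          rows: Sequence[Tuple[str, Sequence[float]]], k: int) -> List[str]:
    """Return the content of the k rows closest to the query vector."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [content for content, _ in ranked[:k]]


def trim_context(snippets: List[str], max_chars: int) -> List[str]:
    """Keep the best-ranked snippets that fit within a character budget."""
    kept: List[str] = []
    used = 0
    for snippet in snippets:
        if used + len(snippet) > max_chars:
            break
        kept.append(snippet)
        used += len(snippet)
    return kept
```

A character budget is a crude proxy for a token budget; a tokenizer-aware limit would be more accurate for the model being called.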
E) Long/slow generations (async job queue)
- Game API enqueues job (Redis/Dramatiq) to AI-DM.
- Returns {job_id}; Web GUI subscribes via SSE.
- Worker completes → Game API writes DM message and emits event.
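The enqueue, job id, and completion-event pattern of flow E can be sketched with the standard library alone. Here `queue.Queue` and a thread stand in for Redis/Dramatiq and a worker process, and a list stands in for the SSE stream. The `JobRunner` class and the `dm_message` event name are hypothetical.

```python
import json
import queue
import threading
import uuid
from typing import Callable, List


def format_sse(event: str, data: dict) -> str:
    """Serialize one Server-Sent Events frame (blank line terminates it)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


class JobRunner:
    def __init__(self, generate: Callable[[str], str]) -> None:
        self.generate = generate          # stand-in for the AI-DM call
        self.jobs: queue.Queue = queue.Queue()
        self.events: List[str] = []       # stand-in for the SSE stream
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, prompt: str) -> str:
        """Queue a generation and return its job id immediately."""
        job_id = uuid.uuid4().hex
        self.jobs.put((job_id, prompt))
        return job_id

    def _run(self) -> None:
        while True:
            job_id, prompt = self.jobs.get()
            try:
                reply = self.generate(prompt)
                self.events.append(
                    format_sse("dm_message", {"job_id": job_id, "content": reply})
                )
            finally:
                # Mark the job done even if generation raised.
                self.jobs.task_done()
```

The client receives `{job_id}` right away and matches it against the `job_id` in the event that arrives later, so slow generations never hold an HTTP request open.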
This keeps each service small and focused. It leans on Flask throughout, uses Caddy and Cloudflare at the edge, Postgres with pgvector for state and search, and Garage for durable assets, with clean seams for swapping pieces as you scale.