# Vector Database Strategy

## Overview

This document outlines the strategy for implementing layered knowledge systems using vector databases to provide NPCs and the Dungeon Master with contextual lore, regional history, and world knowledge.

**Status:** Planning Phase
**Last Updated:** November 26, 2025
**Decision:** Use Weaviate for vector database implementation

---

## Knowledge Hierarchy

### Three-Tier Vector Database Structure

1. **World Lore DB** (Global)
   - Broad historical events, mythology, major kingdoms, legendary figures
   - Accessible to all NPCs and DM for player questions
   - Examples: "The Great War 200 years ago", "The origin of magic", "The Five Kingdoms"
   - **Scope:** Universal knowledge any educated NPC might know

2. **Regional/Town Lore DB** (Location-specific)
   - Local history, notable events, landmarks, politics, rumors
   - Current town leadership, recent events, local legends
   - Trade routes, neighboring settlements, regional conflicts
   - **Scope:** Knowledge specific to geographic area

3. **NPC Persona** (Individual, YAML-defined)
   - Personal background, personality, motivations
   - Specific knowledge based on profession/role
   - Personal relationships and secrets
   - **Scope:** Character-specific information (already implemented in `/api/app/data/npcs/*.yaml`)

---

## How Knowledge Layers Work Together

### Contextual Knowledge Layering

When an NPC engages in conversation, build their knowledge context by:

- **Always include**: NPC persona + their region's lore DB
- **Conditionally include**: World lore (if the topic seems historical/broad)
- **Use semantic search**: Query each DB for relevant chunks based on conversation topic

### Example Interaction Flow

**Player asks tavern keeper:** "Tell me about the old ruins north of town"

1. Check NPC persona: "Are ruins mentioned in their background?"
2. Query Regional DB: "old ruins + north + [town name]"
3. If no hits, query World Lore DB: "ancient ruins + [region name]"
4. Combine results with NPC personality filter

**Result:** NPC responds with appropriate lore, or authentically says "I don't know about that" if nothing is found.

---

## Knowledge Boundaries & Authenticity

### NPCs Have Knowledge Limitations Based On:

- **Profession**: Blacksmith knows metallurgy lore, scholar knows history, farmer knows agricultural traditions
- **Social Status**: Nobles know court politics, commoners know street rumors
- **Age/Experience**: Elder NPCs might reference events from decades ago
- **Travel History**: Has this NPC been outside their region?

### Implementation of "I don't know"

Add metadata to vector DB entries:

- `required_profession: ["scholar", "priest"]`
- `social_class: ["noble", "merchant"]`
- `knowledge_type: "academic" | "common" | "secret"`
- `region_id: "thornhelm"`
- `time_period: "ancient" | "recent" | "current"`

Filter results before passing to the NPC's AI context, allowing authentic "I haven't heard of that" responses.

---

## Retrieval-Augmented Generation (RAG) Pattern

### Building AI Prompts for NPC Dialogue

```
[NPC Persona from YAML]
+
[Top 3-5 relevant chunks from Regional DB based on conversation topic]
+
[Top 2-3 relevant chunks from World Lore if topic is broad/historical]
+
[Conversation history from character's npc_interactions]
→ Feed to Claude with instruction to stay in character and admit ignorance if uncertain
```

### DM Knowledge vs NPC Knowledge

**DM Mode** (Player talks directly to DM, not through NPC):

- DM has access to ALL databases without restrictions
- DM can reveal as much or as little as narratively appropriate
- DM can generate content not in databases (creative liberty)

**NPC Mode** (Player talks to specific NPC):

- NPC knowledge filtered by persona/role/location
- NPC can redirect: "You should ask the town elder about that" or "I've heard scholars at the university know more"
- Creates natural quest hooks and information-gathering gameplay

---

## Technical Implementation

### Technology Choice: Weaviate

**Reasons for Weaviate:**

- Self-hosted option for dev/beta
- Managed cloud service (Weaviate Cloud Services) for production
- **Same API** for both self-hosted and managed (easy migration)
- Rich metadata filtering capabilities
- Multi-tenancy support
- GraphQL API (fits strong typing preference)
- Hybrid search (semantic + keyword)

### Storage & Indexing Strategy

**Where Each DB Lives:**

- **World Lore**: Single global vector DB collection
- **Regional DBs**: One collection with region metadata filtering
  - Could use Weaviate multi-tenancy for efficient isolation
  - Lazy-load when character enters region
  - Cache in Redis for active sessions
- **NPC Personas**: Remain in YAML (structured data; semantic search not needed)

**Weaviate Collections Structure:**

```
Collections:
- WorldLore
  - Metadata: knowledge_type, time_period, required_profession
- RegionalLore
  - Metadata: region_id, knowledge_type, social_class
- Rumors (optional: dynamic/time-sensitive content)
  - Metadata: region_id, expiration_date, source_npc
```

### Semantic Chunk Strategy

Chunk lore content by logical units:

- **Events**: "The Battle of Thornhelm (Year 1204) - A decisive victory..."
- **Locations**: "The Abandoned Lighthouse - Once a beacon for traders..."
- **Figures**: "Lord Varric the Stern - Current ruler of Thornhelm..."
- **Rumors/Gossip**: "Strange lights have been seen in the forest lately..."

Each chunk gets embedded and stored with rich metadata for filtering.

---

## Development Workflow

### Index-Once Strategy

**Rationale:**

- Lore is relatively static (updates only during major version releases)
- Read-heavy workload (perfect for vector DBs)
- Cost-effective (one-time embedding generation)
- Allows thorough testing before deployment

### Workflow Phases

**Development:**

1. Write lore content (YAML/JSON/Markdown)
2. Run embedding script locally
3. Upload to local Weaviate instance (Docker)
4. Test NPC conversations
5. Iterate on lore content

**Beta/Staging:**

1. Same self-hosted Weaviate, separate instance
2. Finalize lore content
3. Generate production embeddings
4. Performance testing

**Production:**

1. Migrate to Weaviate Cloud Services
2. Upload final embedded lore
3. Players query read-only
4. No changes until next major update

### Self-Hosted Development Setup

**Docker Compose Example:**

```yaml
services:
  weaviate:
    image: semitechnologies/weaviate:latest
    ports:
      - "8080:8080"
    environment:
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'  # Dev only
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
    volumes:
      - weaviate_data:/var/lib/weaviate

# Named volumes must be declared at the top level
volumes:
  weaviate_data:
```

**Hardware Requirements (Self-Hosted):**

- RAM: 4-8GB sufficient for beta
- CPU: Low (no heavy re-indexing)
- Storage: Minimal (vectors are compact)

---

## Migration Path: Dev → Production

### Zero-Code Migration

1. Export data from self-hosted Weaviate (backup tools)
2. Create Weaviate Cloud Services cluster
3. Import data to WCS
4. Change environment variable: `WEAVIATE_URL`
5. Deploy code (no code changes required)

**Environment Configuration:**

```yaml
# /api/config/development.yaml
weaviate:
  url: "http://localhost:8080"
  api_key: null

# /api/config/production.yaml
weaviate:
  url: "https://your-cluster.weaviate.network"
  api_key: "${WEAVIATE_API_KEY}"  # From .env
```

---

## Embedding Strategy

### One-Time Embedding Generation

Since embeddings are generated once per release, prioritize **quality over cost**.
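
One way to keep generation truly one-time is to cache vectors on disk, keyed by a hash of the chunk text and model name, so re-running the script during a release cycle only pays for chunks that changed. The sketch below is illustrative only: `embed_once`, `content_key`, the JSON cache file, and the injected `embed_fn` callable are assumptions, not part of the planned script, and any real embedding client would be wrapped to match `embed_fn`.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict, List

Vector = List[float]
# Hypothetical signature: takes a batch of texts, returns one vector per text.
EmbedFn = Callable[[List[str]], List[Vector]]


def content_key(text: str, model_name: str) -> str:
    """Stable cache key that changes whenever the text or the model changes."""
    digest = hashlib.sha256(f"{model_name}\n{text}".encode("utf-8"))
    return digest.hexdigest()


def embed_once(
    texts: List[str],
    embed_fn: EmbedFn,
    model_name: str,
    cache_path: Path,
) -> List[Vector]:
    """Return one vector per text, calling embed_fn only for uncached texts."""
    cache: Dict[str, Vector] = {}
    if cache_path.exists():
        cache = json.loads(cache_path.read_text(encoding="utf-8"))

    keys = [content_key(text, model_name) for text in texts]

    # Collect uncached texts, deduplicated, in first-seen order.
    missing: Dict[str, str] = {}
    for key, text in zip(keys, texts):
        if key not in cache and key not in missing:
            missing[key] = text

    if missing:
        vectors = embed_fn(list(missing.values()))
        if len(vectors) != len(missing):
            raise ValueError("embed_fn returned a different number of vectors")
        for key, vector in zip(missing.keys(), vectors):
            cache[key] = vector
        # Persist so later runs can skip these chunks.
        cache_path.write_text(json.dumps(cache), encoding="utf-8")

    return [cache[key] for key in keys]
```

Including the model name in the key means switching from a development model to a production model forces a full re-embed, which matches the intent of generating production embeddings separately.
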

**Embedding Model Options:**

| Model | Pros | Cons | Recommendation |
|-------|------|------|----------------|
| OpenAI `text-embedding-3-large` | High quality, good semantic understanding | Paid per use | **Production** |
| Cohere Embed v3 | Optimized for search, multilingual | Paid per use | **Production Alternative** |
| sentence-transformers (OSS) | Free, self-host, fast iteration | Lower quality | **Development/Testing** |

**Recommendation:**

- **Development:** Use open-source models (iterate faster, zero cost)
- **Production:** Use OpenAI, or `multilingual-e5-large` hosted on Replicate (https://replicate.com/beautyyuyanli/multilingual-e5-large); quality matters for player experience

### Embedding Generation Script

Will be implemented in `/api/scripts/generate_lore_embeddings.py`:

1. Read lore files (YAML/JSON/Markdown)
2. Chunk content appropriately
3. Generate embeddings using chosen model
4. Upload to Weaviate with metadata
5. Validate retrieval quality

---

## Content Management

### Lore Content Structure

**Storage Location:** `/api/app/data/lore/`

```
/api/app/data/lore/
  world/
    history.yaml
    mythology.yaml
    kingdoms.yaml
  regions/
    thornhelm/
      history.yaml
      locations.yaml
      rumors.yaml
    silverwood/
      history.yaml
      locations.yaml
      rumors.yaml
```

**Example Lore Entry (YAML):**

```yaml
- id: "thornhelm_founding"
  title: "The Founding of Thornhelm"
  content: |
    Thornhelm was founded in the year 847 by Lord Theron the Bold,
    a retired general seeking to establish a frontier town...
  metadata:
    region_id: "thornhelm"
    knowledge_type: "common"
    time_period: "historical"
    required_profession: null  # Anyone can know this
    social_class: null  # All classes
  tags:
    - "founding"
    - "lord-theron"
    - "history"
```

### Version Control for Lore Updates

**Complete Re-Index Strategy** (Simplest, recommended):

1. Delete old collections during maintenance window
2. Upload new lore with embeddings
3. Atomic cutover
4. Works great for infrequent major updates

**Alternative: Versioned Collections** (Overkill for our use case):

- `WorldLore_v1`, `WorldLore_v2`
- More overhead, probably unnecessary

---

## Performance & Cost Optimization

### Cost Considerations

**Embedding Generation:**

- One-time cost per lore chunk
- Only re-generate during major updates
- Estimated cost: $X per 1000 chunks (TBD based on model choice)

**Vector Search:**

- No embedding cost for queries (just retrieval)
- Self-hosted: Infrastructure cost only
- Managed (WCS): Pay for storage + queries

**Optimization Strategies:**

- Pre-compute all embeddings at build time
- Cache frequently accessed regional DBs in Redis
- Only search World Lore DB if regional search returns no results (fallback pattern)
- Use cheaper embedding models for non-critical content

### Retrieval Performance

**Expected Query Times:**

- Semantic search: < 100ms
- With metadata filtering: < 150ms
- Hybrid search: < 200ms

**Caching Strategy:**

- Cache top N regional lore chunks per active region in Redis
- TTL: 1 hour (or until session ends)
- Invalidate on major lore updates

---

## Multiplayer Considerations

### Shared World State

If multiple characters are in the same town talking to NPCs:

- **Regional DB**: Shared (same lore for everyone)
- **World DB**: Shared
- **NPC Interactions**: Character-specific (stored in `character.npc_interactions`)

**Result:** NPCs can reference world events consistently across players while maintaining individual relationships.

---

## Testing Strategy

### Validation Steps

1. **Retrieval Quality Testing**
   - Does semantic search return relevant lore?
   - Are metadata filters working correctly?
   - Do NPCs find appropriate information?

2. **NPC Knowledge Boundaries**
   - Can a farmer access academic knowledge? (Should be filtered out)
   - Do profession filters work as expected?
   - Do NPCs authentically say "I don't know" when appropriate?

3. **Performance Testing**
   - Query response times under load
   - Cache hit rates
   - Memory usage with multiple active regions

4. **Content Quality**
   - Is lore consistent across databases?
   - Are there contradictions between world/regional lore?
   - Is chunk size appropriate for context?

---

## Implementation Phases

### Phase 1: Proof of Concept (Current)

- [ ] Set up local Weaviate with Docker
- [ ] Create sample lore chunks (20-30 entries for one town)
- [ ] Generate embeddings and upload to Weaviate
- [ ] Build simple API endpoint for querying Weaviate
- [ ] Test NPC conversation with lore augmentation

### Phase 2: Core Implementation

- [ ] Define lore content structure (YAML schema)
- [ ] Write lore for starter region
- [ ] Implement embedding generation script
- [ ] Create Weaviate service layer in `/api/app/services/weaviate_service.py`
- [ ] Integrate with NPC conversation system
- [ ] Add DM lore query endpoints

### Phase 3: Content Expansion

- [ ] Write world lore content
- [ ] Write lore for additional regions
- [ ] Implement knowledge filtering logic
- [ ] Add lore discovery system (optional: player codex)

### Phase 4: Production Readiness

- [ ] Migrate to Weaviate Cloud Services
- [ ] Performance optimization and caching
- [ ] Backup and disaster recovery
- [ ] Monitoring and alerting

---

## Open Questions

1. **Authoring Tools**: How will we create/maintain lore content efficiently?
   - Manual YAML editing?
   - AI-generated lore with human review?
   - Web-based CMS?

2. **Lore Discovery**: Should players unlock lore entries (codex-style) as they learn about them?
   - Could be fun for completionists
   - Adds gameplay loop around exploration

3. **Dynamic Lore**: How to handle time-sensitive rumors or evolving world state?
   - Separate "Rumors" collection with expiration dates?
   - Regional events that trigger new lore entries?

4. **Chunk Size**: What's optimal for context vs. precision?
   - Too small: NPCs miss broader context
   - Too large: Less precise retrieval
   - Needs testing to determine

5. **Consistency Validation**: How to ensure regional lore doesn't contradict world lore?
   - Automated consistency checks?
   - Manual review process?
   - Lore versioning and dependency tracking?

---

## Future Enhancements

- **Player-Generated Lore**: Allow DMs to add custom lore entries during sessions
- **Lore Relationships**: Graph connections between related lore entries
- **Multilingual Support**: Embed lore in multiple languages
- **Seasonal/Event Lore**: Time-based lore that appears during special events
- **Quest Integration**: Automatic lore unlock based on quest completion

---

## References

- **Weaviate Documentation**: https://weaviate.io/developers/weaviate
- **RAG Pattern Best Practices**: (TBD)
- **Embedding Model Comparisons**: (TBD)

---

## Notes

This strategy aligns with the project's core principles:

- **Strong typing**: Lore models will use dataclasses
- **Configuration-driven**: Lore content in YAML/JSON
- **Microservices architecture**: Weaviate is independent service
- **Cost-conscious**: Index-once strategy minimizes ongoing costs
- **Future-proof**: Easy migration from self-hosted to managed
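
As an illustration of the strong-typing principle, lore entries and the NPC knowledge filter might be modeled with dataclasses along these lines. This is a minimal sketch under stated assumptions: the field names mirror the metadata keys used in this document, while `NpcProfile`, `can_know`, and `filter_for_npc` are hypothetical names, and the rule that an NPC only knows lore from its own region ignores travel history and `"secret"` knowledge, which are still open design points.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class LoreMetadata:
    knowledge_type: str = "common"  # "academic" | "common" | "secret"
    time_period: Optional[str] = None
    region_id: Optional[str] = None  # None means world lore
    required_profession: Optional[List[str]] = None  # None means anyone
    social_class: Optional[List[str]] = None  # None means all classes


@dataclass(frozen=True)
class LoreEntry:
    id: str
    title: str
    content: str
    metadata: LoreMetadata = field(default_factory=LoreMetadata)


@dataclass(frozen=True)
class NpcProfile:
    """Hypothetical subset of the NPC persona used for filtering."""

    profession: str
    social_class: str
    region_id: str


def can_know(npc: NpcProfile, entry: LoreEntry) -> bool:
    """Return True if the entry's metadata does not exclude this NPC."""
    meta = entry.metadata
    # Assumption: regional lore is only known inside its own region.
    if meta.region_id is not None and meta.region_id != npc.region_id:
        return False
    if (
        meta.required_profession is not None
        and npc.profession not in meta.required_profession
    ):
        return False
    if meta.social_class is not None and npc.social_class not in meta.social_class:
        return False
    return True


def filter_for_npc(npc: NpcProfile, entries: List[LoreEntry]) -> List[LoreEntry]:
    """Keep only the retrieved chunks this NPC could plausibly know."""
    return [entry for entry in entries if can_know(npc, entry)]
```

Applied to search results before prompt assembly, an empty filtered list is the signal for the NPC to answer with an in-character "I haven't heard of that".
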