Code_of_Conquest/docs/VECTOR_DATABASE_STRATEGY.md

# Vector Database Strategy

## Overview

This document outlines the strategy for implementing layered knowledge systems using vector databases to provide NPCs and the Dungeon Master with contextual lore, regional history, and world knowledge.

- **Status:** Planning Phase
- **Last Updated:** November 26, 2025
- **Decision:** Use Weaviate for vector database implementation

---

## Knowledge Hierarchy

### Three-Tier Knowledge Structure

1. **World Lore DB** (Global)
   - Broad historical events, mythology, major kingdoms, legendary figures
   - Accessible to all NPCs and DM for player questions
   - Examples: "The Great War 200 years ago", "The origin of magic", "The Five Kingdoms"
   - **Scope:** Universal knowledge any educated NPC might know

2. **Regional/Town Lore DB** (Location-specific)
   - Local history, notable events, landmarks, politics, rumors
   - Current town leadership, recent events, local legends
   - Trade routes, neighboring settlements, regional conflicts
   - **Scope:** Knowledge specific to geographic area

3. **NPC Persona** (Individual, YAML-defined)
   - Personal background, personality, motivations
   - Specific knowledge based on profession/role
   - Personal relationships and secrets
   - **Scope:** Character-specific information (already implemented in `/api/app/data/npcs/*.yaml`)

---

## How Knowledge Layers Work Together

### Contextual Knowledge Layering

When an NPC engages in conversation, build their knowledge context as follows:
- **Always include**: NPC persona + their region's lore DB
- **Conditionally include**: World lore (if the topic seems historical/broad)
- **Use semantic search**: Query each DB for relevant chunks based on conversation topic

### Example Interaction Flow

**Player asks tavern keeper:** "Tell me about the old ruins north of town"

1. Check NPC persona: "Are ruins mentioned in their background?"
2. Query Regional DB: "old ruins + north + [town name]"
3. If no hits, query World Lore DB: "ancient ruins + [region name]"
4. Combine results with NPC personality filter

**Result:** NPC responds with appropriate lore, or authentically says "I don't know about that" if nothing is found.
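
The flow above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `gather_lore`, `LoreHit`, and the two injected search callables are hypothetical names, and the persona check is a naive keyword overlap standing in for whatever matching the real system uses.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type: a search function takes a query and returns chunk texts.
SearchFn = Callable[[str], List[str]]


@dataclass
class LoreHit:
    source: str  # "persona", "regional", or "world"
    text: str


def _keywords(text: str) -> set:
    """Lowercased words longer than three characters, punctuation stripped."""
    return {w.strip("?.,!").lower() for w in text.split() if len(w) > 3}


def gather_lore(
    question: str,
    persona_facts: List[str],
    search_regional: SearchFn,
    search_world: SearchFn,
) -> List[LoreHit]:
    """Persona first, then the regional DB, then world lore as a fallback."""
    hits: List[LoreHit] = []
    wanted = _keywords(question)

    # Step 1: does the persona itself mention the topic? (naive stand-in)
    for fact in persona_facts:
        if wanted & _keywords(fact):
            hits.append(LoreHit("persona", fact))

    # Step 2: always query the regional DB.
    regional = search_regional(question)
    hits.extend(LoreHit("regional", text) for text in regional)

    # Step 3: only fall back to world lore when the region had nothing.
    if not regional:
        hits.extend(LoreHit("world", text) for text in search_world(question))

    return hits
```

An empty return value is the signal for the NPC to answer "I don't know about that"; step 4 (the personality filter) would be applied to the returned hits before they reach the prompt.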

---

## Knowledge Boundaries & Authenticity

### NPCs Have Knowledge Limitations Based On:

- **Profession**: Blacksmith knows metallurgy lore, scholar knows history, farmer knows agricultural traditions
- **Social Status**: Nobles know court politics, commoners know street rumors
- **Age/Experience**: Elder NPCs might reference events from decades ago
- **Travel History**: Has this NPC been outside their region?

### Implementation of "I don't know"

Add metadata to vector DB entries:
- `required_profession: ["scholar", "priest"]`
- `social_class: ["noble", "merchant"]`
- `knowledge_type: "academic" | "common" | "secret"`
- `region_id: "thornhelm"`
- `time_period: "ancient" | "historical" | "recent" | "current"`

Filter results before passing to the NPC's AI context, allowing authentic "I haven't heard of that" responses.
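
A minimal sketch of that filtering step is shown below. The `Npc` and `LoreChunk` dataclasses and the `filter_for_npc` helper are hypothetical; the sketch assumes a `null` metadata value means "no restriction" and treats the region check as strict, which ignores travel history and leaves the handling of `"secret"` knowledge undecided.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Npc:
    profession: str
    social_class: str
    region_id: str


@dataclass
class LoreChunk:
    text: str
    knowledge_type: str = "common"
    region_id: Optional[str] = None                  # None = not tied to a region
    required_profession: Optional[List[str]] = None  # None = anyone can know this
    social_class: Optional[List[str]] = None         # None = all classes


def npc_can_know(npc: Npc, chunk: LoreChunk) -> bool:
    """Return True only if every restriction on the chunk is satisfied."""
    if chunk.region_id is not None and chunk.region_id != npc.region_id:
        return False
    if (chunk.required_profession is not None
            and npc.profession not in chunk.required_profession):
        return False
    if chunk.social_class is not None and npc.social_class not in chunk.social_class:
        return False
    return True


def filter_for_npc(npc: Npc, chunks: List[LoreChunk]) -> List[LoreChunk]:
    """Drop chunks the NPC has no plausible way of knowing."""
    return [chunk for chunk in chunks if npc_can_know(npc, chunk)]
```

In practice the same restrictions could also be pushed down into the vector query as metadata filters, so that filtered-out chunks never count against the top-N limit.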

---

## Retrieval-Augmented Generation (RAG) Pattern

### Building AI Prompts for NPC Dialogue

```
[NPC Persona from YAML]
+
[Top 3-5 relevant chunks from Regional DB based on conversation topic]
+
[Top 2-3 relevant chunks from World Lore if topic is broad/historical]
+
[Conversation history from character's npc_interactions]
→ Feed to Claude with instruction to stay in character and admit ignorance if uncertain
```
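
One way the assembly above might look in code is sketched here. `build_npc_prompt` and the section labels are illustrative assumptions; the chunk limits simply mirror the upper bounds in the diagram (5 regional, 3 world).

```python
from typing import Sequence


def _bullets(chunks: Sequence[str]) -> str:
    """Render chunks as a bullet list, or a marker when nothing was retrieved."""
    if not chunks:
        return "(nothing relevant found)"
    return "\n".join(f"- {chunk}" for chunk in chunks)


def build_npc_prompt(
    persona: str,
    regional: Sequence[str],
    world: Sequence[str],
    history: Sequence[str],
    max_regional: int = 5,
    max_world: int = 3,
) -> str:
    """Concatenate persona, lore, and history into one prompt string."""
    sections = [
        "[NPC persona]\n" + persona.strip(),
        "[Regional lore]\n" + _bullets(regional[:max_regional]),
    ]
    # World lore is only included when the caller decided the topic warrants it.
    if world:
        sections.append("[World lore]\n" + _bullets(world[:max_world]))
    if history:
        sections.append("[Conversation history]\n" + "\n".join(history))
    sections.append(
        "[Instructions]\nStay in character. If the lore above does not answer "
        "the question, admit that you do not know."
    )
    return "\n\n".join(sections)
```

Keeping the instruction block last means the "admit ignorance" rule is the final thing the model reads before generating a reply.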

### DM Knowledge vs NPC Knowledge

**DM Mode** (Player talks directly to DM, not through NPC):
- DM has access to ALL databases without restrictions
- DM can reveal as much or as little as narratively appropriate
- DM can generate content not in databases (creative liberty)

**NPC Mode** (Player talks to specific NPC):
- NPC knowledge filtered by persona/role/location
- NPC can redirect: "You should ask the town elder about that" or "I've heard scholars at the university know more"
- Creates natural quest hooks and information-gathering gameplay

---

## Technical Implementation

### Technology Choice: Weaviate

**Reasons for Weaviate:**
- Self-hosted option for dev/beta
- Managed cloud service (Weaviate Cloud Services) for production
- **Same API** for both self-hosted and managed (easy migration)
- Rich metadata filtering capabilities
- Multi-tenancy support
- GraphQL API (fits strong typing preference)
- Hybrid search (semantic + keyword)

### Storage & Indexing Strategy

**Where Each DB Lives:**

- **World Lore**: Single global vector DB collection
- **Regional DBs**: One collection with region metadata filtering
  - Could use Weaviate multi-tenancy for efficient isolation
  - Lazy-load when character enters region
  - Cache in Redis for active sessions
- **NPC Personas**: Remain in YAML (structured data; no semantic search needed)

**Weaviate Collections Structure:**

```
Collections:
- WorldLore
  - Metadata: knowledge_type, time_period, required_profession
- RegionalLore
  - Metadata: region_id, knowledge_type, social_class
- Rumors (optional: dynamic/time-sensitive content)
  - Metadata: region_id, expiration_date, source_npc
```
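
As a rough sketch, the collections above could be expressed as class definitions in the JSON shape accepted by Weaviate's `POST /v1/schema` endpoint. The `collection_schema` helper, the `content` property name, and the chosen data types are assumptions for illustration and should be checked against the Weaviate version in use.

```python
import json
from typing import Dict


def collection_schema(name: str, properties: Dict[str, str],
                      multi_tenant: bool = False) -> dict:
    """Build one class definition; vectors are supplied by us, not by Weaviate."""
    schema = {
        "class": name,
        "vectorizer": "none",  # embeddings are generated offline and uploaded
        "properties": [{"name": "content", "dataType": ["text"]}] + [
            {"name": prop, "dataType": [dtype]} for prop, dtype in properties.items()
        ],
    }
    if multi_tenant:
        # Optional: per-region tenants instead of filtering on region_id.
        schema["multiTenancyConfig"] = {"enabled": True}
    return schema


SCHEMAS = [
    collection_schema("WorldLore", {
        "knowledge_type": "text",
        "time_period": "text",
        "required_profession": "text[]",
    }),
    collection_schema("RegionalLore", {
        "region_id": "text",
        "knowledge_type": "text",
        "social_class": "text[]",
    }),
    collection_schema("Rumors", {
        "region_id": "text",
        "expiration_date": "date",
        "source_npc": "text",
    }),
]

if __name__ == "__main__":
    print(json.dumps(SCHEMAS, indent=2))
```

Generating the definitions from one helper keeps the three collections consistent and makes the schema easy to review in version control before anything is sent to a server.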

### Semantic Chunk Strategy

Chunk lore content by logical units:
- **Events**: "The Battle of Thornhelm (Year 1204) - A decisive victory..."
- **Locations**: "The Abandoned Lighthouse - Once a beacon for traders..."
- **Figures**: "Lord Varric the Stern - Current ruler of Thornhelm..."
- **Rumors/Gossip**: "Strange lights have been seen in the forest lately..."

Each chunk gets embedded and stored with rich metadata for filtering.
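
A minimal chunking sketch under these assumptions: each lore entry is a dict with `id`, `title`, `content`, and optional `metadata`; one entry normally becomes one chunk; and long entries are split on blank-line paragraph boundaries. The `Chunk` type, `chunk_entry`, and the 800-character limit are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Chunk:
    chunk_id: str
    text: str
    metadata: Dict[str, object]


def chunk_entry(entry: dict, max_chars: int = 800) -> List[Chunk]:
    """Split one lore entry into chunks, keeping paragraphs intact."""
    paragraphs = [p.strip() for p in entry["content"].split("\n\n") if p.strip()]

    bodies: List[str] = []
    buffer = ""
    for paragraph in paragraphs:
        candidate = f"{buffer}\n\n{paragraph}" if buffer else paragraph
        if buffer and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            bodies.append(buffer)
            buffer = paragraph
        else:
            buffer = candidate
    if buffer:
        bodies.append(buffer)

    # Prefix every chunk with the title so it stays meaningful in isolation.
    return [
        Chunk(
            chunk_id=f"{entry['id']}#{index}",
            text=f"{entry['title']} - {body}",
            metadata=dict(entry.get("metadata", {})),
        )
        for index, body in enumerate(bodies)
    ]
```

Note that a single paragraph longer than `max_chars` is kept whole rather than cut mid-sentence; whether that trade-off is right depends on the chunk-size testing still to be done.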

---

## Development Workflow

### Index-Once Strategy

**Rationale:**
- Lore is relatively static (updates only during major version releases)
- Read-heavy workload (perfect for vector DBs)
- Cost-effective (one-time embedding generation)
- Allows thorough testing before deployment

### Workflow Phases

**Development:**
1. Write lore content (YAML/JSON/Markdown)
2. Run embedding script locally
3. Upload to local Weaviate instance (Docker)
4. Test NPC conversations
5. Iterate on lore content

**Beta/Staging:**
1. Same self-hosted Weaviate, separate instance
2. Finalize lore content
3. Generate production embeddings
4. Performance testing

**Production:**
1. Migrate to Weaviate Cloud Services
2. Upload final embedded lore
3. Players query read-only
4. No changes until next major update

### Self-Hosted Development Setup

**Docker Compose Example:**

```yaml
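services:
  weaviate:
    # Consider pinning a specific version tag instead of "latest" for reproducible environments
    image: semitechnologies/weaviate:latest
    ports:
      - "8080:8080"    # REST / GraphQL
      - "50051:50051"  # gRPC (needed if using the v4 Python client)
    environment:
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'  # Dev only
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
    volumes:
      - weaviate_data:/var/lib/weaviate

# Named volumes must be declared at the top level
volumes:
  weaviate_data: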
```

**Hardware Requirements (Self-Hosted):**
- RAM: 4-8GB sufficient for beta
- CPU: Low (no heavy re-indexing)
- Storage: Minimal (vectors are compact)

---

## Migration Path: Dev → Production

### Zero-Code Migration

1. Export data from self-hosted Weaviate (backup tools)
2. Create Weaviate Cloud Services cluster
3. Import data to WCS
4. Update environment variables: `WEAVIATE_URL` and `WEAVIATE_API_KEY`
5. Deploy code (no code changes required)

**Environment Configuration:**

```yaml
# /api/config/development.yaml
weaviate:
  url: "http://localhost:8080"
  api_key: null

# /api/config/production.yaml
weaviate:
  url: "https://your-cluster.weaviate.network"
  api_key: "${WEAVIATE_API_KEY}"  # From .env
```
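
The `${WEAVIATE_API_KEY}` placeholder needs to be resolved when the config is loaded. A small sketch of one way to do that is below; `WeaviateSettings` and `load_weaviate_settings` are hypothetical names, and the input is assumed to be the already-parsed `weaviate:` section of the YAML file.

```python
import os
import re
from dataclasses import dataclass
from typing import Mapping, Optional

# Matches ${VAR_NAME} placeholders inside a config value.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


@dataclass(frozen=True)
class WeaviateSettings:
    url: str
    api_key: Optional[str]


def _resolve(value: Optional[str], env: Mapping[str, str]) -> Optional[str]:
    """Replace ${VAR} placeholders; fail loudly if a variable is missing."""
    if value is None:
        return None

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)


def load_weaviate_settings(
    section: Mapping[str, Optional[str]],
    env: Mapping[str, str] = os.environ,
) -> WeaviateSettings:
    """Turn the parsed `weaviate:` config section into typed settings."""
    url = _resolve(section.get("url"), env)
    if not url:
        raise ValueError("weaviate.url is required")
    return WeaviateSettings(url=url, api_key=_resolve(section.get("api_key"), env))
```

Failing on a missing variable at startup is deliberate here: a production deploy with an unresolved API key should stop immediately rather than fall back to anonymous access.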

---

## Embedding Strategy

### One-Time Embedding Generation

Since embeddings are generated once per release, prioritize **quality over cost**.

**Embedding Model Options:**

| Model | Pros | Cons | Recommendation |
|-------|------|------|----------------|
| OpenAI `text-embedding-3-large` | High quality, good semantic understanding | Paid per use | **Production** |
| Cohere Embed v3 | Optimized for search, multilingual | Paid per use | **Production Alternative** |
| sentence-transformers (OSS) | Free, self-host, fast iteration | Typically lower retrieval quality than the hosted models | **Development/Testing** |

**Recommendation:**
- **Development:** Use open-source models (iterate faster, zero cost)
- **Production:** Use OpenAI `text-embedding-3-large`, or `multilingual-e5-large` hosted on Replicate (https://replicate.com/beautyyuyanli/multilingual-e5-large); quality matters for player experience

### Embedding Generation Script

Will be implemented in `/api/scripts/generate_lore_embeddings.py`:
1. Read lore files (YAML/JSON/Markdown)
2. Chunk content appropriately
3. Generate embeddings using chosen model
4. Upload to Weaviate with metadata
5. Validate retrieval quality
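
Since the script does not exist yet, the following is only a sketch of how steps 1 to 4 might fit together. It assumes lore stored as JSON files (YAML would need a parser such as PyYAML) and takes the embedding model and the Weaviate upload as injected callables, so `EmbedFn`, `UploadFn`, `build_objects`, and `index_lore` are all hypothetical. Step 5 (validation) is not covered.

```python
import json
from pathlib import Path
from typing import Callable, List

# Hypothetical hooks: swap in the real embedding model and Weaviate client.
EmbedFn = Callable[[List[str]], List[List[float]]]
UploadFn = Callable[[str, List[dict]], None]


def load_entries(path: Path) -> List[dict]:
    """Step 1: read one lore file containing a list of entries."""
    return json.loads(path.read_text(encoding="utf-8"))


def build_objects(entries: List[dict], embed: EmbedFn) -> List[dict]:
    """Steps 2-3: turn entries into text, embed them, attach metadata."""
    texts = [f"{e['title']} - {e['content'].strip()}" for e in entries]
    vectors = embed(texts)
    if len(vectors) != len(texts):
        raise ValueError("embedding function returned the wrong number of vectors")

    objects = []
    for entry, text, vector in zip(entries, texts, vectors):
        # Drop null metadata so "no restriction" is simply an absent property.
        metadata = {k: v for k, v in entry.get("metadata", {}).items() if v is not None}
        objects.append({
            "id": entry["id"],
            "properties": {"content": text, **metadata},
            "vector": vector,
        })
    return objects


def index_lore(lore_dir: Path, collection: str, embed: EmbedFn, upload: UploadFn) -> int:
    """Step 4: embed and upload every lore file under lore_dir; return the count."""
    total = 0
    for path in sorted(lore_dir.rglob("*.json")):
        objects = build_objects(load_entries(path), embed)
        upload(collection, objects)
        total += len(objects)
    return total
```

Injecting `embed` and `upload` lets the same pipeline run with a free local model in development and a hosted model in production, and makes it testable without a running Weaviate instance.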

---

## Content Management

### Lore Content Structure

**Storage Location:** `/api/app/data/lore/`

```
/api/app/data/lore/
  world/
    history.yaml
    mythology.yaml
    kingdoms.yaml
  regions/
    thornhelm/
      history.yaml
      locations.yaml
      rumors.yaml
    silverwood/
      history.yaml
      locations.yaml
      rumors.yaml
```

**Example Lore Entry (YAML):**

```yaml
- id: "thornhelm_founding"
  title: "The Founding of Thornhelm"
  content: |
    Thornhelm was founded in the year 847 by Lord Theron the Bold,
    a retired general seeking to establish a frontier town...
  metadata:
    region_id: "thornhelm"
    knowledge_type: "common"
    time_period: "historical"
    required_profession: null  # Anyone can know this
    social_class: null  # All classes
  tags:
    - "founding"
    - "lord-theron"
    - "history"
```
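
Because the project prefers dataclasses for typed models, an entry like the one above might be loaded into something like the following. The `LoreEntry` class and its `from_dict` constructor are assumptions, not existing code, and only `knowledge_type` is validated here because its allowed values are the ones listed earlier in this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional

KNOWLEDGE_TYPES = {"academic", "common", "secret"}


@dataclass(frozen=True)
class LoreEntry:
    id: str
    title: str
    content: str
    region_id: Optional[str]
    knowledge_type: str
    time_period: Optional[str]
    required_profession: Optional[List[str]]  # None = anyone can know this
    social_class: Optional[List[str]]         # None = all classes
    tags: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: dict) -> "LoreEntry":
        """Build an entry from one parsed YAML/JSON item, validating as we go."""
        missing = [key for key in ("id", "title", "content") if not raw.get(key)]
        if missing:
            raise ValueError(f"lore entry is missing required fields: {missing}")

        meta = raw.get("metadata") or {}
        knowledge_type = meta.get("knowledge_type", "common")
        if knowledge_type not in KNOWLEDGE_TYPES:
            raise ValueError(f"unknown knowledge_type: {knowledge_type!r}")

        return cls(
            id=raw["id"],
            title=raw["title"],
            content=raw["content"].strip(),
            region_id=meta.get("region_id"),
            knowledge_type=knowledge_type,
            time_period=meta.get("time_period"),
            required_profession=meta.get("required_profession"),
            social_class=meta.get("social_class"),
            tags=list(raw.get("tags") or []),
        )
```

Validating at load time means a typo in a lore file fails the embedding run instead of silently producing a chunk that no filter ever matches.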

### Version Control for Lore Updates

**Complete Re-Index Strategy** (Simplest, recommended):
1. Delete old collections during maintenance window
2. Upload new lore with embeddings
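3. Cut over once the new collections pass validation (the maintenance window covers the gap between delete and upload)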
4. Works great for infrequent major updates

**Alternative: Versioned Collections** (Overkill for our use case):
- `WorldLore_v1`, `WorldLore_v2`
- More overhead, probably unnecessary

---

## Performance & Cost Optimization

### Cost Considerations

**Embedding Generation:**
- One-time cost per lore chunk
- Only re-generate during major updates
- Estimated cost: $X per 1000 chunks (TBD based on model choice)

**Vector Search:**
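- Each player query is embedded once at request time with the same model that built the index (a small per-query cost); retrieval itself adds no embedding cost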
- Self-hosted: Infrastructure cost only
- Managed (WCS): Pay for storage + queries

**Optimization Strategies:**
- Pre-compute all embeddings at build time
- Cache frequently accessed regional DBs in Redis
- Only search World Lore DB if regional search returns no results (fallback pattern)
- Use cheaper embedding models for non-critical content (in a separate collection, since vectors from different models are not comparable)

### Retrieval Performance

**Target Query Times** (to be confirmed during performance testing):
- Semantic search: < 100ms
- With metadata filtering: < 150ms
- Hybrid search: < 200ms

**Caching Strategy:**
- Cache top N regional lore chunks per active region in Redis
- TTL: 1 hour (or until session ends)
- Invalidate on major lore updates
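
The caching behavior described above can be illustrated with an in-process stand-in for Redis. `RegionLoreCache` and its `loader` callable are hypothetical; with real Redis the same effect would come from storing each region's chunks under a key with an expiry (for example `SET ... EX 3600`).

```python
import time
from typing import Callable, Dict, List, Tuple


class RegionLoreCache:
    """Caches the top lore chunks per region for a fixed time-to-live."""

    def __init__(
        self,
        loader: Callable[[str], List[str]],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader      # fetches chunks from the vector DB on a miss
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, region_id: str) -> List[str]:
        """Return cached chunks, reloading them if absent or expired."""
        now = self._clock()
        cached = self._entries.get(region_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        chunks = self._loader(region_id)
        self._entries[region_id] = (now, chunks)
        return chunks

    def invalidate_all(self) -> None:
        """Call after a major lore update so stale chunks are never served."""
        self._entries.clear()
```

Usage would look like `cache = RegionLoreCache(loader=fetch_top_chunks)` followed by `cache.get("thornhelm")` whenever a character enters or talks within that region, where `fetch_top_chunks` is whatever function queries Weaviate.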

---

## Multiplayer Considerations

### Shared World State

If multiple characters are in the same town talking to NPCs:
- **Regional DB**: Shared (same lore for everyone)
- **World DB**: Shared
- **NPC Interactions**: Character-specific (stored in `character.npc_interactions`)

**Result:** NPCs can reference world events consistently across players while maintaining individual relationships.

---

## Testing Strategy

### Validation Steps

1. **Retrieval Quality Testing**
   - Does semantic search return relevant lore?
   - Are metadata filters working correctly?
   - Do NPCs find appropriate information?

2. **NPC Knowledge Boundaries**
   - Can a farmer access academic knowledge? (Should be filtered out)
   - Do profession filters work as expected?
   - Do NPCs authentically say "I don't know" when appropriate?

3. **Performance Testing**
   - Query response times under load
   - Cache hit rates
   - Memory usage with multiple active regions

4. **Content Quality**
   - Is lore consistent across databases?
   - Are there contradictions between world/regional lore?
   - Is chunk size appropriate for context?
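
For the retrieval quality checks in step 1, one common measure is recall@k: of the chunks a human marked as relevant for a question, what fraction appears in the top k results. The sketch below assumes a hand-written set of test questions with expected chunk IDs; `recall_at_k`, `mean_recall`, and the `search` callable are illustrative names.

```python
from typing import Callable, Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant chunk IDs found in the first k retrieved IDs."""
    if not relevant:
        raise ValueError("each test case needs at least one relevant chunk")
    found = set(retrieved[:k]) & relevant
    return len(found) / len(relevant)


def mean_recall(
    cases: Dict[str, Set[str]],
    search: Callable[[str], List[str]],
    k: int = 5,
) -> float:
    """Average recall@k over a set of question -> expected chunk ID cases."""
    if not cases:
        raise ValueError("no test cases provided")
    scores = [recall_at_k(search(question), expected, k)
              for question, expected in cases.items()]
    return sum(scores) / len(scores)
```

Tracking this number across lore revisions and embedding models gives a concrete way to compare chunk sizes and model choices rather than judging by feel.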

---

## Implementation Phases

### Phase 1: Proof of Concept (Current)
- [ ] Set up local Weaviate with Docker
- [ ] Create sample lore chunks (20-30 entries for one town)
- [ ] Generate embeddings and upload to Weaviate
- [ ] Build simple API endpoint for querying Weaviate
- [ ] Test NPC conversation with lore augmentation

### Phase 2: Core Implementation
- [ ] Define lore content structure (YAML schema)
- [ ] Write lore for starter region
- [ ] Implement embedding generation script
- [ ] Create Weaviate service layer in `/api/app/services/weaviate_service.py`
- [ ] Integrate with NPC conversation system
- [ ] Add DM lore query endpoints

### Phase 3: Content Expansion
- [ ] Write world lore content
- [ ] Write lore for additional regions
- [ ] Implement knowledge filtering logic
- [ ] Add lore discovery system (optional: player codex)

### Phase 4: Production Readiness
- [ ] Migrate to Weaviate Cloud Services
- [ ] Performance optimization and caching
- [ ] Backup and disaster recovery
- [ ] Monitoring and alerting

---

## Open Questions

1. **Authoring Tools**: How will we create/maintain lore content efficiently?
   - Manual YAML editing?
   - AI-generated lore with human review?
   - Web-based CMS?

2. **Lore Discovery**: Should players unlock lore entries (codex-style) as they learn about them?
   - Could be fun for completionists
   - Adds gameplay loop around exploration

3. **Dynamic Lore**: How to handle time-sensitive rumors or evolving world state?
   - Separate "Rumors" collection with expiration dates?
   - Regional events that trigger new lore entries?

4. **Chunk Size**: What's optimal for context vs. precision?
   - Too small: NPCs miss broader context
   - Too large: Less precise retrieval
   - Needs testing to determine

5. **Consistency Validation**: How to ensure regional lore doesn't contradict world lore?
   - Automated consistency checks?
   - Manual review process?
   - Lore versioning and dependency tracking?

---

## Future Enhancements

- **Player-Generated Lore**: Allow DMs to add custom lore entries during sessions
- **Lore Relationships**: Graph connections between related lore entries
- **Multilingual Support**: Embed lore in multiple languages
- **Seasonal/Event Lore**: Time-based lore that appears during special events
- **Quest Integration**: Automatic lore unlock based on quest completion

---

## References

- **Weaviate Documentation**: https://weaviate.io/developers/weaviate
- **RAG Pattern Best Practices**: (TBD)
- **Embedding Model Comparisons**: (TBD)

---

## Notes

This strategy aligns with the project's core principles:
- **Strong typing**: Lore models will use dataclasses
- **Configuration-driven**: Lore content in YAML/JSON
- **Microservices architecture**: Weaviate runs as an independent service
- **Cost-conscious**: Index-once strategy minimizes ongoing costs
- **Future-proof**: Easy migration from self-hosted to managed