first commit
commit 8315fa51c9
2025-11-24 23:10:55 -06:00
279 changed files with 74600 additions and 0 deletions

api/docs/AI_INTEGRATION.md
# AI Integration Documentation
## Overview
Code of Conquest uses AI models for narrative generation through a unified Replicate API integration. This document covers the AI client architecture, model selection, and usage patterns.
**Key Components:**
- **ReplicateClient** - Low-level API client for all AI models
- **ModelSelector** - Tier-based model routing and configuration
- **NarrativeGenerator** - High-level wrapper for game-specific generation
---
## Architecture
```
┌─────────────────────┐
│ NarrativeGenerator │ ← High-level game API
├─────────────────────┤
│ ModelSelector │ ← Tier/context routing
├─────────────────────┤
│ ReplicateClient │ ← Unified API client
├─────────────────────┤
│ Replicate API │ ← All models (Llama, Claude)
└─────────────────────┘
```
All AI models are accessed through Replicate API for unified billing and management.
---
## Replicate Client
**File:** `app/ai/replicate_client.py`
### Supported Models
| Model Type | Identifier | Tier | Use Case |
|------------|-----------|------|----------|
| `LLAMA_3_8B` | `meta/meta-llama-3-8b-instruct` | Free | Cost-effective, good quality |
| `CLAUDE_HAIKU` | `anthropic/claude-3.5-haiku` | Basic | Fast, high quality |
| `CLAUDE_SONNET` | `anthropic/claude-3.5-sonnet` | Premium | Excellent quality |
| `CLAUDE_SONNET_4` | `anthropic/claude-4.5-sonnet` | Elite | Best quality |
### Basic Usage
```python
from app.ai.replicate_client import ReplicateClient, ModelType
# Free tier - Llama (default)
client = ReplicateClient()
response = client.generate(
prompt="You are a dungeon master...",
max_tokens=256,
temperature=0.7
)
print(response.text)
print(f"Tokens: {response.tokens_used}")
# Paid tier - Claude models
client = ReplicateClient(model=ModelType.CLAUDE_HAIKU)
response = client.generate(
prompt="Describe the tavern",
system_prompt="You are a dungeon master"
)
# Override model per-call
response = client.generate("Test", model=ModelType.CLAUDE_SONNET)
```
### Response Object
```python
@dataclass
class ReplicateResponse:
text: str # Generated text
tokens_used: int # Approximate token count
model: str # Model identifier
generation_time: float # Generation time in seconds
```
### Configuration
```python
# Default parameters
DEFAULT_MAX_TOKENS = 256
DEFAULT_TEMPERATURE = 0.7
DEFAULT_TOP_P = 0.9
DEFAULT_TIMEOUT = 30 # seconds
# Model-specific defaults
MODEL_DEFAULTS = {
ModelType.LLAMA_3_8B: {"max_tokens": 256, "temperature": 0.7},
ModelType.CLAUDE_HAIKU: {"max_tokens": 512, "temperature": 0.8},
ModelType.CLAUDE_SONNET: {"max_tokens": 1024, "temperature": 0.9},
ModelType.CLAUDE_SONNET_4: {"max_tokens": 2048, "temperature": 0.9},
}
```
### Error Handling
```python
import logging
from app.ai.replicate_client import (
    ReplicateClientError,     # Base error
    ReplicateAPIError,        # API errors
    ReplicateRateLimitError,  # Rate limiting
    ReplicateTimeoutError     # Timeouts
)
logger = logging.getLogger(__name__)
try:
response = client.generate(prompt)
except ReplicateRateLimitError:
# Handle rate limiting (client retries automatically 3 times)
pass
except ReplicateTimeoutError:
# Handle timeout
pass
except ReplicateAPIError as e:
# Handle other API errors
logger.error(f"API error: {e}")
```
### Features
- **Retry Logic**: Exponential backoff (3 retries) for rate limits
- **Model-specific Formatting**: Llama special tokens, Claude system prompts
- **API Key Validation**: `client.validate_api_key()` method
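For example, a fail-fast check at application startup (a minimal sketch; the boolean return contract of `validate_api_key()` is assumed from its name):
```python
import sys
from app.ai.replicate_client import ReplicateClient
client = ReplicateClient()
# Assumes validate_api_key() returns False for a missing/invalid token
if not client.validate_api_key():
    sys.exit("REPLICATE_API_TOKEN is missing or invalid")
```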
---
## Model Selector
**File:** `app/ai/model_selector.py`
### User Tiers
```python
class UserTier(str, Enum):
FREE = "free" # Llama 3 8B
BASIC = "basic" # Claude Haiku
PREMIUM = "premium" # Claude Sonnet
ELITE = "elite" # Claude Sonnet 4
```
### Context Types
```python
class ContextType(str, Enum):
STORY_PROGRESSION = "story_progression" # Creative narratives
COMBAT_NARRATION = "combat_narration" # Action descriptions
QUEST_SELECTION = "quest_selection" # Quest picking
NPC_DIALOGUE = "npc_dialogue" # Character conversations
SIMPLE_RESPONSE = "simple_response" # Quick responses
```
### Usage
```python
from app.ai.model_selector import ModelSelector, UserTier, ContextType
selector = ModelSelector()
# Select model configuration
config = selector.select_model(
user_tier=UserTier.PREMIUM,
context_type=ContextType.STORY_PROGRESSION
)
print(config.model_type) # ModelType.CLAUDE_SONNET
print(config.max_tokens) # 1024
print(config.temperature) # 0.9
```
### Token Limits by Tier
| Tier | Base Tokens | Model |
|------|-------------|-------|
| FREE | 256 | Llama 3 8B |
| BASIC | 512 | Claude Haiku |
| PREMIUM | 1024 | Claude Sonnet |
| ELITE | 2048 | Claude Sonnet 4 |
### Context Adjustments
**Temperature by Context:**
- Story Progression: 0.9 (creative)
- Combat Narration: 0.8 (exciting)
- Quest Selection: 0.5 (deterministic)
- NPC Dialogue: 0.85 (natural)
- Simple Response: 0.7 (balanced)
**Token Multipliers:**
- Story Progression: 1.0× (full allocation)
- Combat Narration: 0.75× (shorter)
- Quest Selection: 0.5× (brief)
- NPC Dialogue: 0.75× (conversational)
- Simple Response: 0.5× (quick)
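These adjustments compose with the tier's base allocation; for example, a PREMIUM combat turn gets 1024 × 0.75 = 768 tokens at temperature 0.8. A quick check against the `select_model` API shown above:
```python
from app.ai.model_selector import ModelSelector, UserTier, ContextType
selector = ModelSelector()
config = selector.select_model(
    user_tier=UserTier.PREMIUM,                 # base allocation: 1024 tokens
    context_type=ContextType.COMBAT_NARRATION   # 0.75x tokens, temperature 0.8
)
print(config.max_tokens)   # 768
print(config.temperature)  # 0.8
```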
### Cost Estimation
```python
# Get tier information
info = selector.get_tier_info(UserTier.PREMIUM)
# {
# "tier": "premium",
# "model": "anthropic/claude-3.5-sonnet",
# "model_name": "Claude 3.5 Sonnet",
# "base_tokens": 1024,
# "quality": "Excellent quality, detailed narratives"
# }
# Estimate cost per request
cost = selector.estimate_cost_per_request(UserTier.PREMIUM)
# ~$0.009 per request
```
---
## Narrative Generator
**File:** `app/ai/narrative_generator.py`
High-level wrapper that coordinates model selection, prompt templates, and AI generation.
### Initialization
```python
from app.ai.narrative_generator import NarrativeGenerator
from app.ai.model_selector import UserTier
generator = NarrativeGenerator()
```
### Story Response Generation
```python
response = generator.generate_story_response(
character={
"name": "Aldric",
"level": 3,
"player_class": "Fighter",
"stats": {"strength": 16, "dexterity": 14, ...}
},
action="I search the room for hidden doors",
game_state={
"current_location": "Ancient Library",
"location_type": "DUNGEON",
"active_quests": ["find_artifact"]
},
user_tier=UserTier.PREMIUM,
conversation_history=[
{"turn": 1, "action": "entered library", "dm_response": "..."},
{"turn": 2, "action": "examined shelves", "dm_response": "..."}
],
action_instructions="""
The player searches for supplies. This means:
- Describe what they FIND, not auto-purchase
- List items with PRICES if applicable
- Ask what they want to do with findings
""" # Optional: from action_prompts.yaml dm_prompt_template
)
print(response.narrative)
print(f"Tokens: {response.tokens_used}")
print(f"Model: {response.model}")
print(f"Time: {response.generation_time:.2f}s")
```
### Action Instructions
The `action_instructions` parameter passes action-specific guidance from `action_prompts.yaml` to the AI. This ensures:
1. **Player agency** - AI presents options rather than making decisions
2. **Action semantics** - "Search" means find options, not auto-buy
3. **Context-aware responses** - Different instructions for different actions
The instructions are injected into the prompt template and include critical player agency rules:
- Never auto-purchase items
- Never complete transactions without consent
- Present choices and ask what they want to do
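A hedged sketch of wiring this together, assuming `action_prompts.yaml` maps an action ID to a `dm_prompt_template` string (the exact path and schema may differ):
```python
import yaml
from app.ai.model_selector import UserTier
from app.ai.narrative_generator import NarrativeGenerator
# Path and schema are illustrative assumptions
with open("app/ai/action_prompts.yaml") as f:
    action_prompts = yaml.safe_load(f)
instructions = action_prompts["search"]["dm_prompt_template"]
generator = NarrativeGenerator()
response = generator.generate_story_response(
    character={"name": "Aldric", "level": 3, "player_class": "Fighter"},
    action="I search the shop for supplies",
    game_state={"current_location": "General Store", "location_type": "TOWN"},
    user_tier=UserTier.FREE,
    action_instructions=instructions
)
```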
### Combat Narration
```python
response = generator.generate_combat_narration(
character={"name": "Aldric", ...},
combat_state={
"round_number": 3,
"enemies": [{"name": "Goblin", "hp": 5, "max_hp": 10}],
"terrain": "cave"
},
action="swings their sword at the goblin",
action_result={
"hit": True,
"damage": 12,
"effects": ["bleeding"]
},
user_tier=UserTier.BASIC,
is_critical=True,
is_finishing_blow=True
)
```
### Quest Selection
```python
quest_id = generator.generate_quest_selection(
character={"name": "Aldric", "level": 3, ...},
eligible_quests=[
{"quest_id": "goblin_cave", "name": "Clear the Cave", ...},
{"quest_id": "herb_gathering", "name": "Gather Herbs", ...}
],
game_context={
"current_location": "Tavern",
"recent_events": ["talked to locals"]
},
user_tier=UserTier.FREE,
recent_actions=["asked about rumors", "ordered ale"]
)
print(quest_id) # "goblin_cave"
```
### NPC Dialogue
```python
response = generator.generate_npc_dialogue(
character={"name": "Aldric", ...},
npc={
"name": "Old Barkeep",
"role": "Tavern Owner",
"personality": "gruff but kind"
},
conversation_topic="What rumors have you heard lately?",
game_state={"current_location": "The Rusty Anchor", ...},
user_tier=UserTier.PREMIUM,
npc_knowledge=["goblin attacks", "missing merchant"]
)
```
### Response Object
```python
@dataclass
class NarrativeResponse:
narrative: str # Generated text
tokens_used: int # Token count
model: str # Model used
context_type: str # Type of generation
generation_time: float
```
### Error Handling
```python
import logging
from app.ai.narrative_generator import NarrativeGeneratorError
logger = logging.getLogger(__name__)
try:
response = generator.generate_story_response(...)
except NarrativeGeneratorError as e:
logger.error(f"Generation failed: {e}")
# Handle gracefully (show error to user, use fallback, etc.)
```
---
## Prompt Templates
**File:** `app/ai/prompt_templates.py`
**Templates:** `app/ai/templates/*.j2`
### Available Templates
1. **story_action.j2** - Story progression turns
2. **combat_action.j2** - Combat narration
3. **quest_offering.j2** - Context-aware quest selection
4. **npc_dialogue.j2** - NPC conversations
### Template Filters
- `format_inventory` - Format item lists
- `format_stats` - Format character stats
- `format_skills` - Format skill lists
- `format_effects` - Format active effects
- `truncate_text` - Truncate with ellipsis
- `format_gold` - Format currency
### Direct Template Usage
```python
from app.ai.prompt_templates import get_prompt_templates
templates = get_prompt_templates()
prompt = templates.render(
"story_action.j2",
character={"name": "Aldric", ...},
action="search for traps",
game_state={...},
conversation_history=[...]
)
```
---
## Configuration
### Environment Variables
```bash
# Required
REPLICATE_API_TOKEN=r8_...
# Optional (defaults shown)
REPLICATE_MODEL=meta/meta-llama-3-8b-instruct
```
### Cost Management
Approximate costs per 1K tokens:
| Model | Input | Output |
|-------|-------|--------|
| Llama 3 8B | Free | Free |
| Claude Haiku | $0.001 | $0.005 |
| Claude Sonnet | $0.003 | $0.015 |
| Claude Sonnet 4 | $0.015 | $0.075 |
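As a back-of-envelope check, a typical PREMIUM story turn lines up with the ~$0.009 estimate shown earlier (the token counts below are illustrative assumptions):
```python
# Claude Sonnet rates from the table above (USD per 1K tokens)
INPUT_RATE = 0.003
OUTPUT_RATE = 0.015
input_tokens = 1000   # assumed: character sheet + history + template
output_tokens = 400   # assumed: average completion length
cost = (input_tokens / 1000) * INPUT_RATE + (output_tokens / 1000) * OUTPUT_RATE
print(f"~${cost:.4f} per request")  # ~$0.0090
```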
---
## Integration with Background Jobs
AI generation runs asynchronously via RQ jobs. See `app/tasks/ai_tasks.py`.
```python
from app.tasks.ai_tasks import enqueue_ai_task
# Queue a story action
job = enqueue_ai_task(
task_type="narrative",
user_id="user_123",
context={
"session_id": "sess_789",
"character_id": "char_456",
"action": "I explore the tavern"
}
)
# Returns: {"job_id": "abc-123", "status": "queued"}
```
---
## Usage Tracking
All AI calls are automatically logged for cost monitoring. See `app/services/usage_tracking_service.py`.
```python
from datetime import date
from app.services.usage_tracking_service import UsageTrackingService
tracker = UsageTrackingService()
# Get daily usage
usage = tracker.get_daily_usage("user_123", date.today())
print(f"Requests: {usage.total_requests}")
print(f"Cost: ${usage.estimated_cost:.4f}")
# Get monthly cost
monthly = tracker.get_monthly_cost("user_123", 2025, 11)
```
---
## Rate Limiting
Tier-based daily limits are enforced via `app/services/rate_limiter_service.py`.
### AI Calls (Turns)
| Tier | Daily Limit |
|------|------------|
| FREE | 20 turns |
| BASIC | 50 turns |
| PREMIUM | 100 turns |
| ELITE | 200 turns |
### Custom Actions
Free-text player actions (beyond preset buttons) have separate limits:
| Tier | Custom Actions/Day | Max Characters |
|------|-------------------|----------------|
| FREE | 10 | 150 |
| BASIC | 50 | 300 |
| PREMIUM | Unlimited | 500 |
| ELITE | Unlimited | 500 |
These are configurable in `config/*.yaml` under `rate_limiting.tiers.{tier}.custom_actions_per_day` and `custom_action_char_limit`.
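A quick way to inspect those keys (the config filename below is an assumption; adjust to your environment):
```python
import yaml
with open("config/development.yaml") as f:  # filename assumed
    cfg = yaml.safe_load(f)
free_tier = cfg["rate_limiting"]["tiers"]["free"]
print(free_tier["custom_actions_per_day"])    # 10
print(free_tier["custom_action_char_limit"])  # 150
```
Enforcement at request time: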
```python
from app.ai.model_selector import UserTier
# RateLimitExceeded is assumed to be exported by the same service module
from app.services.rate_limiter_service import RateLimiterService, RateLimitExceeded
limiter = RateLimiterService()
try:
limiter.check_rate_limit("user_123", UserTier.PREMIUM)
# Process request...
limiter.increment_usage("user_123")
except RateLimitExceeded as e:
# Return error to user
pass
```
---
## Best Practices
1. **Always specify context type** - Helps optimize token usage and temperature
2. **Provide conversation history** - Improves narrative coherence
3. **Handle errors gracefully** - Show user-friendly messages
4. **Monitor costs** - Use usage tracking service
5. **Test with mocks first** - Use mocked clients during development (see the sketch below)
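For point 5, a minimal mocking sketch using `unittest.mock`; the patch target, and the assumption that higher layers route through `ReplicateClient.generate`, follow from the architecture described above:
```python
from unittest.mock import patch
from app.ai.replicate_client import ReplicateResponse
fake = ReplicateResponse(
    text="You find a hidden door behind the bookshelf.",
    tokens_used=12,
    model="mock",
    generation_time=0.01
)
# Assumes NarrativeGenerator ultimately calls ReplicateClient.generate
with patch("app.ai.replicate_client.ReplicateClient.generate", return_value=fake):
    ...  # exercise NarrativeGenerator / route handlers without real API calls
```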
---
## Verification Scripts
- `scripts/verify_ai_models.py` - Test model routing and API connectivity
- `scripts/verify_e2e_ai_generation.py` - End-to-end generation flow tests
```bash
# Test model routing (no API key needed)
python scripts/verify_ai_models.py
# Test with real API calls
python scripts/verify_ai_models.py --llama --haiku --sonnet
# Full E2E test
python scripts/verify_e2e_ai_generation.py --real --tier premium
```