# AI Integration Documentation

## Overview

Code of Conquest uses AI models for narrative generation through a unified Replicate API integration. This document covers the AI client architecture, model selection, and usage patterns.

**Key Components:**

- **ReplicateClient** - Low-level API client for all AI models
- **ModelSelector** - Tier-based model routing and configuration
- **NarrativeGenerator** - High-level wrapper for game-specific generation

---

## Architecture

```
┌─────────────────────┐
│ NarrativeGenerator  │  ← High-level game API
├─────────────────────┤
│ ModelSelector       │  ← Tier/context routing
├─────────────────────┤
│ ReplicateClient     │  ← Unified API client
├─────────────────────┤
│ Replicate API       │  ← All models (Llama, Claude)
└─────────────────────┘
```

All AI models are accessed through the Replicate API for unified billing and management.

---

## Replicate Client

**File:** `app/ai/replicate_client.py`

### Supported Models

| Model Type | Identifier | Tier | Use Case |
|------------|-----------|------|----------|
| `LLAMA_3_8B` | `meta/meta-llama-3-8b-instruct` | Free | Cost-effective, good quality |
| `CLAUDE_HAIKU` | `anthropic/claude-3.5-haiku` | Basic | Fast, high quality |
| `CLAUDE_SONNET` | `anthropic/claude-3.5-sonnet` | Premium | Excellent quality |
| `CLAUDE_SONNET_4` | `anthropic/claude-4.5-sonnet` | Elite | Best quality |

### Basic Usage

```python
from app.ai.replicate_client import ReplicateClient, ModelType

# Free tier - Llama (default)
client = ReplicateClient()
response = client.generate(
    prompt="You are a dungeon master...",
    max_tokens=256,
    temperature=0.7
)
print(response.text)
print(f"Tokens: {response.tokens_used}")

# Paid tier - Claude models
client = ReplicateClient(model=ModelType.CLAUDE_HAIKU)
response = client.generate(
    prompt="Describe the tavern",
    system_prompt="You are a dungeon master"
)

# Override model per-call
response = client.generate("Test", model=ModelType.CLAUDE_SONNET)
```

### Response Object

```python
@dataclass
class ReplicateResponse:
    text: str               # Generated text
    tokens_used: int        # Approximate token count
    model: str              # Model identifier
    generation_time: float  # Generation time in seconds
```

### Configuration

```python
# Default parameters
DEFAULT_MAX_TOKENS = 256
DEFAULT_TEMPERATURE = 0.7
DEFAULT_TOP_P = 0.9
DEFAULT_TIMEOUT = 30  # seconds

# Model-specific defaults
MODEL_DEFAULTS = {
    ModelType.LLAMA_3_8B: {"max_tokens": 256, "temperature": 0.7},
    ModelType.CLAUDE_HAIKU: {"max_tokens": 512, "temperature": 0.8},
    ModelType.CLAUDE_SONNET: {"max_tokens": 1024, "temperature": 0.9},
    ModelType.CLAUDE_SONNET_4: {"max_tokens": 2048, "temperature": 0.9},
}
```

### Error Handling

```python
from app.ai.replicate_client import (
    ReplicateClientError,     # Base error
    ReplicateAPIError,        # API errors
    ReplicateRateLimitError,  # Rate limiting
    ReplicateTimeoutError     # Timeouts
)

try:
    response = client.generate(prompt)
except ReplicateRateLimitError:
    # Handle rate limiting (client retries automatically 3 times)
    pass
except ReplicateTimeoutError:
    # Handle timeout
    pass
except ReplicateAPIError as e:
    # Handle other API errors
    logger.error(f"API error: {e}")
```

### Features

- **Retry Logic**: Exponential backoff (3 retries) for rate limits
- **Model-specific Formatting**: Llama special tokens, Claude system prompts
- **API Key Validation**: `client.validate_api_key()` method

---

## Model Selector

**File:** `app/ai/model_selector.py`

### User Tiers

```python
class UserTier(str, Enum):
    FREE = "free"        # Llama 3 8B
    BASIC = "basic"      # Claude Haiku
    PREMIUM = "premium"  # Claude Sonnet
    ELITE = "elite"      # Claude Sonnet 4
```

### Context Types

```python
class ContextType(str, Enum):
    STORY_PROGRESSION = "story_progression"  # Creative narratives
    COMBAT_NARRATION = "combat_narration"    # Action descriptions
    QUEST_SELECTION = "quest_selection"      # Quest picking
    NPC_DIALOGUE = "npc_dialogue"            # Character conversations
    SIMPLE_RESPONSE = "simple_response"      # Quick responses
```

### Usage

```python
from app.ai.model_selector import ModelSelector, UserTier, ContextType

selector = ModelSelector()

# Select model configuration
config = selector.select_model(
    user_tier=UserTier.PREMIUM,
    context_type=ContextType.STORY_PROGRESSION
)

print(config.model_type)   # ModelType.CLAUDE_SONNET
print(config.max_tokens)   # 1024
print(config.temperature)  # 0.9
```

### Token Limits by Tier

| Tier | Base Tokens | Model |
|------|-------------|-------|
| FREE | 256 | Llama 3 8B |
| BASIC | 512 | Claude Haiku |
| PREMIUM | 1024 | Claude Sonnet |
| ELITE | 2048 | Claude Sonnet 4 |

### Context Adjustments

**Temperature by Context:**

- Story Progression: 0.9 (creative)
- Combat Narration: 0.8 (exciting)
- Quest Selection: 0.5 (deterministic)
- NPC Dialogue: 0.85 (natural)
- Simple Response: 0.7 (balanced)

**Token Multipliers:**

- Story Progression: 1.0× (full allocation)
- Combat Narration: 0.75× (shorter)
- Quest Selection: 0.5× (brief)
- NPC Dialogue: 0.75× (conversational)
- Simple Response: 0.5× (quick)

### Cost Estimation

```python
# Get tier information
info = selector.get_tier_info(UserTier.PREMIUM)
# {
#     "tier": "premium",
#     "model": "anthropic/claude-3.5-sonnet",
#     "model_name": "Claude 3.5 Sonnet",
#     "base_tokens": 1024,
#     "quality": "Excellent quality, detailed narratives"
# }

# Estimate cost per request
cost = selector.estimate_cost_per_request(UserTier.PREMIUM)
# ~$0.009 per request
```

---

## Narrative Generator

**File:** `app/ai/narrative_generator.py`

High-level wrapper that coordinates model selection, prompt templates, and AI generation.
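That coordination can be condensed into three steps: route by tier/context, render a prompt template, call the client. The following is an illustrative sketch, not the actual `NarrativeGenerator` source; the `Stub*` classes are stand-ins for `ModelSelector`, the prompt templates, and `ReplicateClient`, using only the interfaces shown in this document.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    max_tokens: int
    temperature: float


class StubSelector:
    def select_model(self, user_tier, context_type):
        # PREMIUM + story progression, per the Model Selector tables above
        return ModelConfig(max_tokens=1024, temperature=0.9)


class StubTemplates:
    def render(self, name, **kwargs):
        return f"[{name}] {kwargs['action']}"


class StubClient:
    def generate(self, prompt, max_tokens, temperature):
        return f"(narrative for: {prompt})"


def generate_narrative(selector, templates, client, *, user_tier, context_type,
                       template_name, **template_vars):
    """Route by tier/context, render the prompt, then call the AI client."""
    config = selector.select_model(user_tier=user_tier, context_type=context_type)
    prompt = templates.render(template_name, **template_vars)
    return client.generate(prompt=prompt, max_tokens=config.max_tokens,
                           temperature=config.temperature)


narrative = generate_narrative(
    StubSelector(), StubTemplates(), StubClient(),
    user_tier="premium", context_type="story_progression",
    template_name="story_action.j2", action="search for traps",
)
```

Swapping the stubs for the real classes gives the flow the methods below wrap for each context type.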
### Initialization

```python
from app.ai.narrative_generator import NarrativeGenerator
from app.ai.model_selector import UserTier

generator = NarrativeGenerator()
```

### Story Response Generation

```python
response = generator.generate_story_response(
    character={
        "name": "Aldric",
        "level": 3,
        "player_class": "Fighter",
        "stats": {"strength": 16, "dexterity": 14, ...}
    },
    action="I search the room for hidden doors",
    game_state={
        "current_location": "Ancient Library",
        "location_type": "DUNGEON",
        "active_quests": ["find_artifact"]
    },
    user_tier=UserTier.PREMIUM,
    conversation_history=[
        {"turn": 1, "action": "entered library", "dm_response": "..."},
        {"turn": 2, "action": "examined shelves", "dm_response": "..."}
    ],
    action_instructions="""
    The player searches for supplies. This means:
    - Describe what they FIND, not auto-purchase
    - List items with PRICES if applicable
    - Ask what they want to do with findings
    """  # Optional: from action_prompts.yaml dm_prompt_template
)

print(response.narrative)
print(f"Tokens: {response.tokens_used}")
print(f"Model: {response.model}")
print(f"Time: {response.generation_time:.2f}s")
```

### Action Instructions

The `action_instructions` parameter passes action-specific guidance from `action_prompts.yaml` to the AI. This ensures:

1. **Player agency** - AI presents options rather than making decisions
2. **Action semantics** - "Search" means find options, not auto-buy
3. **Context-aware responses** - Different instructions for different actions

The instructions are injected into the prompt template and include critical player agency rules:

- Never auto-purchase items
- Never complete transactions without consent
- Present choices and ask what they want to do

### Combat Narration

```python
response = generator.generate_combat_narration(
    character={"name": "Aldric", ...},
    combat_state={
        "round_number": 3,
        "enemies": [{"name": "Goblin", "hp": 5, "max_hp": 10}],
        "terrain": "cave"
    },
    action="swings their sword at the goblin",
    action_result={
        "hit": True,
        "damage": 12,
        "effects": ["bleeding"]
    },
    user_tier=UserTier.BASIC,
    is_critical=True,
    is_finishing_blow=True
)
```

### Quest Selection

```python
quest_id = generator.generate_quest_selection(
    character={"name": "Aldric", "level": 3, ...},
    eligible_quests=[
        {"quest_id": "goblin_cave", "name": "Clear the Cave", ...},
        {"quest_id": "herb_gathering", "name": "Gather Herbs", ...}
    ],
    game_context={
        "current_location": "Tavern",
        "recent_events": ["talked to locals"]
    },
    user_tier=UserTier.FREE,
    recent_actions=["asked about rumors", "ordered ale"]
)

print(quest_id)  # "goblin_cave"
```

### NPC Dialogue

```python
response = generator.generate_npc_dialogue(
    character={"name": "Aldric", ...},
    npc={
        "name": "Old Barkeep",
        "role": "Tavern Owner",
        "personality": "gruff but kind"
    },
    conversation_topic="What rumors have you heard lately?",
    game_state={"current_location": "The Rusty Anchor", ...},
    user_tier=UserTier.PREMIUM,
    npc_knowledge=["goblin attacks", "missing merchant"]
)
```

### Response Object

```python
@dataclass
class NarrativeResponse:
    narrative: str     # Generated text
    tokens_used: int   # Token count
    model: str         # Model used
    context_type: str  # Type of generation
    generation_time: float
```

### Error Handling

```python
from app.ai.narrative_generator import NarrativeGeneratorError

try:
    response = generator.generate_story_response(...)
except NarrativeGeneratorError as e:
    logger.error(f"Generation failed: {e}")
    # Handle gracefully (show error to user, use fallback, etc.)
```

---

## Prompt Templates

**File:** `app/ai/prompt_templates.py`
**Templates:** `app/ai/templates/*.j2`

### Available Templates

1. **story_action.j2** - Story progression turns
2. **combat_action.j2** - Combat narration
3. **quest_offering.j2** - Context-aware quest selection
4. **npc_dialogue.j2** - NPC conversations

### Template Filters

- `format_inventory` - Format item lists
- `format_stats` - Format character stats
- `format_skills` - Format skill lists
- `format_effects` - Format active effects
- `truncate_text` - Truncate with ellipsis
- `format_gold` - Format currency

### Direct Template Usage

```python
from app.ai.prompt_templates import get_prompt_templates

templates = get_prompt_templates()
prompt = templates.render(
    "story_action.j2",
    character={"name": "Aldric", ...},
    action="search for traps",
    game_state={...},
    conversation_history=[...]
)
```

---

## Configuration

### Environment Variables

```bash
# Required
REPLICATE_API_TOKEN=r8_...

# Optional (defaults shown)
REPLICATE_MODEL=meta/meta-llama-3-8b-instruct
```

### Cost Management

Approximate costs per 1K tokens:

| Model | Input | Output |
|-------|-------|--------|
| Llama 3 8B | Free | Free |
| Claude Haiku | $0.001 | $0.005 |
| Claude Sonnet | $0.003 | $0.015 |
| Claude Sonnet 4 | $0.015 | $0.075 |

---

## Integration with Background Jobs

AI generation runs asynchronously via RQ jobs. See `app/tasks/ai_tasks.py`.

```python
from app.tasks.ai_tasks import enqueue_ai_task

# Queue a story action
job = enqueue_ai_task(
    task_type="narrative",
    user_id="user_123",
    context={
        "session_id": "sess_789",
        "character_id": "char_456",
        "action": "I explore the tavern"
    }
)
# Returns: {"job_id": "abc-123", "status": "queued"}
```

---

## Usage Tracking

All AI calls are automatically logged for cost monitoring. See `app/services/usage_tracking_service.py`.
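Per-request estimates follow directly from the per-1K-token rates in the cost table above. A minimal sketch of that arithmetic (`estimate_cost` is a hypothetical helper for illustration, not part of the tracking service's API):

```python
# Per-1K-token rates (USD) mirroring the Cost Management table above.
COST_PER_1K = {
    "meta/meta-llama-3-8b-instruct": {"input": 0.0, "output": 0.0},
    "anthropic/claude-3.5-haiku": {"input": 0.001, "output": 0.005},
    "anthropic/claude-3.5-sonnet": {"input": 0.003, "output": 0.015},
    "anthropic/claude-4.5-sonnet": {"input": 0.015, "output": 0.075},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost in USD from its token counts."""
    rates = COST_PER_1K[model]
    return (input_tokens / 1000) * rates["input"] + \
           (output_tokens / 1000) * rates["output"]


# A Claude Sonnet request with a ~1000-token prompt and ~400-token reply
# comes to about $0.009, matching the estimate_cost_per_request figure above.
cost = estimate_cost("anthropic/claude-3.5-sonnet", 1000, 400)
```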
```python
from app.services.usage_tracking_service import UsageTrackingService

tracker = UsageTrackingService()

# Get daily usage
usage = tracker.get_daily_usage("user_123", date.today())
print(f"Requests: {usage.total_requests}")
print(f"Cost: ${usage.estimated_cost:.4f}")

# Get monthly cost
monthly = tracker.get_monthly_cost("user_123", 2025, 11)
```

---

## Rate Limiting

Tier-based daily limits are enforced via `app/services/rate_limiter_service.py`.

### AI Calls (Turns)

| Tier | Daily Limit |
|------|------------|
| FREE | 20 turns |
| BASIC | 50 turns |
| PREMIUM | 100 turns |
| ELITE | 200 turns |

### Custom Actions

Free-text player actions (beyond preset buttons) have separate limits:

| Tier | Custom Actions/Day | Max Characters |
|------|-------------------|----------------|
| FREE | 10 | 150 |
| BASIC | 50 | 300 |
| PREMIUM | Unlimited | 500 |
| ELITE | Unlimited | 500 |

These limits are configurable in `config/*.yaml` under `rate_limiting.tiers.{tier}.custom_actions_per_day` and `custom_action_char_limit`.

```python
from app.services.rate_limiter_service import RateLimiterService

limiter = RateLimiterService()

try:
    limiter.check_rate_limit("user_123", UserTier.PREMIUM)
    # Process request...
    limiter.increment_usage("user_123")
except RateLimitExceeded as e:
    # Return error to user
    pass
```

---

## Best Practices

1. **Always specify context type** - Helps optimize token usage and temperature
2. **Provide conversation history** - Improves narrative coherence
3. **Handle errors gracefully** - Show user-friendly messages
4. **Monitor costs** - Use the usage tracking service
5. **Test with mocks first** - Use mocked clients during development

---

## Verification Scripts

- `scripts/verify_ai_models.py` - Test model routing and API connectivity
- `scripts/verify_e2e_ai_generation.py` - End-to-end generation flow tests

```bash
# Test model routing (no API key needed)
python scripts/verify_ai_models.py

# Test with real API calls
python scripts/verify_ai_models.py --llama --haiku --sonnet

# Full E2E test
python scripts/verify_e2e_ai_generation.py --real --tier premium
```
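For the "test with mocks first" practice, a `unittest.mock.MagicMock` stand-in avoids hitting the Replicate API entirely. This sketch assumes only the `generate()` interface and `ReplicateResponse` fields shown above; the exact wiring into your tests (fixture, patch target, etc.) is up to you:

```python
from unittest.mock import MagicMock

# Fake response carrying the same fields as ReplicateResponse above.
mock_response = MagicMock()
mock_response.text = "The tavern is dimly lit..."
mock_response.tokens_used = 42
mock_response.model = "meta/meta-llama-3-8b-instruct"
mock_response.generation_time = 0.01

# Stand-in for ReplicateClient: generate() returns the canned response.
mock_client = MagicMock()
mock_client.generate.return_value = mock_response

# Code under test calls generate() exactly as it would on a real client.
response = mock_client.generate(prompt="Describe the tavern", max_tokens=256)
assert response.tokens_used == 42
mock_client.generate.assert_called_once_with(
    prompt="Describe the tavern", max_tokens=256
)
```

The same pattern works one layer up: give `NarrativeGenerator` a mocked client (or patch it in with `unittest.mock.patch`) and assert on the prompts it builds, with zero API cost.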