
# AI Integration Documentation

## Overview

Code of Conquest uses AI models for narrative generation through a unified Replicate API integration. This document covers the AI client architecture, model selection, and usage patterns.

**Key Components:**

- `ReplicateClient` - Low-level API client for all AI models
- `ModelSelector` - Tier-based model routing and configuration
- `NarrativeGenerator` - High-level wrapper for game-specific generation

## Architecture

```
┌─────────────────────┐
│  NarrativeGenerator │  ← High-level game API
├─────────────────────┤
│   ModelSelector     │  ← Tier/context routing
├─────────────────────┤
│   ReplicateClient   │  ← Unified API client
├─────────────────────┤
│   Replicate API     │  ← All models (Llama, Claude)
└─────────────────────┘
```

All AI models are accessed through the Replicate API for unified billing and management.
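
Conceptually, each layer delegates to the one below. A minimal sketch of that flow using the public APIs documented on this page (how `NarrativeGenerator` wires these together internally is an assumption):

```python
from app.ai.model_selector import ModelSelector, UserTier, ContextType
from app.ai.replicate_client import ReplicateClient

# 1. Resolve a model configuration from the user's tier and request context.
selector = ModelSelector()
config = selector.select_model(
    user_tier=UserTier.FREE,
    context_type=ContextType.SIMPLE_RESPONSE
)

# 2. Call Replicate with that configuration.
client = ReplicateClient(model=config.model_type)
response = client.generate(
    prompt="Greet the player at the tavern door.",
    max_tokens=config.max_tokens,
    temperature=config.temperature
)
print(response.text)
```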


## Replicate Client

File: `app/ai/replicate_client.py`

### Supported Models

| Model Type | Identifier | Tier | Use Case |
|---|---|---|---|
| `LLAMA_3_8B` | `meta/meta-llama-3-8b-instruct` | Free | Cost-effective, good quality |
| `CLAUDE_HAIKU` | `anthropic/claude-3.5-haiku` | Basic | Fast, high quality |
| `CLAUDE_SONNET` | `anthropic/claude-3.5-sonnet` | Premium | Excellent quality |
| `CLAUDE_SONNET_4` | `anthropic/claude-4.5-sonnet` | Elite | Best quality |

### Basic Usage

```python
from app.ai.replicate_client import ReplicateClient, ModelType

# Free tier - Llama (default)
client = ReplicateClient()
response = client.generate(
    prompt="You are a dungeon master...",
    max_tokens=256,
    temperature=0.7
)
print(response.text)
print(f"Tokens: {response.tokens_used}")

# Paid tier - Claude models
client = ReplicateClient(model=ModelType.CLAUDE_HAIKU)
response = client.generate(
    prompt="Describe the tavern",
    system_prompt="You are a dungeon master"
)

# Override model per-call
response = client.generate("Test", model=ModelType.CLAUDE_SONNET)
```

### Response Object

```python
from dataclasses import dataclass

@dataclass
class ReplicateResponse:
    text: str              # Generated text
    tokens_used: int       # Approximate token count
    model: str             # Model identifier
    generation_time: float # Generation time in seconds
```

### Configuration

```python
# Default parameters
DEFAULT_MAX_TOKENS = 256
DEFAULT_TEMPERATURE = 0.7
DEFAULT_TOP_P = 0.9
DEFAULT_TIMEOUT = 30  # seconds

# Model-specific defaults
MODEL_DEFAULTS = {
    ModelType.LLAMA_3_8B: {"max_tokens": 256, "temperature": 0.7},
    ModelType.CLAUDE_HAIKU: {"max_tokens": 512, "temperature": 0.8},
    ModelType.CLAUDE_SONNET: {"max_tokens": 1024, "temperature": 0.9},
    ModelType.CLAUDE_SONNET_4: {"max_tokens": 2048, "temperature": 0.9},
}
```

### Error Handling

```python
import logging

from app.ai.replicate_client import (
    ReplicateClientError,    # Base error
    ReplicateAPIError,       # API errors
    ReplicateRateLimitError, # Rate limiting
    ReplicateTimeoutError    # Timeouts
)

logger = logging.getLogger(__name__)

try:
    response = client.generate(prompt)
except ReplicateRateLimitError:
    # Handle rate limiting (the client already retries 3 times automatically)
    pass
except ReplicateTimeoutError:
    # Handle timeout
    pass
except ReplicateAPIError as e:
    # Handle other API errors
    logger.error(f"API error: {e}")
```

### Features

- **Retry Logic:** Exponential backoff (3 retries) for rate limits; see the sketch below
- **Model-specific Formatting:** Llama special tokens, Claude system prompts
- **API Key Validation:** `client.validate_api_key()` method
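
The retry behavior is built into `ReplicateClient`; this sketch only illustrates its shape (the exact delays are an assumption):

```python
import time

from app.ai.replicate_client import ReplicateRateLimitError

def generate_with_backoff(client, prompt, retries=3):
    """Illustrative only: retry rate-limited calls with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return client.generate(prompt)
        except ReplicateRateLimitError:
            if attempt == retries:
                raise  # out of retries, surface the error
            time.sleep(2 ** attempt)  # back off 1s, 2s, 4s before retrying
```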

## Model Selector

File: `app/ai/model_selector.py`

### User Tiers

```python
from enum import Enum

class UserTier(str, Enum):
    FREE = "free"       # Llama 3 8B
    BASIC = "basic"     # Claude Haiku
    PREMIUM = "premium" # Claude Sonnet
    ELITE = "elite"     # Claude Sonnet 4
```

### Context Types

```python
from enum import Enum

class ContextType(str, Enum):
    STORY_PROGRESSION = "story_progression"  # Creative narratives
    COMBAT_NARRATION = "combat_narration"    # Action descriptions
    QUEST_SELECTION = "quest_selection"      # Quest picking
    NPC_DIALOGUE = "npc_dialogue"            # Character conversations
    SIMPLE_RESPONSE = "simple_response"      # Quick responses
```

### Usage

```python
from app.ai.model_selector import ModelSelector, UserTier, ContextType

selector = ModelSelector()

# Select model configuration
config = selector.select_model(
    user_tier=UserTier.PREMIUM,
    context_type=ContextType.STORY_PROGRESSION
)

print(config.model_type)   # ModelType.CLAUDE_SONNET
print(config.max_tokens)   # 1024
print(config.temperature)  # 0.9
```

### Token Limits by Tier

| Tier | Base Tokens | Model |
|---|---|---|
| FREE | 256 | Llama 3 8B |
| BASIC | 512 | Claude Haiku |
| PREMIUM | 1024 | Claude Sonnet |
| ELITE | 2048 | Claude Sonnet 4 |

### Context Adjustments

**Temperature by Context:**

- Story Progression: 0.9 (creative)
- Combat Narration: 0.8 (exciting)
- Quest Selection: 0.5 (deterministic)
- NPC Dialogue: 0.85 (natural)
- Simple Response: 0.7 (balanced)

**Token Multipliers:**

- Story Progression: 1.0× (full allocation)
- Combat Narration: 0.75× (shorter)
- Quest Selection: 0.5× (brief)
- NPC Dialogue: 0.75× (conversational)
- Simple Response: 0.5× (quick)
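
Combining these lists with the tier table above: the effective budget is the tier's base tokens scaled by the context multiplier, and the context sets the temperature. A worked example (the values follow directly from the tables; exact rounding is an assumption):

```python
from app.ai.model_selector import ModelSelector, UserTier, ContextType

selector = ModelSelector()
config = selector.select_model(
    user_tier=UserTier.PREMIUM,                 # base allocation: 1024 tokens
    context_type=ContextType.COMBAT_NARRATION  # 0.75x tokens, temperature 0.8
)

print(config.max_tokens)   # 768 (1024 * 0.75)
print(config.temperature)  # 0.8
```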

### Cost Estimation

```python
# Get tier information
info = selector.get_tier_info(UserTier.PREMIUM)
# {
#     "tier": "premium",
#     "model": "anthropic/claude-3.5-sonnet",
#     "model_name": "Claude 3.5 Sonnet",
#     "base_tokens": 1024,
#     "quality": "Excellent quality, detailed narratives"
# }

# Estimate cost per request
cost = selector.estimate_cost_per_request(UserTier.PREMIUM)
# ~$0.009 per request
```

## Narrative Generator

File: `app/ai/narrative_generator.py`

High-level wrapper that coordinates model selection, prompt templates, and AI generation.

### Initialization

```python
from app.ai.narrative_generator import NarrativeGenerator
from app.ai.model_selector import UserTier

generator = NarrativeGenerator()
```

### Story Response Generation

```python
response = generator.generate_story_response(
    character={
        "name": "Aldric",
        "level": 3,
        "player_class": "Fighter",
        "stats": {"strength": 16, "dexterity": 14, ...}
    },
    action="I search the room for hidden doors",
    game_state={
        "current_location": "Ancient Library",
        "location_type": "DUNGEON",
        "active_quests": ["find_artifact"]
    },
    user_tier=UserTier.PREMIUM,
    conversation_history=[
        {"turn": 1, "action": "entered library", "dm_response": "..."},
        {"turn": 2, "action": "examined shelves", "dm_response": "..."}
    ],
    action_instructions="""
        The player searches for supplies. This means:
        - Describe what they FIND, not auto-purchase
        - List items with PRICES if applicable
        - Ask what they want to do with findings
    """  # Optional: from action_prompts.yaml dm_prompt_template
)

print(response.narrative)
print(f"Tokens: {response.tokens_used}")
print(f"Model: {response.model}")
print(f"Time: {response.generation_time:.2f}s")
```

### Action Instructions

The `action_instructions` parameter passes action-specific guidance from `action_prompts.yaml` to the AI. This ensures:

1. **Player agency** - AI presents options rather than making decisions
2. **Action semantics** - "Search" means find options, not auto-buy
3. **Context-aware responses** - Different instructions for different actions

The instructions are injected into the prompt template and include critical player agency rules:

- Never auto-purchase items
- Never complete transactions without consent
- Present choices and ask what they want to do

### Combat Narration

```python
response = generator.generate_combat_narration(
    character={"name": "Aldric", ...},
    combat_state={
        "round_number": 3,
        "enemies": [{"name": "Goblin", "hp": 5, "max_hp": 10}],
        "terrain": "cave"
    },
    action="swings their sword at the goblin",
    action_result={
        "hit": True,
        "damage": 12,
        "effects": ["bleeding"]
    },
    user_tier=UserTier.BASIC,
    is_critical=True,
    is_finishing_blow=True
)
```

### Quest Selection

```python
quest_id = generator.generate_quest_selection(
    character={"name": "Aldric", "level": 3, ...},
    eligible_quests=[
        {"quest_id": "goblin_cave", "name": "Clear the Cave", ...},
        {"quest_id": "herb_gathering", "name": "Gather Herbs", ...}
    ],
    game_context={
        "current_location": "Tavern",
        "recent_events": ["talked to locals"]
    },
    user_tier=UserTier.FREE,
    recent_actions=["asked about rumors", "ordered ale"]
)
print(quest_id)  # "goblin_cave"
```

### NPC Dialogue

```python
response = generator.generate_npc_dialogue(
    character={"name": "Aldric", ...},
    npc={
        "name": "Old Barkeep",
        "role": "Tavern Owner",
        "personality": "gruff but kind"
    },
    conversation_topic="What rumors have you heard lately?",
    game_state={"current_location": "The Rusty Anchor", ...},
    user_tier=UserTier.PREMIUM,
    npc_knowledge=["goblin attacks", "missing merchant"]
)
```

### Response Object

```python
from dataclasses import dataclass

@dataclass
class NarrativeResponse:
    narrative: str       # Generated text
    tokens_used: int     # Token count
    model: str           # Model used
    context_type: str    # Type of generation
    generation_time: float
```

### Error Handling

```python
import logging

from app.ai.narrative_generator import NarrativeGeneratorError

logger = logging.getLogger(__name__)

try:
    response = generator.generate_story_response(...)
except NarrativeGeneratorError as e:
    logger.error(f"Generation failed: {e}")
    # Handle gracefully (show error to user, use fallback, etc.)
```

## Prompt Templates

File: `app/ai/prompt_templates.py`

Templates: `app/ai/templates/*.j2`

### Available Templates

1. `story_action.j2` - Story progression turns
2. `combat_action.j2` - Combat narration
3. `quest_offering.j2` - Context-aware quest selection
4. `npc_dialogue.j2` - NPC conversations

### Template Filters

Custom Jinja2 filters are available in all templates (see the sketch below):

- `format_inventory` - Format item lists
- `format_stats` - Format character stats
- `format_skills` - Format skill lists
- `format_effects` - Format active effects
- `truncate_text` - Truncate with ellipsis
- `format_gold` - Format currency
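
For illustration, here is how a filter like `format_gold` could be registered and used. This is a hypothetical re-implementation showing the Jinja2 mechanics, not the project's actual filter code:

```python
from jinja2 import Environment

env = Environment()

# Hypothetical stand-in for the real filter: 1250 -> "1,250 gp"
env.filters["format_gold"] = lambda amount: f"{amount:,} gp"

template = env.from_string("The merchant asks for {{ price | format_gold }}.")
print(template.render(price=1250))  # The merchant asks for 1,250 gp.
```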

### Direct Template Usage

```python
from app.ai.prompt_templates import get_prompt_templates

templates = get_prompt_templates()

prompt = templates.render(
    "story_action.j2",
    character={"name": "Aldric", ...},
    action="search for traps",
    game_state={...},
    conversation_history=[...]
)
```

## Configuration

### Environment Variables

```bash
# Required
REPLICATE_API_TOKEN=r8_...

# Optional (defaults shown)
REPLICATE_MODEL=meta/meta-llama-3-8b-instruct
```
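
With the token set, the client can be sanity-checked at startup using the `validate_api_key()` method mentioned above (reading the variable here, and the truthy return value, are assumptions):

```python
import os

from app.ai.replicate_client import ReplicateClient

if "REPLICATE_API_TOKEN" not in os.environ:
    raise SystemExit("REPLICATE_API_TOKEN is required")

client = ReplicateClient()
if not client.validate_api_key():  # assumed to return a truthy/falsy result
    raise SystemExit("Replicate rejected the API key")
```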

## Cost Management

Approximate costs per 1K tokens:

| Model | Input | Output |
|---|---|---|
| Llama 3 8B | Free | Free |
| Claude Haiku | $0.001 | $0.005 |
| Claude Sonnet | $0.003 | $0.015 |
| Claude Sonnet 4 | $0.015 | $0.075 |
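
As a worked example, a request with roughly 400 input tokens and 512 output tokens on Claude Sonnet costs about (0.4 × $0.003) + (0.512 × $0.015) ≈ $0.009, consistent with the PREMIUM per-request estimate above (the token counts are illustrative assumptions).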

## Integration with Background Jobs

AI generation runs asynchronously via RQ jobs. See `app/tasks/ai_tasks.py`.

```python
from app.tasks.ai_tasks import enqueue_ai_task

# Queue a story action
job = enqueue_ai_task(
    task_type="narrative",
    user_id="user_123",
    context={
        "session_id": "sess_789",
        "character_id": "char_456",
        "action": "I explore the tavern"
    }
)
# Returns: {"job_id": "abc-123", "status": "queued"}
```
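
The returned `job_id` can then be polled for completion. A minimal sketch using RQ's standard job API (the Redis connection details and the result shape are assumptions; see `app/tasks/ai_tasks.py` for the real flow):

```python
from redis import Redis
from rq.job import Job

redis_conn = Redis()  # assumes a default localhost Redis

job = Job.fetch("abc-123", connection=redis_conn)
print(job.get_status())  # e.g. "queued", "started", "finished", "failed"

if job.get_status() == "finished":
    print(job.result)  # whatever the task returned
```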

## Usage Tracking

All AI calls are automatically logged for cost monitoring. See `app/services/usage_tracking_service.py`.

```python
from datetime import date

from app.services.usage_tracking_service import UsageTrackingService

tracker = UsageTrackingService()

# Get daily usage
usage = tracker.get_daily_usage("user_123", date.today())
print(f"Requests: {usage.total_requests}")
print(f"Cost: ${usage.estimated_cost:.4f}")

# Get monthly cost
monthly = tracker.get_monthly_cost("user_123", 2025, 11)
```

## Rate Limiting

Tier-based daily limits are enforced via `app/services/rate_limiter_service.py`.

### AI Calls (Turns)

| Tier | Daily Limit |
|---|---|
| FREE | 20 turns |
| BASIC | 50 turns |
| PREMIUM | 100 turns |
| ELITE | 200 turns |

### Custom Actions

Free-text player actions (beyond the preset buttons) have separate limits:

| Tier | Custom Actions/Day | Max Characters |
|---|---|---|
| FREE | 10 | 150 |
| BASIC | 50 | 300 |
| PREMIUM | Unlimited | 500 |
| ELITE | Unlimited | 500 |

These limits are configurable in `config/*.yaml` under `rate_limiting.tiers.{tier}.custom_actions_per_day` and `custom_action_char_limit` (see the sketch below).
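
A minimal sketch of reading those keys with PyYAML (the config filename and the use of PyYAML are assumptions; the key paths and values come from the table above):

```python
import yaml  # PyYAML

with open("config/development.yaml") as f:  # hypothetical filename under config/
    cfg = yaml.safe_load(f)

free_tier = cfg["rate_limiting"]["tiers"]["free"]
print(free_tier["custom_actions_per_day"])    # 10
print(free_tier["custom_action_char_limit"])  # 150
```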

```python
from app.ai.model_selector import UserTier
from app.services.rate_limiter_service import (
    RateLimiterService,
    RateLimitExceeded,  # assumed to live alongside the service
)

limiter = RateLimiterService()

try:
    limiter.check_rate_limit("user_123", UserTier.PREMIUM)
    # Process request...
    limiter.increment_usage("user_123")
except RateLimitExceeded:
    # Return error to user
    pass
```

## Best Practices

1. **Always specify the context type** - helps optimize token usage and temperature
2. **Provide conversation history** - improves narrative coherence
3. **Handle errors gracefully** - show user-friendly messages
4. **Monitor costs** - use the usage tracking service
5. **Test with mocks first** - use mocked clients during development (see the sketch below)
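
A minimal sketch of the mock-first approach using `unittest.mock` (the response values are made up; `ReplicateResponse` is the dataclass documented above):

```python
from unittest.mock import MagicMock

from app.ai.replicate_client import ReplicateResponse

# Stand-in client with the same .generate() surface, but no network calls.
mock_client = MagicMock()
mock_client.generate.return_value = ReplicateResponse(
    text="You find a dusty lever behind the bookshelf.",
    tokens_used=12,
    model="mock",
    generation_time=0.01,
)

response = mock_client.generate("I search the room")
assert response.text.startswith("You find")
mock_client.generate.assert_called_once()
```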

## Verification Scripts

- `scripts/verify_ai_models.py` - Test model routing and API connectivity
- `scripts/verify_e2e_ai_generation.py` - End-to-end generation flow tests

```bash
# Test model routing (no API key needed)
python scripts/verify_ai_models.py

# Test with real API calls
python scripts/verify_ai_models.py --llama --haiku --sonnet

# Full E2E test
python scripts/verify_e2e_ai_generation.py --real --tier premium
```