AI Integration Documentation
Overview
Code of Conquest uses AI models for narrative generation through a unified Replicate API integration. This document covers the AI client architecture, model selection, and usage patterns.
Key Components:
- ReplicateClient - Low-level API client for all AI models
- ModelSelector - Tier-based model routing and configuration
- NarrativeGenerator - High-level wrapper for game-specific generation
Architecture
┌─────────────────────┐
│ NarrativeGenerator │ ← High-level game API
├─────────────────────┤
│ ModelSelector │ ← Tier/context routing
├─────────────────────┤
│ ReplicateClient │ ← Unified API client
├─────────────────────┤
│ Replicate API │ ← All models (Llama, Claude)
└─────────────────────┘
All AI models are accessed through Replicate API for unified billing and management.
Replicate Client
File: app/ai/replicate_client.py
Supported Models
| Model Type | Identifier | Tier | Use Case |
|---|---|---|---|
| LLAMA_3_8B | meta/meta-llama-3-8b-instruct | Free | Cost-effective, good quality |
| CLAUDE_HAIKU | anthropic/claude-3.5-haiku | Basic | Fast, high quality |
| CLAUDE_SONNET | anthropic/claude-3.5-sonnet | Premium | Excellent quality |
| CLAUDE_SONNET_4 | anthropic/claude-4.5-sonnet | Elite | Best quality |
Basic Usage
from app.ai.replicate_client import ReplicateClient, ModelType
# Free tier - Llama (default)
client = ReplicateClient()
response = client.generate(
prompt="You are a dungeon master...",
max_tokens=256,
temperature=0.7
)
print(response.text)
print(f"Tokens: {response.tokens_used}")
# Paid tier - Claude models
client = ReplicateClient(model=ModelType.CLAUDE_HAIKU)
response = client.generate(
prompt="Describe the tavern",
system_prompt="You are a dungeon master"
)
# Override model per-call
response = client.generate("Test", model=ModelType.CLAUDE_SONNET)
Response Object
@dataclass
class ReplicateResponse:
text: str # Generated text
tokens_used: int # Approximate token count
model: str # Model identifier
generation_time: float # Generation time in seconds
Configuration
# Default parameters
DEFAULT_MAX_TOKENS = 256
DEFAULT_TEMPERATURE = 0.7
DEFAULT_TOP_P = 0.9
DEFAULT_TIMEOUT = 30 # seconds
# Model-specific defaults
MODEL_DEFAULTS = {
ModelType.LLAMA_3_8B: {"max_tokens": 256, "temperature": 0.7},
ModelType.CLAUDE_HAIKU: {"max_tokens": 512, "temperature": 0.8},
ModelType.CLAUDE_SONNET: {"max_tokens": 1024, "temperature": 0.9},
ModelType.CLAUDE_SONNET_4: {"max_tokens": 2048, "temperature": 0.9},
}
Error Handling
from app.ai.replicate_client import (
ReplicateClientError, # Base error
ReplicateAPIError, # API errors
ReplicateRateLimitError, # Rate limiting
ReplicateTimeoutError # Timeouts
)
try:
response = client.generate(prompt)
except ReplicateRateLimitError:
# Handle rate limiting (client retries automatically 3 times)
pass
except ReplicateTimeoutError:
# Handle timeout
pass
except ReplicateAPIError as e:
# Handle other API errors
logger.error(f"API error: {e}")
Features
- Retry Logic: Exponential backoff (3 retries) for rate limits
- Model-specific Formatting: Llama special tokens, Claude system prompts
- API Key Validation: client.validate_api_key() method
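The retry behavior can be sketched as a simple exponential backoff loop. This is a minimal illustration, not the client's actual internals; `call_api` and `RateLimitError` here are hypothetical stand-ins:

```python
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for the client's rate-limit exception."""

def generate_with_retry(call_api, prompt, retries=3, base_delay=1.0):
    """Retry `call_api` with exponential backoff on rate limits.

    `call_api` stands in for the raw Replicate request. Delays
    grow as base_delay * 2**attempt (1s, 2s, 4s by default).
    """
    for attempt in range(retries + 1):
        try:
            return call_api(prompt)
        except RateLimitError:
            if attempt == retries:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))
```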
Model Selector
File: app/ai/model_selector.py
User Tiers
class UserTier(str, Enum):
FREE = "free" # Llama 3 8B
BASIC = "basic" # Claude Haiku
PREMIUM = "premium" # Claude Sonnet
ELITE = "elite" # Claude Sonnet 4
Context Types
class ContextType(str, Enum):
STORY_PROGRESSION = "story_progression" # Creative narratives
COMBAT_NARRATION = "combat_narration" # Action descriptions
QUEST_SELECTION = "quest_selection" # Quest picking
NPC_DIALOGUE = "npc_dialogue" # Character conversations
SIMPLE_RESPONSE = "simple_response" # Quick responses
Usage
from app.ai.model_selector import ModelSelector, UserTier, ContextType
selector = ModelSelector()
# Select model configuration
config = selector.select_model(
user_tier=UserTier.PREMIUM,
context_type=ContextType.STORY_PROGRESSION
)
print(config.model_type) # ModelType.CLAUDE_SONNET
print(config.max_tokens) # 1024
print(config.temperature) # 0.9
Token Limits by Tier
| Tier | Base Tokens | Model |
|---|---|---|
| FREE | 256 | Llama 3 8B |
| BASIC | 512 | Claude Haiku |
| PREMIUM | 1024 | Claude Sonnet |
| ELITE | 2048 | Claude Sonnet 4 |
Context Adjustments
Temperature by Context:
- Story Progression: 0.9 (creative)
- Combat Narration: 0.8 (exciting)
- Quest Selection: 0.5 (deterministic)
- NPC Dialogue: 0.85 (natural)
- Simple Response: 0.7 (balanced)
Token Multipliers:
- Story Progression: 1.0× (full allocation)
- Combat Narration: 0.75× (shorter)
- Quest Selection: 0.5× (brief)
- NPC Dialogue: 0.75× (conversational)
- Simple Response: 0.5× (quick)
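Combining the two tables, the effective `max_tokens` for a request is the tier's base allocation scaled by the context multiplier. A minimal sketch of that calculation (illustrative; the real `ModelSelector` may round or clamp differently):

```python
# Base token allocations per tier and multipliers per context,
# mirroring the tables above.
BASE_TOKENS = {"free": 256, "basic": 512, "premium": 1024, "elite": 2048}
TOKEN_MULTIPLIERS = {
    "story_progression": 1.0,
    "combat_narration": 0.75,
    "quest_selection": 0.5,
    "npc_dialogue": 0.75,
    "simple_response": 0.5,
}

def effective_max_tokens(tier: str, context: str) -> int:
    """Effective token budget = tier base allocation x context multiplier."""
    return int(BASE_TOKENS[tier] * TOKEN_MULTIPLIERS[context])
```

For example, a PREMIUM user narrating combat gets 1024 × 0.75 = 768 tokens.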
Cost Estimation
# Get tier information
info = selector.get_tier_info(UserTier.PREMIUM)
# {
# "tier": "premium",
# "model": "anthropic/claude-3.5-sonnet",
# "model_name": "Claude 3.5 Sonnet",
# "base_tokens": 1024,
# "quality": "Excellent quality, detailed narratives"
# }
# Estimate cost per request
cost = selector.estimate_cost_per_request(UserTier.PREMIUM)
# ~$0.009 per request
Narrative Generator
File: app/ai/narrative_generator.py
High-level wrapper that coordinates model selection, prompt templates, and AI generation.
Initialization
from app.ai.narrative_generator import NarrativeGenerator
from app.ai.model_selector import UserTier
generator = NarrativeGenerator()
Story Response Generation
response = generator.generate_story_response(
character={
"name": "Aldric",
"level": 3,
"player_class": "Fighter",
"stats": {"strength": 16, "dexterity": 14, ...}
},
action="I search the room for hidden doors",
game_state={
"current_location": "Ancient Library",
"location_type": "DUNGEON",
"active_quests": ["find_artifact"]
},
user_tier=UserTier.PREMIUM,
conversation_history=[
{"turn": 1, "action": "entered library", "dm_response": "..."},
{"turn": 2, "action": "examined shelves", "dm_response": "..."}
],
action_instructions="""
The player searches for supplies. This means:
- Describe what they FIND, not auto-purchase
- List items with PRICES if applicable
- Ask what they want to do with findings
""" # Optional: from action_prompts.yaml dm_prompt_template
)
print(response.narrative)
print(f"Tokens: {response.tokens_used}")
print(f"Model: {response.model}")
print(f"Time: {response.generation_time:.2f}s")
Action Instructions
The action_instructions parameter passes action-specific guidance from action_prompts.yaml to the AI. This ensures:
- Player agency - AI presents options rather than making decisions
- Action semantics - "Search" means find options, not auto-buy
- Context-aware responses - Different instructions for different actions
The instructions are injected into the prompt template and include critical player agency rules:
- Never auto-purchase items
- Never complete transactions without consent
- Present choices and ask what they want to do
Combat Narration
response = generator.generate_combat_narration(
character={"name": "Aldric", ...},
combat_state={
"round_number": 3,
"enemies": [{"name": "Goblin", "hp": 5, "max_hp": 10}],
"terrain": "cave"
},
action="swings their sword at the goblin",
action_result={
"hit": True,
"damage": 12,
"effects": ["bleeding"]
},
user_tier=UserTier.BASIC,
is_critical=True,
is_finishing_blow=True
)
Quest Selection
quest_id = generator.generate_quest_selection(
character={"name": "Aldric", "level": 3, ...},
eligible_quests=[
{"quest_id": "goblin_cave", "name": "Clear the Cave", ...},
{"quest_id": "herb_gathering", "name": "Gather Herbs", ...}
],
game_context={
"current_location": "Tavern",
"recent_events": ["talked to locals"]
},
user_tier=UserTier.FREE,
recent_actions=["asked about rumors", "ordered ale"]
)
print(quest_id) # "goblin_cave"
NPC Dialogue
response = generator.generate_npc_dialogue(
character={"name": "Aldric", ...},
npc={
"name": "Old Barkeep",
"role": "Tavern Owner",
"personality": "gruff but kind"
},
conversation_topic="What rumors have you heard lately?",
game_state={"current_location": "The Rusty Anchor", ...},
user_tier=UserTier.PREMIUM,
npc_knowledge=["goblin attacks", "missing merchant"]
)
Response Object
@dataclass
class NarrativeResponse:
narrative: str # Generated text
tokens_used: int # Token count
model: str # Model used
context_type: str # Type of generation
generation_time: float
Error Handling
from app.ai.narrative_generator import NarrativeGeneratorError
try:
response = generator.generate_story_response(...)
except NarrativeGeneratorError as e:
logger.error(f"Generation failed: {e}")
# Handle gracefully (show error to user, use fallback, etc.)
Prompt Templates
File: app/ai/prompt_templates.py
Templates: app/ai/templates/*.j2
Available Templates
- story_action.j2 - Story progression turns
- combat_action.j2 - Combat narration
- quest_offering.j2 - Context-aware quest selection
- npc_dialogue.j2 - NPC conversations
Template Filters
- format_inventory - Format item lists
- format_stats - Format character stats
- format_skills - Format skill lists
- format_effects - Format active effects
- truncate_text - Truncate with ellipsis
- format_gold - Format currency
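As an illustration of what such a filter does, a `format_stats`-style helper might look like this (a hypothetical implementation, not the project's actual filter):

```python
def format_stats(stats: dict) -> str:
    """Render a stats dict as a compact one-line summary,
    e.g. {"strength": 16} -> "STR 16". Hypothetical abbreviation map."""
    abbrev = {
        "strength": "STR", "dexterity": "DEX", "constitution": "CON",
        "intelligence": "INT", "wisdom": "WIS", "charisma": "CHA",
    }
    return ", ".join(f"{abbrev.get(k, k.upper())} {v}" for k, v in stats.items())
```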
Direct Template Usage
from app.ai.prompt_templates import get_prompt_templates
templates = get_prompt_templates()
prompt = templates.render(
"story_action.j2",
character={"name": "Aldric", ...},
action="search for traps",
game_state={...},
conversation_history=[...]
)
Configuration
Environment Variables
# Required
REPLICATE_API_TOKEN=r8_...
# Optional (defaults shown)
REPLICATE_MODEL=meta/meta-llama-3-8b-instruct
Cost Management
Approximate costs per 1K tokens:
| Model | Input | Output |
|---|---|---|
| Llama 3 8B | Free | Free |
| Claude Haiku | $0.001 | $0.005 |
| Claude Sonnet | $0.003 | $0.015 |
| Claude Sonnet 4 | $0.015 | $0.075 |
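These per-1K prices can be combined into a rough per-request estimate. A minimal sketch (illustrative; actual token accounting may differ):

```python
# Per-1K-token prices (input, output) in USD, from the table above.
PRICES = {
    "llama_3_8b": (0.0, 0.0),
    "claude_haiku": (0.001, 0.005),
    "claude_sonnet": (0.003, 0.015),
    "claude_sonnet_4": (0.015, 0.075),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough cost: tokens / 1000 x per-1K price, summed over input and output."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1000) * in_price + (output_tokens / 1000) * out_price
```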
Integration with Background Jobs
AI generation runs asynchronously via RQ jobs. See app/tasks/ai_tasks.py.
from app.tasks.ai_tasks import enqueue_ai_task
# Queue a story action
job = enqueue_ai_task(
task_type="narrative",
user_id="user_123",
context={
"session_id": "sess_789",
"character_id": "char_456",
"action": "I explore the tavern"
}
)
# Returns: {"job_id": "abc-123", "status": "queued"}
Usage Tracking
All AI calls are automatically logged for cost monitoring. See app/services/usage_tracking_service.py.
from app.services.usage_tracking_service import UsageTrackingService
tracker = UsageTrackingService()
# Get daily usage
usage = tracker.get_daily_usage("user_123", date.today())
print(f"Requests: {usage.total_requests}")
print(f"Cost: ${usage.estimated_cost:.4f}")
# Get monthly cost
monthly = tracker.get_monthly_cost("user_123", 2025, 11)
Rate Limiting
Tier-based daily limits are enforced via app/services/rate_limiter_service.py.
Limits are loaded from config (rate_limiting.tiers.{tier}.ai_calls_per_day).
AI Calls (Turns)
| Tier | Daily Limit |
|---|---|
| FREE | 50 turns |
| BASIC | 200 turns |
| PREMIUM | 1000 turns |
| ELITE | Unlimited |
A value of -1 in config means unlimited.
Custom Actions
Free-text player actions (beyond preset buttons) have separate limits:
| Tier | Custom Actions/Day | Max Characters |
|---|---|---|
| FREE | 10 | 150 |
| BASIC | 50 | 300 |
| PREMIUM | Unlimited | 500 |
| ELITE | Unlimited | 500 |
These are configurable in config/*.yaml under rate_limiting.tiers.{tier}.custom_actions_per_day and custom_action_char_limit.
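A sketch of how these two limits might be checked before accepting a free-text action (hypothetical helper; the real service reads the values from config rather than a hard-coded table):

```python
# (per-day count, character limit) per tier, from the table above.
# None means unlimited.
CUSTOM_ACTION_LIMITS = {
    "free": (10, 150),
    "basic": (50, 300),
    "premium": (None, 500),
    "elite": (None, 500),
}

def validate_custom_action(tier: str, text: str, used_today: int) -> bool:
    """Return True if the action passes both the daily count
    and character-length checks for the tier."""
    per_day, char_limit = CUSTOM_ACTION_LIMITS[tier]
    if len(text) > char_limit:
        return False
    if per_day is not None and used_today >= per_day:
        return False
    return True
```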
from app.ai.model_selector import UserTier
from app.services.rate_limiter_service import RateLimiterService, RateLimitExceeded
limiter = RateLimiterService()
try:
limiter.check_rate_limit("user_123", UserTier.PREMIUM)
# Process request...
limiter.increment_usage("user_123")
except RateLimitExceeded as e:
# Return error to user
pass
Best Practices
- Always specify context type - Helps optimize token usage and temperature
- Provide conversation history - Improves narrative coherence
- Handle errors gracefully - Show user-friendly messages
- Monitor costs - Use usage tracking service
- Test with mocks first - Use mocked clients during development
Verification Scripts
- scripts/verify_ai_models.py - Test model routing and API connectivity
- scripts/verify_e2e_ai_generation.py - End-to-end generation flow tests
# Test model routing (no API key needed)
python scripts/verify_ai_models.py
# Test with real API calls
python scripts/verify_ai_models.py --llama --haiku --sonnet
# Full E2E test
python scripts/verify_e2e_ai_generation.py --real --tier premium