AI Integration Documentation
Overview
Code of Conquest uses AI models for narrative generation through a unified Replicate API integration. This document covers the AI client architecture, model selection, and usage patterns.
Key Components:
- ReplicateClient - Low-level API client for all AI models
- ModelSelector - Tier-based model routing and configuration
- NarrativeGenerator - High-level wrapper for game-specific generation
Architecture
┌─────────────────────┐
│ NarrativeGenerator │ ← High-level game API
├─────────────────────┤
│ ModelSelector │ ← Tier/context routing
├─────────────────────┤
│ ReplicateClient │ ← Unified API client
├─────────────────────┤
│ Replicate API │ ← All models (Llama, Claude)
└─────────────────────┘
All AI models are accessed through Replicate API for unified billing and management.
Replicate Client
File: app/ai/replicate_client.py
Supported Models
| Model Type | Identifier | Tier | Use Case |
|---|---|---|---|
| LLAMA_3_8B | meta/meta-llama-3-8b-instruct | Free | Cost-effective, good quality |
| CLAUDE_HAIKU | anthropic/claude-3.5-haiku | Basic | Fast, high quality |
| CLAUDE_SONNET | anthropic/claude-3.5-sonnet | Premium | Excellent quality |
| CLAUDE_SONNET_4 | anthropic/claude-4.5-sonnet | Elite | Best quality |
Basic Usage
from app.ai.replicate_client import ReplicateClient, ModelType
# Free tier - Llama (default)
client = ReplicateClient()
response = client.generate(
prompt="You are a dungeon master...",
max_tokens=256,
temperature=0.7
)
print(response.text)
print(f"Tokens: {response.tokens_used}")
# Paid tier - Claude models
client = ReplicateClient(model=ModelType.CLAUDE_HAIKU)
response = client.generate(
prompt="Describe the tavern",
system_prompt="You are a dungeon master"
)
# Override model per-call
response = client.generate("Test", model=ModelType.CLAUDE_SONNET)
Response Object
@dataclass
class ReplicateResponse:
text: str # Generated text
tokens_used: int # Approximate token count
model: str # Model identifier
generation_time: float # Generation time in seconds
Configuration
# Default parameters
DEFAULT_MAX_TOKENS = 256
DEFAULT_TEMPERATURE = 0.7
DEFAULT_TOP_P = 0.9
DEFAULT_TIMEOUT = 30 # seconds
# Model-specific defaults
MODEL_DEFAULTS = {
ModelType.LLAMA_3_8B: {"max_tokens": 256, "temperature": 0.7},
ModelType.CLAUDE_HAIKU: {"max_tokens": 512, "temperature": 0.8},
ModelType.CLAUDE_SONNET: {"max_tokens": 1024, "temperature": 0.9},
ModelType.CLAUDE_SONNET_4: {"max_tokens": 2048, "temperature": 0.9},
}
Error Handling
from app.ai.replicate_client import (
ReplicateClientError, # Base error
ReplicateAPIError, # API errors
ReplicateRateLimitError, # Rate limiting
ReplicateTimeoutError # Timeouts
)
try:
response = client.generate(prompt)
except ReplicateRateLimitError:
# Handle rate limiting (client retries automatically 3 times)
pass
except ReplicateTimeoutError:
# Handle timeout
pass
except ReplicateAPIError as e:
# Handle other API errors
logger.error(f"API error: {e}")
Features
- Retry Logic: Exponential backoff (3 retries) for rate limits
- Model-specific Formatting: Llama special tokens, Claude system prompts
- API Key Validation: client.validate_api_key() method
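The retry behavior can be sketched as a simple exponential backoff loop. This is a minimal illustration, not the client's actual internals; `call_api` and `RateLimitError` here are hypothetical stand-ins:

```python
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for the client's rate-limit exception."""

def generate_with_retry(call_api, prompt, retries=3, base_delay=1.0):
    """Retry `call_api` with exponential backoff on rate limits.

    `call_api` stands in for the raw Replicate request. Delays
    grow as base_delay * 2**attempt (1s, 2s, 4s by default).
    """
    for attempt in range(retries + 1):
        try:
            return call_api(prompt)
        except RateLimitError:
            if attempt == retries:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))
```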
Model Selector
File: app/ai/model_selector.py
User Tiers
class UserTier(str, Enum):
FREE = "free" # Llama 3 8B
BASIC = "basic" # Claude Haiku
PREMIUM = "premium" # Claude Sonnet
ELITE = "elite" # Claude Sonnet 4
Context Types
class ContextType(str, Enum):
STORY_PROGRESSION = "story_progression" # Creative narratives
COMBAT_NARRATION = "combat_narration" # Action descriptions
QUEST_SELECTION = "quest_selection" # Quest picking
NPC_DIALOGUE = "npc_dialogue" # Character conversations
SIMPLE_RESPONSE = "simple_response" # Quick responses
Usage
from app.ai.model_selector import ModelSelector, UserTier, ContextType
selector = ModelSelector()
# Select model configuration
config = selector.select_model(
user_tier=UserTier.PREMIUM,
context_type=ContextType.STORY_PROGRESSION
)
print(config.model_type) # ModelType.CLAUDE_SONNET
print(config.max_tokens) # 1024
print(config.temperature) # 0.9
Token Limits by Tier
| Tier | Base Tokens | Model |
|---|---|---|
| FREE | 256 | Llama 3 8B |
| BASIC | 512 | Claude Haiku |
| PREMIUM | 1024 | Claude Sonnet |
| ELITE | 2048 | Claude Sonnet 4 |
Context Adjustments
Temperature by Context:
- Story Progression: 0.9 (creative)
- Combat Narration: 0.8 (exciting)
- Quest Selection: 0.5 (deterministic)
- NPC Dialogue: 0.85 (natural)
- Simple Response: 0.7 (balanced)
Token Multipliers:
- Story Progression: 1.0× (full allocation)
- Combat Narration: 0.75× (shorter)
- Quest Selection: 0.5× (brief)
- NPC Dialogue: 0.75× (conversational)
- Simple Response: 0.5× (quick)
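Combining the two tables, the effective `max_tokens` for a request is the tier's base allocation scaled by the context multiplier. A minimal sketch of that calculation (illustrative; the real `ModelSelector` may round or clamp differently):

```python
# Base token allocations per tier and multipliers per context,
# mirroring the tables above.
BASE_TOKENS = {"free": 256, "basic": 512, "premium": 1024, "elite": 2048}
TOKEN_MULTIPLIERS = {
    "story_progression": 1.0,
    "combat_narration": 0.75,
    "quest_selection": 0.5,
    "npc_dialogue": 0.75,
    "simple_response": 0.5,
}

def effective_max_tokens(tier: str, context: str) -> int:
    """Effective token budget = tier base allocation x context multiplier."""
    return int(BASE_TOKENS[tier] * TOKEN_MULTIPLIERS[context])
```

For example, a PREMIUM user narrating combat gets 1024 × 0.75 = 768 tokens.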
Cost Estimation
# Get tier information
info = selector.get_tier_info(UserTier.PREMIUM)
# {
# "tier": "premium",
# "model": "anthropic/claude-3.5-sonnet",
# "model_name": "Claude 3.5 Sonnet",
# "base_tokens": 1024,
# "quality": "Excellent quality, detailed narratives"
# }
# Estimate cost per request
cost = selector.estimate_cost_per_request(UserTier.PREMIUM)
# ~$0.009 per request
Narrative Generator
File: app/ai/narrative_generator.py
High-level wrapper that coordinates model selection, prompt templates, and AI generation.
Initialization
from app.ai.narrative_generator import NarrativeGenerator
from app.ai.model_selector import UserTier
generator = NarrativeGenerator()
Story Response Generation
response = generator.generate_story_response(
character={
"name": "Aldric",
"level": 3,
"player_class": "Fighter",
"stats": {"strength": 16, "dexterity": 14, ...}
},
action="I search the room for hidden doors",
game_state={
"current_location": "Ancient Library",
"location_type": "DUNGEON",
"active_quests": ["find_artifact"]
},
user_tier=UserTier.PREMIUM,
conversation_history=[
{"turn": 1, "action": "entered library", "dm_response": "..."},
{"turn": 2, "action": "examined shelves", "dm_response": "..."}
],
action_instructions="""
The player searches for supplies. This means:
- Describe what they FIND, not auto-purchase
- List items with PRICES if applicable
- Ask what they want to do with findings
""" # Optional: from action_prompts.yaml dm_prompt_template
)
print(response.narrative)
print(f"Tokens: {response.tokens_used}")
print(f"Model: {response.model}")
print(f"Time: {response.generation_time:.2f}s")
Action Instructions
The action_instructions parameter passes action-specific guidance from action_prompts.yaml to the AI. This ensures:
- Player agency - AI presents options rather than making decisions
- Action semantics - "Search" means find options, not auto-buy
- Context-aware responses - Different instructions for different actions
The instructions are injected into the prompt template and include critical player agency rules:
- Never auto-purchase items
- Never complete transactions without consent
- Present choices and ask what they want to do
Combat Narration
response = generator.generate_combat_narration(
character={"name": "Aldric", ...},
combat_state={
"round_number": 3,
"enemies": [{"name": "Goblin", "hp": 5, "max_hp": 10}],
"terrain": "cave"
},
action="swings their sword at the goblin",
action_result={
"hit": True,
"damage": 12,
"effects": ["bleeding"]
},
user_tier=UserTier.BASIC,
is_critical=True,
is_finishing_blow=True
)
Quest Selection
quest_id = generator.generate_quest_selection(
character={"name": "Aldric", "level": 3, ...},
eligible_quests=[
{"quest_id": "goblin_cave", "name": "Clear the Cave", ...},
{"quest_id": "herb_gathering", "name": "Gather Herbs", ...}
],
game_context={
"current_location": "Tavern",
"recent_events": ["talked to locals"]
},
user_tier=UserTier.FREE,
recent_actions=["asked about rumors", "ordered ale"]
)
print(quest_id) # "goblin_cave"
NPC Dialogue
response = generator.generate_npc_dialogue(
character={"name": "Aldric", ...},
npc={
"name": "Old Barkeep",
"role": "Tavern Owner",
"personality": "gruff but kind"
},
conversation_topic="What rumors have you heard lately?",
game_state={"current_location": "The Rusty Anchor", ...},
user_tier=UserTier.PREMIUM,
npc_knowledge=["goblin attacks", "missing merchant"]
)
Response Object
@dataclass
class NarrativeResponse:
narrative: str # Generated text
tokens_used: int # Token count
model: str # Model used
context_type: str # Type of generation
generation_time: float
Error Handling
from app.ai.narrative_generator import NarrativeGeneratorError
try:
response = generator.generate_story_response(...)
except NarrativeGeneratorError as e:
logger.error(f"Generation failed: {e}")
# Handle gracefully (show error to user, use fallback, etc.)
Prompt Templates
File: app/ai/prompt_templates.py
Templates: app/ai/templates/*.j2
Available Templates
- story_action.j2 - Story progression turns
- combat_action.j2 - Combat narration
- quest_offering.j2 - Context-aware quest selection
- npc_dialogue.j2 - NPC conversations
Template Filters
- format_inventory - Format item lists
- format_stats - Format character stats
- format_skills - Format skill lists
- format_effects - Format active effects
- truncate_text - Truncate with ellipsis
- format_gold - Format currency
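As an illustration of what such a filter does, a `format_stats`-style helper might look like this (a hypothetical implementation, not the project's actual filter):

```python
def format_stats(stats: dict) -> str:
    """Render a stats dict as a compact one-line summary,
    e.g. {"strength": 16} -> "STR 16". Hypothetical abbreviation map."""
    abbrev = {
        "strength": "STR", "dexterity": "DEX", "constitution": "CON",
        "intelligence": "INT", "wisdom": "WIS", "charisma": "CHA",
    }
    return ", ".join(f"{abbrev.get(k, k.upper())} {v}" for k, v in stats.items())
```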
Direct Template Usage
from app.ai.prompt_templates import get_prompt_templates
templates = get_prompt_templates()
prompt = templates.render(
"story_action.j2",
character={"name": "Aldric", ...},
action="search for traps",
game_state={...},
conversation_history=[...]
)
Configuration
Environment Variables
# Required
REPLICATE_API_TOKEN=r8_...
# Optional (defaults shown)
REPLICATE_MODEL=meta/meta-llama-3-8b-instruct
Cost Management
Approximate costs per 1K tokens:
| Model | Input | Output |
|---|---|---|
| Llama 3 8B | Free | Free |
| Claude Haiku | $0.001 | $0.005 |
| Claude Sonnet | $0.003 | $0.015 |
| Claude Sonnet 4 | $0.015 | $0.075 |
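These per-1K prices can be combined into a rough per-request estimate. A minimal sketch (illustrative; actual token accounting may differ):

```python
# Per-1K-token prices (input, output) in USD, from the table above.
PRICES = {
    "llama_3_8b": (0.0, 0.0),
    "claude_haiku": (0.001, 0.005),
    "claude_sonnet": (0.003, 0.015),
    "claude_sonnet_4": (0.015, 0.075),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough cost: tokens / 1000 x per-1K price, summed over input and output."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1000) * in_price + (output_tokens / 1000) * out_price
```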
Integration with Background Jobs
AI generation runs asynchronously via RQ jobs. See app/tasks/ai_tasks.py.
from app.tasks.ai_tasks import enqueue_ai_task
# Queue a story action
job = enqueue_ai_task(
task_type="narrative",
user_id="user_123",
context={
"session_id": "sess_789",
"character_id": "char_456",
"action": "I explore the tavern"
}
)
# Returns: {"job_id": "abc-123", "status": "queued"}
Usage Tracking
All AI calls are automatically logged for cost monitoring. See app/services/usage_tracking_service.py.
from app.services.usage_tracking_service import UsageTrackingService
tracker = UsageTrackingService()
# Get daily usage
usage = tracker.get_daily_usage("user_123", date.today())
print(f"Requests: {usage.total_requests}")
print(f"Cost: ${usage.estimated_cost:.4f}")
# Get monthly cost
monthly = tracker.get_monthly_cost("user_123", 2025, 11)
Rate Limiting
Tier-based daily limits are enforced via app/services/rate_limiter_service.py.
Limits are loaded from config (rate_limiting.tiers.{tier}.ai_calls_per_day).
AI Calls (Turns)
| Tier | Daily Limit |
|---|---|
| FREE | 50 turns |
| BASIC | 200 turns |
| PREMIUM | 1000 turns |
| ELITE | Unlimited |
A value of -1 in config means unlimited.
Custom Actions
Free-text player actions (beyond preset buttons) have separate limits:
| Tier | Custom Actions/Day | Max Characters |
|---|---|---|
| FREE | 10 | 150 |
| BASIC | 50 | 300 |
| PREMIUM | Unlimited | 500 |
| ELITE | Unlimited | 500 |
These are configurable in config/*.yaml under rate_limiting.tiers.{tier}.custom_actions_per_day and custom_action_char_limit.
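A sketch of how these two limits might be checked before accepting a free-text action (hypothetical helper; the real service reads the values from config rather than a hard-coded table):

```python
# (per-day count, character limit) per tier, from the table above.
# None means unlimited.
CUSTOM_ACTION_LIMITS = {
    "free": (10, 150),
    "basic": (50, 300),
    "premium": (None, 500),
    "elite": (None, 500),
}

def validate_custom_action(tier: str, text: str, used_today: int) -> bool:
    """Return True if the action passes both the daily count
    and character-length checks for the tier."""
    per_day, char_limit = CUSTOM_ACTION_LIMITS[tier]
    if len(text) > char_limit:
        return False
    if per_day is not None and used_today >= per_day:
        return False
    return True
```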
from app.ai.model_selector import UserTier
from app.services.rate_limiter_service import RateLimiterService, RateLimitExceeded
limiter = RateLimiterService()
try:
limiter.check_rate_limit("user_123", UserTier.PREMIUM)
# Process request...
limiter.increment_usage("user_123")
except RateLimitExceeded as e:
# Return error to user
pass
Best Practices
- Always specify context type - Helps optimize token usage and temperature
- Provide conversation history - Improves narrative coherence
- Handle errors gracefully - Show user-friendly messages
- Monitor costs - Use usage tracking service
- Test with mocks first - Use mocked clients during development
Verification Scripts
- scripts/verify_ai_models.py - Test model routing and API connectivity
- scripts/verify_e2e_ai_generation.py - End-to-end generation flow tests
# Test model routing (no API key needed)
python scripts/verify_ai_models.py
# Test with real API calls
python scripts/verify_ai_models.py --llama --haiku --sonnet
# Full E2E test
python scripts/verify_e2e_ai_generation.py --real --tier premium