# AI Integration Documentation

## Overview

Code of Conquest uses AI models for narrative generation through a unified Replicate API integration. This document covers the AI client architecture, model selection, and usage patterns.

**Key Components:**

- **ReplicateClient** - Low-level API client for all AI models
- **ModelSelector** - Tier-based model routing and configuration
- **NarrativeGenerator** - High-level wrapper for game-specific generation

---
## Architecture

```
┌─────────────────────┐
│ NarrativeGenerator  │ ← High-level game API
├─────────────────────┤
│ ModelSelector       │ ← Tier/context routing
├─────────────────────┤
│ ReplicateClient     │ ← Unified API client
├─────────────────────┤
│ Replicate API       │ ← All models (Llama, Claude)
└─────────────────────┘
```

All AI models are accessed through the Replicate API for unified billing and management.

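For orientation, here is a minimal sketch of how a single request might flow through these layers. It uses only calls documented later on this page, but the wiring itself is illustrative:

```python
from app.ai.model_selector import ModelSelector, UserTier, ContextType
from app.ai.replicate_client import ReplicateClient

def generate(prompt: str, tier: UserTier) -> str:
    # 1. Pick a model configuration for the user's tier and context
    config = ModelSelector().select_model(
        user_tier=tier,
        context_type=ContextType.STORY_PROGRESSION,
    )
    # 2. Call the unified Replicate client with that configuration
    client = ReplicateClient(model=config.model_type)
    response = client.generate(
        prompt=prompt,
        max_tokens=config.max_tokens,
        temperature=config.temperature,
    )
    return response.text
```
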
---
## Replicate Client

**File:** `app/ai/replicate_client.py`

### Supported Models

| Model Type | Identifier | Tier | Use Case |
|------------|------------|------|----------|
| `LLAMA_3_8B` | `meta/meta-llama-3-8b-instruct` | Free | Cost-effective, good quality |
| `CLAUDE_HAIKU` | `anthropic/claude-3.5-haiku` | Basic | Fast, high quality |
| `CLAUDE_SONNET` | `anthropic/claude-3.5-sonnet` | Premium | Excellent quality |
| `CLAUDE_SONNET_4` | `anthropic/claude-4.5-sonnet` | Elite | Best quality |

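The identifiers above suggest a one-to-one `ModelType` enum; a minimal sketch of its likely shape (the actual definition lives in `app/ai/replicate_client.py`):

```python
from enum import Enum

class ModelType(str, Enum):
    # Values are the Replicate model identifiers from the table above
    LLAMA_3_8B = "meta/meta-llama-3-8b-instruct"
    CLAUDE_HAIKU = "anthropic/claude-3.5-haiku"
    CLAUDE_SONNET = "anthropic/claude-3.5-sonnet"
    CLAUDE_SONNET_4 = "anthropic/claude-4.5-sonnet"
```
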
### Basic Usage

```python
from app.ai.replicate_client import ReplicateClient, ModelType

# Free tier - Llama (default)
client = ReplicateClient()
response = client.generate(
    prompt="You are a dungeon master...",
    max_tokens=256,
    temperature=0.7
)
print(response.text)
print(f"Tokens: {response.tokens_used}")

# Paid tier - Claude models
client = ReplicateClient(model=ModelType.CLAUDE_HAIKU)
response = client.generate(
    prompt="Describe the tavern",
    system_prompt="You are a dungeon master"
)

# Override model per-call
response = client.generate("Test", model=ModelType.CLAUDE_SONNET)
```

### Response Object

```python
@dataclass
class ReplicateResponse:
    text: str               # Generated text
    tokens_used: int        # Approximate token count
    model: str              # Model identifier
    generation_time: float  # Generation time in seconds
```

### Configuration

```python
# Default parameters
DEFAULT_MAX_TOKENS = 256
DEFAULT_TEMPERATURE = 0.7
DEFAULT_TOP_P = 0.9
DEFAULT_TIMEOUT = 30  # seconds

# Model-specific defaults
MODEL_DEFAULTS = {
    ModelType.LLAMA_3_8B: {"max_tokens": 256, "temperature": 0.7},
    ModelType.CLAUDE_HAIKU: {"max_tokens": 512, "temperature": 0.8},
    ModelType.CLAUDE_SONNET: {"max_tokens": 1024, "temperature": 0.9},
    ModelType.CLAUDE_SONNET_4: {"max_tokens": 2048, "temperature": 0.9},
}
```

### Error Handling

```python
from app.ai.replicate_client import (
    ReplicateClientError,     # Base error
    ReplicateAPIError,        # API errors
    ReplicateRateLimitError,  # Rate limiting
    ReplicateTimeoutError     # Timeouts
)

try:
    response = client.generate(prompt)
except ReplicateRateLimitError:
    # Raised after the client's 3 automatic retries are exhausted
    pass
except ReplicateTimeoutError:
    # Handle timeout
    pass
except ReplicateAPIError as e:
    # Handle other API errors
    logger.error(f"API error: {e}")
```

### Features

- **Retry Logic**: Exponential backoff (3 retries) for rate limits (sketched below)
- **Model-specific Formatting**: Llama special tokens, Claude system prompts
- **API Key Validation**: `client.validate_api_key()` method

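A minimal sketch of the documented retry behavior (illustrative only; the client already does this internally, so callers never need this wrapper):

```python
import time

from app.ai.replicate_client import ReplicateRateLimitError

def generate_with_backoff(client, prompt: str, retries: int = 3):
    """Illustrative only: the client performs this internally."""
    for attempt in range(retries + 1):
        try:
            return client.generate(prompt)
        except ReplicateRateLimitError:
            if attempt == retries:
                raise  # retries exhausted; surface the error
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s
```
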
---
## Model Selector

**File:** `app/ai/model_selector.py`

### User Tiers

```python
class UserTier(str, Enum):
    FREE = "free"        # Llama 3 8B
    BASIC = "basic"      # Claude Haiku
    PREMIUM = "premium"  # Claude Sonnet
    ELITE = "elite"      # Claude Sonnet 4
```

### Context Types

```python
class ContextType(str, Enum):
    STORY_PROGRESSION = "story_progression"  # Creative narratives
    COMBAT_NARRATION = "combat_narration"    # Action descriptions
    QUEST_SELECTION = "quest_selection"      # Quest picking
    NPC_DIALOGUE = "npc_dialogue"            # Character conversations
    SIMPLE_RESPONSE = "simple_response"      # Quick responses
```

### Usage

```python
from app.ai.model_selector import ModelSelector, UserTier, ContextType

selector = ModelSelector()

# Select model configuration
config = selector.select_model(
    user_tier=UserTier.PREMIUM,
    context_type=ContextType.STORY_PROGRESSION
)

print(config.model_type)   # ModelType.CLAUDE_SONNET
print(config.max_tokens)   # 1024
print(config.temperature)  # 0.9
```

### Token Limits by Tier

| Tier | Base Tokens | Model |
|------|-------------|-------|
| FREE | 256 | Llama 3 8B |
| BASIC | 512 | Claude Haiku |
| PREMIUM | 1024 | Claude Sonnet |
| ELITE | 2048 | Claude Sonnet 4 |

### Context Adjustments

**Temperature by Context:**

- Story Progression: 0.9 (creative)
- Combat Narration: 0.8 (exciting)
- Quest Selection: 0.5 (deterministic)
- NPC Dialogue: 0.85 (natural)
- Simple Response: 0.7 (balanced)

**Token Multipliers** (applied to the tier's base token allocation; see the sketch below):

- Story Progression: 1.0× (full allocation)
- Combat Narration: 0.75× (shorter)
- Quest Selection: 0.5× (brief)
- NPC Dialogue: 0.75× (conversational)
- Simple Response: 0.5× (quick)

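Combining the two tables, the effective budget for a call is plausibly `base_tokens × multiplier`. The dictionaries below mirror the table values; the combination function is an illustrative guess at what `select_model` computes, not its actual code:

```python
# Values mirror the tier and context tables above
BASE_TOKENS = {"free": 256, "basic": 512, "premium": 1024, "elite": 2048}
TOKEN_MULTIPLIERS = {
    "story_progression": 1.0,
    "combat_narration": 0.75,
    "quest_selection": 0.5,
    "npc_dialogue": 0.75,
    "simple_response": 0.5,
}

def effective_max_tokens(tier: str, context: str) -> int:
    return int(BASE_TOKENS[tier] * TOKEN_MULTIPLIERS[context])

# e.g. PREMIUM combat narration: int(1024 * 0.75) = 768 tokens
```
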
### Cost Estimation

```python
# Get tier information
info = selector.get_tier_info(UserTier.PREMIUM)
# {
#     "tier": "premium",
#     "model": "anthropic/claude-3.5-sonnet",
#     "model_name": "Claude 3.5 Sonnet",
#     "base_tokens": 1024,
#     "quality": "Excellent quality, detailed narratives"
# }

# Estimate cost per request
cost = selector.estimate_cost_per_request(UserTier.PREMIUM)
# ~$0.009 per request
```

---
## Narrative Generator

**File:** `app/ai/narrative_generator.py`

High-level wrapper that coordinates model selection, prompt templates, and AI generation.

### Initialization

```python
from app.ai.narrative_generator import NarrativeGenerator
from app.ai.model_selector import UserTier

generator = NarrativeGenerator()
```

### Story Response Generation

```python
response = generator.generate_story_response(
    character={
        "name": "Aldric",
        "level": 3,
        "player_class": "Fighter",
        "stats": {"strength": 16, "dexterity": 14, ...}
    },
    action="I search the room for hidden doors",
    game_state={
        "current_location": "Ancient Library",
        "location_type": "DUNGEON",
        "active_quests": ["find_artifact"]
    },
    user_tier=UserTier.PREMIUM,
    conversation_history=[
        {"turn": 1, "action": "entered library", "dm_response": "..."},
        {"turn": 2, "action": "examined shelves", "dm_response": "..."}
    ],
    action_instructions="""
    The player searches for supplies. This means:
    - Describe what they FIND, not auto-purchase
    - List items with PRICES if applicable
    - Ask what they want to do with findings
    """  # Optional: from action_prompts.yaml dm_prompt_template
)

print(response.narrative)
print(f"Tokens: {response.tokens_used}")
print(f"Model: {response.model}")
print(f"Time: {response.generation_time:.2f}s")
```

### Action Instructions

The `action_instructions` parameter passes action-specific guidance from `action_prompts.yaml` to the AI. This ensures:

1. **Player agency** - The AI presents options rather than making decisions
2. **Action semantics** - "Search" means find options, not auto-buy
3. **Context-aware responses** - Different instructions for different actions

The instructions are injected into the prompt template (see the sketch below) and include critical player agency rules:

- Never auto-purchase items
- Never complete transactions without consent
- Present choices and ask what the player wants to do

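A hedged sketch of the injection path, passing `action_instructions` through the documented `templates.render` call (whether the template receives it as a plain variable, and the template-side conditional, are assumptions):

```python
from app.ai.prompt_templates import get_prompt_templates

templates = get_prompt_templates()

# Hypothetical: the guidance is forwarded as a template variable, and
# story_action.j2 would splice it in with a block along the lines of
# {% if action_instructions %}{{ action_instructions }}{% endif %}
prompt = templates.render(
    "story_action.j2",
    character={"name": "Aldric"},
    action="I search the shop for supplies",
    game_state={"current_location": "General Store"},
    conversation_history=[],
    action_instructions="Describe what they FIND; never auto-purchase.",
)
```
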
### Combat Narration

```python
response = generator.generate_combat_narration(
    character={"name": "Aldric", ...},
    combat_state={
        "round_number": 3,
        "enemies": [{"name": "Goblin", "hp": 5, "max_hp": 10}],
        "terrain": "cave"
    },
    action="swings their sword at the goblin",
    action_result={
        "hit": True,
        "damage": 12,
        "effects": ["bleeding"]
    },
    user_tier=UserTier.BASIC,
    is_critical=True,
    is_finishing_blow=True
)
```

### Quest Selection

```python
quest_id = generator.generate_quest_selection(
    character={"name": "Aldric", "level": 3, ...},
    eligible_quests=[
        {"quest_id": "goblin_cave", "name": "Clear the Cave", ...},
        {"quest_id": "herb_gathering", "name": "Gather Herbs", ...}
    ],
    game_context={
        "current_location": "Tavern",
        "recent_events": ["talked to locals"]
    },
    user_tier=UserTier.FREE,
    recent_actions=["asked about rumors", "ordered ale"]
)
print(quest_id)  # "goblin_cave"
```

### NPC Dialogue

```python
response = generator.generate_npc_dialogue(
    character={"name": "Aldric", ...},
    npc={
        "name": "Old Barkeep",
        "role": "Tavern Owner",
        "personality": "gruff but kind"
    },
    conversation_topic="What rumors have you heard lately?",
    game_state={"current_location": "The Rusty Anchor", ...},
    user_tier=UserTier.PREMIUM,
    npc_knowledge=["goblin attacks", "missing merchant"]
)
```

### Response Object

```python
@dataclass
class NarrativeResponse:
    narrative: str          # Generated text
    tokens_used: int        # Token count
    model: str              # Model used
    context_type: str       # Type of generation
    generation_time: float  # Generation time in seconds
```

### Error Handling

```python
from app.ai.narrative_generator import NarrativeGeneratorError

try:
    response = generator.generate_story_response(...)
except NarrativeGeneratorError as e:
    logger.error(f"Generation failed: {e}")
    # Handle gracefully (show error to user, use fallback, etc.)
```

---
## Prompt Templates

**File:** `app/ai/prompt_templates.py`
**Templates:** `app/ai/templates/*.j2`

### Available Templates

1. **story_action.j2** - Story progression turns
2. **combat_action.j2** - Combat narration
3. **quest_offering.j2** - Context-aware quest selection
4. **npc_dialogue.j2** - NPC conversations

### Template Filters

- `format_inventory` - Format item lists
- `format_stats` - Format character stats
- `format_skills` - Format skill lists
- `format_effects` - Format active effects
- `truncate_text` - Truncate with ellipsis
- `format_gold` - Format currency (registration sketched below)

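Filters like these are typically registered on a Jinja2 `Environment`; a minimal sketch of how `format_gold` might be defined and hooked up (illustrative, not the project's actual code):

```python
from jinja2 import Environment, PackageLoader

def format_gold(amount: int) -> str:
    # e.g. 1250 -> "1,250 gold"
    return f"{amount:,} gold"

# Assumes templates are shipped inside the app.ai package
env = Environment(loader=PackageLoader("app.ai", "templates"))
env.filters["format_gold"] = format_gold

# Usage inside a template: {{ character.gold | format_gold }}
```
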
### Direct Template Usage

```python
from app.ai.prompt_templates import get_prompt_templates

templates = get_prompt_templates()

prompt = templates.render(
    "story_action.j2",
    character={"name": "Aldric", ...},
    action="search for traps",
    game_state={...},
    conversation_history=[...]
)
```

---
## Configuration

### Environment Variables

```bash
# Required
REPLICATE_API_TOKEN=r8_...

# Optional (defaults shown)
REPLICATE_MODEL=meta/meta-llama-3-8b-instruct
```

### Cost Management

Approximate costs per 1K tokens:

| Model | Input | Output |
|-------|-------|--------|
| Llama 3 8B | Free | Free |
| Claude Haiku | $0.001 | $0.005 |
| Claude Sonnet | $0.003 | $0.015 |
| Claude Sonnet 4 | $0.015 | $0.075 |

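As a worked example of where the earlier ~$0.009 Premium estimate can come from (rates from the table; the token counts are assumptions):

```python
# Claude Sonnet rates from the table above (per 1K tokens)
INPUT_RATE, OUTPUT_RATE = 0.003, 0.015

# Assumed typical request: ~1000 prompt tokens, ~400 completion tokens
input_tokens, output_tokens = 1000, 400

cost = (input_tokens / 1000) * INPUT_RATE + (output_tokens / 1000) * OUTPUT_RATE
print(f"${cost:.4f}")  # $0.0090 -- in line with estimate_cost_per_request
```
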
---
## Integration with Background Jobs

AI generation runs asynchronously via RQ jobs. See `app/tasks/ai_tasks.py`.

```python
from app.tasks.ai_tasks import enqueue_ai_task

# Queue a story action
job = enqueue_ai_task(
    task_type="narrative",
    user_id="user_123",
    context={
        "session_id": "sess_789",
        "character_id": "char_456",
        "action": "I explore the tavern"
    }
)
# Returns: {"job_id": "abc-123", "status": "queued"}
```

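Callers can poll the returned job ID with standard RQ primitives; a sketch, assuming the workers share the default Redis connection (the job ID is the placeholder from the example above):

```python
from redis import Redis
from rq.job import Job

# Assumes the same Redis instance the RQ workers use
redis_conn = Redis()

job = Job.fetch("abc-123", connection=redis_conn)
print(job.get_status())  # "queued", "started", "finished", "failed", ...
if job.is_finished:
    print(job.result)  # whatever the task returned
```
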
---
## Usage Tracking

All AI calls are automatically logged for cost monitoring. See `app/services/usage_tracking_service.py`.

```python
from datetime import date

from app.services.usage_tracking_service import UsageTrackingService

tracker = UsageTrackingService()

# Get daily usage
usage = tracker.get_daily_usage("user_123", date.today())
print(f"Requests: {usage.total_requests}")
print(f"Cost: ${usage.estimated_cost:.4f}")

# Get monthly cost
monthly = tracker.get_monthly_cost("user_123", 2025, 11)
```

---
## Rate Limiting

Tier-based daily limits are enforced via `app/services/rate_limiter_service.py`.

Limits are loaded from config (`rate_limiting.tiers.{tier}.ai_calls_per_day`).

### AI Calls (Turns)

| Tier | Daily Limit |
|------|-------------|
| FREE | 50 turns |
| BASIC | 200 turns |
| PREMIUM | 1000 turns |
| ELITE | Unlimited |

A value of `-1` in config means unlimited (see the sketch below).

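A minimal sketch of how the `-1` sentinel might be honored during a limit check (the dict mirrors the documented config keys; the function is illustrative, not the service's actual code):

```python
# Parsed from config/*.yaml; shape follows the documented keys
rate_limiting = {
    "tiers": {
        "free": {"ai_calls_per_day": 50},
        "basic": {"ai_calls_per_day": 200},
        "premium": {"ai_calls_per_day": 1000},
        "elite": {"ai_calls_per_day": -1},  # -1 means unlimited
    }
}

def is_within_limit(tier: str, calls_today: int) -> bool:
    limit = rate_limiting["tiers"][tier]["ai_calls_per_day"]
    return limit == -1 or calls_today < limit
```
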
### Custom Actions

Free-text player actions (beyond preset buttons) have separate limits:

| Tier | Custom Actions/Day | Max Characters |
|------|--------------------|----------------|
| FREE | 10 | 150 |
| BASIC | 50 | 300 |
| PREMIUM | Unlimited | 500 |
| ELITE | Unlimited | 500 |

These are configurable in `config/*.yaml` under `rate_limiting.tiers.{tier}.custom_actions_per_day` and `custom_action_char_limit`.

```python
from app.ai.model_selector import UserTier
from app.services.rate_limiter_service import (
    RateLimiterService,
    RateLimitExceeded,  # assumed to be exported alongside the service
)

limiter = RateLimiterService()

try:
    limiter.check_rate_limit("user_123", UserTier.PREMIUM)
    # Process request...
    limiter.increment_usage("user_123")
except RateLimitExceeded as e:
    # Return error to user
    pass
```

---
## Best Practices

1. **Always specify context type** - Helps optimize token usage and temperature
2. **Provide conversation history** - Improves narrative coherence
3. **Handle errors gracefully** - Show user-friendly messages
4. **Monitor costs** - Use the usage tracking service
5. **Test with mocks first** - Use mocked clients during development (see the sketch below)

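For practice 5, one way to stub the Replicate call in tests, as a sketch using `unittest.mock` (the patch target assumes `NarrativeGenerator` ultimately calls `ReplicateClient.generate`):

```python
from unittest.mock import patch

from app.ai.replicate_client import ReplicateResponse

def test_story_generation_without_api_calls():
    # Fake response mirrors the documented ReplicateResponse fields
    fake = ReplicateResponse(
        text="You find a hidden door.",
        tokens_used=12,
        model="mock",
        generation_time=0.01,
    )
    with patch(
        "app.ai.replicate_client.ReplicateClient.generate",
        return_value=fake,
    ):
        ...  # exercise NarrativeGenerator here without hitting the API
```
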
---
## Verification Scripts

- `scripts/verify_ai_models.py` - Test model routing and API connectivity
- `scripts/verify_e2e_ai_generation.py` - End-to-end generation flow tests

```bash
# Test model routing (no API key needed)
python scripts/verify_ai_models.py

# Test with real API calls
python scripts/verify_ai_models.py --llama --haiku --sonnet

# Full E2E test
python scripts/verify_e2e_ai_generation.py --real --tier premium
```