# AI Integration Documentation
## Overview
Code of Conquest uses AI models for narrative generation through a unified Replicate API integration. This document covers the AI client architecture, model selection, and usage patterns.

**Key Components:**
- **ReplicateClient** - Low-level API client for all AI models
- **ModelSelector** - Tier-based model routing and configuration
- **NarrativeGenerator** - High-level wrapper for game-specific generation
---
## Architecture
```
┌─────────────────────┐
│ NarrativeGenerator  │ ← High-level game API
├─────────────────────┤
│ ModelSelector       │ ← Tier/context routing
├─────────────────────┤
│ ReplicateClient     │ ← Unified API client
├─────────────────────┤
│ Replicate API       │ ← All models (Llama, Claude)
└─────────────────────┘
```
All AI models are accessed through Replicate API for unified billing and management.
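
The layering can be illustrated with a hypothetical, heavily simplified sketch (real class internals differ; only the delegation pattern is shown):

```python
# Hypothetical sketch of the layering: each layer delegates downward, so
# game code only ever touches NarrativeGenerator.
class ReplicateClient:
    def generate(self, prompt: str) -> str:
        # Stands in for the real Replicate API call.
        return f"[model output for: {prompt}]"

class ModelSelector:
    def select_model(self, user_tier: str) -> str:
        # Illustrative tier -> model routing only.
        return {"free": "llama-3-8b", "premium": "claude-3.5-sonnet"}.get(
            user_tier, "llama-3-8b"
        )

class NarrativeGenerator:
    def __init__(self):
        self.selector = ModelSelector()
        self.client = ReplicateClient()

    def generate_story(self, action: str, user_tier: str) -> str:
        model = self.selector.select_model(user_tier)
        return self.client.generate(f"({model}) {action}")

print(NarrativeGenerator().generate_story("search the room", "premium"))
```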
---
## Replicate Client
**File:** `app/ai/replicate_client.py`
### Supported Models
| Model Type | Identifier | Tier | Use Case |
|------------|-----------|------|----------|
| `LLAMA_3_8B` | `meta/meta-llama-3-8b-instruct` | Free | Cost-effective, good quality |
| `CLAUDE_HAIKU` | `anthropic/claude-3.5-haiku` | Basic | Fast, high quality |
| `CLAUDE_SONNET` | `anthropic/claude-3.5-sonnet` | Premium | Excellent quality |
| `CLAUDE_SONNET_4` | `anthropic/claude-4.5-sonnet` | Elite | Best quality |
### Basic Usage
```python
from app.ai.replicate_client import ReplicateClient, ModelType

# Free tier - Llama (default)
client = ReplicateClient()
response = client.generate(
    prompt="You are a dungeon master...",
    max_tokens=256,
    temperature=0.7
)
print(response.text)
print(f"Tokens: {response.tokens_used}")

# Paid tier - Claude models
client = ReplicateClient(model=ModelType.CLAUDE_HAIKU)
response = client.generate(
    prompt="Describe the tavern",
    system_prompt="You are a dungeon master"
)

# Override model per-call
response = client.generate("Test", model=ModelType.CLAUDE_SONNET)
```
### Response Object
```python
@dataclass
class ReplicateResponse:
    text: str               # Generated text
    tokens_used: int        # Approximate token count
    model: str              # Model identifier
    generation_time: float  # Generation time in seconds
```
### Configuration
```python
# Default parameters
DEFAULT_MAX_TOKENS = 256
DEFAULT_TEMPERATURE = 0.7
DEFAULT_TOP_P = 0.9
DEFAULT_TIMEOUT = 30  # seconds

# Model-specific defaults
MODEL_DEFAULTS = {
    ModelType.LLAMA_3_8B: {"max_tokens": 256, "temperature": 0.7},
    ModelType.CLAUDE_HAIKU: {"max_tokens": 512, "temperature": 0.8},
    ModelType.CLAUDE_SONNET: {"max_tokens": 1024, "temperature": 0.9},
    ModelType.CLAUDE_SONNET_4: {"max_tokens": 2048, "temperature": 0.9},
}
```
### Error Handling
```python
from app.ai.replicate_client import (
    ReplicateClientError,     # Base error
    ReplicateAPIError,        # API errors
    ReplicateRateLimitError,  # Rate limiting
    ReplicateTimeoutError     # Timeouts
)

try:
    response = client.generate(prompt)
except ReplicateRateLimitError:
    # Handle rate limiting (client retries automatically 3 times)
    pass
except ReplicateTimeoutError:
    # Handle timeout
    pass
except ReplicateAPIError as e:
    # Handle other API errors
    logger.error(f"API error: {e}")
```
### Features
- **Retry Logic**: Exponential backoff (3 retries) for rate limits
- **Model-specific Formatting**: Llama special tokens, Claude system prompts
- **API Key Validation**: `client.validate_api_key()` method
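
The retry behaviour can be sketched as follows (illustrative only: `with_retries` and `flaky_call` are hypothetical names, and `RuntimeError` stands in for `ReplicateRateLimitError`):

```python
import time

def with_retries(call, max_retries=3, base_delay=1.0):
    """Retry `call` with exponential backoff, mirroring the documented
    3-retry behaviour for rate-limit errors. Sketch only."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RuntimeError:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s by default

# Usage: a call that fails twice with a rate-limit error, then succeeds.
attempts = {"n": 0}

def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(with_retries(flaky_call, base_delay=0.01))  # ok
```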
---
## Model Selector
**File:** `app/ai/model_selector.py`
### User Tiers
```python
class UserTier(str, Enum):
    FREE = "free"        # Llama 3 8B
    BASIC = "basic"      # Claude Haiku
    PREMIUM = "premium"  # Claude Sonnet
    ELITE = "elite"      # Claude Sonnet 4
```
### Context Types
```python
class ContextType(str, Enum):
    STORY_PROGRESSION = "story_progression"  # Creative narratives
    COMBAT_NARRATION = "combat_narration"    # Action descriptions
    QUEST_SELECTION = "quest_selection"      # Quest picking
    NPC_DIALOGUE = "npc_dialogue"            # Character conversations
    SIMPLE_RESPONSE = "simple_response"      # Quick responses
```
### Usage
```python
from app.ai.model_selector import ModelSelector, UserTier, ContextType

selector = ModelSelector()

# Select model configuration
config = selector.select_model(
    user_tier=UserTier.PREMIUM,
    context_type=ContextType.STORY_PROGRESSION
)
print(config.model_type)   # ModelType.CLAUDE_SONNET
print(config.max_tokens)   # 1024
print(config.temperature)  # 0.9
```
### Token Limits by Tier
| Tier | Base Tokens | Model |
|------|-------------|-------|
| FREE | 256 | Llama 3 8B |
| BASIC | 512 | Claude Haiku |
| PREMIUM | 1024 | Claude Sonnet |
| ELITE | 2048 | Claude Sonnet 4 |
### Context Adjustments
**Temperature by Context:**
- Story Progression: 0.9 (creative)
- Combat Narration: 0.8 (exciting)
- Quest Selection: 0.5 (deterministic)
- NPC Dialogue: 0.85 (natural)
- Simple Response: 0.7 (balanced)

**Token Multipliers:**
- Story Progression: 1.0× (full allocation)
- Combat Narration: 0.75× (shorter)
- Quest Selection: 0.5× (brief)
- NPC Dialogue: 0.75× (conversational)
- Simple Response: 0.5× (quick)
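
Combining the two tables, the effective generation parameters might be computed like this (a sketch with hypothetical lookup tables mirroring the values above; the real `ModelSelector` internals may differ):

```python
# Hypothetical lookup tables: tier base tokens, per-context temperature,
# and per-context token multipliers, matching the documented values.
BASE_TOKENS = {"free": 256, "basic": 512, "premium": 1024, "elite": 2048}
TEMPERATURE = {
    "story_progression": 0.9, "combat_narration": 0.8,
    "quest_selection": 0.5, "npc_dialogue": 0.85, "simple_response": 0.7,
}
TOKEN_MULTIPLIER = {
    "story_progression": 1.0, "combat_narration": 0.75,
    "quest_selection": 0.5, "npc_dialogue": 0.75, "simple_response": 0.5,
}

def effective_params(tier: str, context: str) -> dict:
    # Effective max_tokens = tier base allocation x context multiplier.
    return {
        "max_tokens": int(BASE_TOKENS[tier] * TOKEN_MULTIPLIER[context]),
        "temperature": TEMPERATURE[context],
    }

print(effective_params("premium", "combat_narration"))
# {'max_tokens': 768, 'temperature': 0.8}
```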
### Cost Estimation
```python
# Get tier information
info = selector.get_tier_info(UserTier.PREMIUM)
# {
#     "tier": "premium",
#     "model": "anthropic/claude-3.5-sonnet",
#     "model_name": "Claude 3.5 Sonnet",
#     "base_tokens": 1024,
#     "quality": "Excellent quality, detailed narratives"
# }

# Estimate cost per request
cost = selector.estimate_cost_per_request(UserTier.PREMIUM)
# ~$0.009 per request
```
---
## Narrative Generator
**File:** `app/ai/narrative_generator.py`

High-level wrapper that coordinates model selection, prompt templates, and AI generation.
### Initialization
```python
from app.ai.narrative_generator import NarrativeGenerator
from app.ai.model_selector import UserTier

generator = NarrativeGenerator()
```
### Story Response Generation
```python
response = generator.generate_story_response(
    character={
        "name": "Aldric",
        "level": 3,
        "player_class": "Fighter",
        "stats": {"strength": 16, "dexterity": 14, ...}
    },
    action="I search the room for hidden doors",
    game_state={
        "current_location": "Ancient Library",
        "location_type": "DUNGEON",
        "active_quests": ["find_artifact"]
    },
    user_tier=UserTier.PREMIUM,
    conversation_history=[
        {"turn": 1, "action": "entered library", "dm_response": "..."},
        {"turn": 2, "action": "examined shelves", "dm_response": "..."}
    ],
    action_instructions="""
    The player searches for supplies. This means:
    - Describe what they FIND, not auto-purchase
    - List items with PRICES if applicable
    - Ask what they want to do with findings
    """  # Optional: from action_prompts.yaml dm_prompt_template
)

print(response.narrative)
print(f"Tokens: {response.tokens_used}")
print(f"Model: {response.model}")
print(f"Time: {response.generation_time:.2f}s")
```
### Action Instructions
The `action_instructions` parameter passes action-specific guidance from `action_prompts.yaml` to the AI. This ensures:
1. **Player agency** - AI presents options rather than making decisions
2. **Action semantics** - "Search" means find options, not auto-buy
3. **Context-aware responses** - Different instructions for different actions

The instructions are injected into the prompt template and include critical player agency rules:
- Never auto-purchase items
- Never complete transactions without consent
- Present choices and ask what they want to do
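
As a hypothetical sketch of the injection (the real implementation renders the Jinja2 templates described later in this document; the names here are illustrative):

```python
# Hypothetical sketch: interpolate action-specific instructions into the
# DM prompt. The real code renders Jinja2 templates instead.
PROMPT_SKELETON = """You are the dungeon master.
Player action: {action}

Action guidance:
{action_instructions}

Present choices to the player; never complete a transaction without consent."""

def build_prompt(action: str, action_instructions: str = "") -> str:
    return PROMPT_SKELETON.format(
        action=action,
        action_instructions=action_instructions.strip() or "(none)",
    )

prompt = build_prompt(
    "I search the shop for supplies",
    "Describe what they FIND, with prices; do not auto-purchase.",
)
print(prompt)
```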
### Combat Narration
```python
response = generator.generate_combat_narration(
    character={"name": "Aldric", ...},
    combat_state={
        "round_number": 3,
        "enemies": [{"name": "Goblin", "hp": 5, "max_hp": 10}],
        "terrain": "cave"
    },
    action="swings their sword at the goblin",
    action_result={
        "hit": True,
        "damage": 12,
        "effects": ["bleeding"]
    },
    user_tier=UserTier.BASIC,
    is_critical=True,
    is_finishing_blow=True
)
```
### Quest Selection
```python
quest_id = generator.generate_quest_selection(
    character={"name": "Aldric", "level": 3, ...},
    eligible_quests=[
        {"quest_id": "goblin_cave", "name": "Clear the Cave", ...},
        {"quest_id": "herb_gathering", "name": "Gather Herbs", ...}
    ],
    game_context={
        "current_location": "Tavern",
        "recent_events": ["talked to locals"]
    },
    user_tier=UserTier.FREE,
    recent_actions=["asked about rumors", "ordered ale"]
)
print(quest_id)  # "goblin_cave"
```
### NPC Dialogue
```python
response = generator.generate_npc_dialogue(
    character={"name": "Aldric", ...},
    npc={
        "name": "Old Barkeep",
        "role": "Tavern Owner",
        "personality": "gruff but kind"
    },
    conversation_topic="What rumors have you heard lately?",
    game_state={"current_location": "The Rusty Anchor", ...},
    user_tier=UserTier.PREMIUM,
    npc_knowledge=["goblin attacks", "missing merchant"]
)
```
### Response Object
```python
@dataclass
class NarrativeResponse:
    narrative: str     # Generated text
    tokens_used: int   # Token count
    model: str         # Model used
    context_type: str  # Type of generation
    generation_time: float
```
### Error Handling
```python
from app.ai.narrative_generator import NarrativeGeneratorError

try:
    response = generator.generate_story_response(...)
except NarrativeGeneratorError as e:
    logger.error(f"Generation failed: {e}")
    # Handle gracefully (show error to user, use fallback, etc.)
```
---
## Prompt Templates
**File:** `app/ai/prompt_templates.py`
**Templates:** `app/ai/templates/*.j2`
### Available Templates
1. **story_action.j2** - Story progression turns
2. **combat_action.j2** - Combat narration
3. **quest_offering.j2** - Context-aware quest selection
4. **npc_dialogue.j2** - NPC conversations
### Template Filters
- `format_inventory` - Format item lists
- `format_stats` - Format character stats
- `format_skills` - Format skill lists
- `format_effects` - Format active effects
- `truncate_text` - Truncate with ellipsis
- `format_gold` - Format currency
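
As an illustration of what two of these filters might do (behaviour assumed from the filter names; the real implementations may differ):

```python
# Assumed behaviour of the `format_gold` and `truncate_text` filters,
# written as plain functions. The real filters are registered on the
# Jinja2 environment used by the templates.
def format_gold(amount: int) -> str:
    return f"{amount:,} gold"

def truncate_text(text: str, length: int = 50) -> str:
    if len(text) <= length:
        return text
    return text[: length - 1].rstrip() + "…"

print(format_gold(12500))           # 12,500 gold
print(truncate_text("a" * 60, 10))  # aaaaaaaaa…
```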
### Direct Template Usage
```python
from app.ai.prompt_templates import get_prompt_templates

templates = get_prompt_templates()
prompt = templates.render(
    "story_action.j2",
    character={"name": "Aldric", ...},
    action="search for traps",
    game_state={...},
    conversation_history=[...]
)
```
---
## Configuration
### Environment Variables
```bash
# Required
REPLICATE_API_TOKEN=r8_...
# Optional (defaults shown)
REPLICATE_MODEL=meta/meta-llama-3-8b-instruct
```
### Cost Management
Approximate costs per 1K tokens:

| Model | Input | Output |
|-------|-------|--------|
| Llama 3 8B | Free | Free |
| Claude Haiku | $0.001 | $0.005 |
| Claude Sonnet | $0.003 | $0.015 |
| Claude Sonnet 4 | $0.015 | $0.075 |
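
Given those rates, a per-request estimate might be computed like this (illustrative only: `estimate_cost` and the model keys are hypothetical names, and real token counts vary per request):

```python
# Hypothetical per-1K-token rates (input, output) in USD, matching the
# table above.
RATES_PER_1K = {
    "llama-3-8b": (0.0, 0.0),
    "claude-haiku": (0.001, 0.005),
    "claude-sonnet": (0.003, 0.015),
    "claude-sonnet-4": (0.015, 0.075),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Cost = input tokens at the input rate + output tokens at the output rate.
    rate_in, rate_out = RATES_PER_1K[model]
    return (input_tokens / 1000) * rate_in + (output_tokens / 1000) * rate_out

# A 2K-token prompt with a 1K-token response on Claude Sonnet:
print(f"${estimate_cost('claude-sonnet', 2000, 1000):.4f}")  # $0.0210
```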
---
## Integration with Background Jobs
AI generation runs asynchronously via RQ jobs. See `app/tasks/ai_tasks.py`.
```python
from app.tasks.ai_tasks import enqueue_ai_task

# Queue a story action
job = enqueue_ai_task(
    task_type="narrative",
    user_id="user_123",
    context={
        "session_id": "sess_789",
        "character_id": "char_456",
        "action": "I explore the tavern"
    }
)
# Returns: {"job_id": "abc-123", "status": "queued"}
```
---
## Usage Tracking
All AI calls are automatically logged for cost monitoring. See `app/services/usage_tracking_service.py`.
```python
from datetime import date

from app.services.usage_tracking_service import UsageTrackingService

tracker = UsageTrackingService()

# Get daily usage
usage = tracker.get_daily_usage("user_123", date.today())
print(f"Requests: {usage.total_requests}")
print(f"Cost: ${usage.estimated_cost:.4f}")

# Get monthly cost
monthly = tracker.get_monthly_cost("user_123", 2025, 11)
```
---
## Rate Limiting
Tier-based daily limits enforced via `app/services/rate_limiter_service.py`.
Limits are loaded from config (`rate_limiting.tiers.{tier}.ai_calls_per_day`).
### AI Calls (Turns)
| Tier | Daily Limit |
|------|------------|
| FREE | 50 turns |
| BASIC | 200 turns |
| PREMIUM | 1000 turns |
| ELITE | Unlimited |
A value of `-1` in config means unlimited.
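
The sentinel can be handled with a guard like this (illustrative only; the real check lives in `RateLimiterService`):

```python
# Sketch of the "-1 means unlimited" convention when checking a daily limit.
def is_within_limit(calls_today: int, daily_limit: int) -> bool:
    if daily_limit == -1:  # unlimited (e.g. ELITE tier)
        return True
    return calls_today < daily_limit

print(is_within_limit(5000, -1))  # True  (unlimited)
print(is_within_limit(49, 50))    # True  (one call left)
print(is_within_limit(50, 50))    # False (limit reached)
```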
### Custom Actions
Free-text player actions (beyond preset buttons) have separate limits:

| Tier | Custom Actions/Day | Max Characters |
|------|-------------------|----------------|
| FREE | 10 | 150 |
| BASIC | 50 | 300 |
| PREMIUM | Unlimited | 500 |
| ELITE | Unlimited | 500 |
These are configurable in `config/*.yaml` under `rate_limiting.tiers.{tier}.custom_actions_per_day` and `custom_action_char_limit`.
```python
from app.services.rate_limiter_service import RateLimiterService

limiter = RateLimiterService()

try:
    limiter.check_rate_limit("user_123", UserTier.PREMIUM)
    # Process request...
    limiter.increment_usage("user_123")
except RateLimitExceeded:
    # Return error to user
    pass
```
---
## Best Practices
1. **Always specify context type** - Helps optimize token usage and temperature
2. **Provide conversation history** - Improves narrative coherence
3. **Handle errors gracefully** - Show user-friendly messages
4. **Monitor costs** - Use usage tracking service
5. **Test with mocks first** - Use mocked clients during development
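
For the last point, a minimal sketch of development-time testing with `unittest.mock` (a stand-in object rather than the real `ReplicateClient`, so no API key or network access is needed; the canned response values are illustrative):

```python
from unittest.mock import MagicMock

# Stand-in for ReplicateClient with a canned response object.
mock_client = MagicMock()
mock_client.generate.return_value = MagicMock(
    text="You find a hidden door behind the bookshelf.",
    tokens_used=12,
)

# Code under test would receive mock_client via injection or patching.
response = mock_client.generate(prompt="I search the room for hidden doors")
print(response.text)
print(f"Tokens: {response.tokens_used}")
mock_client.generate.assert_called_once()
```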
---
## Verification Scripts
- `scripts/verify_ai_models.py` - Test model routing and API connectivity
- `scripts/verify_e2e_ai_generation.py` - End-to-end generation flow tests
```bash
# Test model routing (no API key needed)
python scripts/verify_ai_models.py

# Test with real API calls
python scripts/verify_ai_models.py --llama --haiku --sonnet

# Full E2E test
python scripts/verify_e2e_ai_generation.py --real --tier premium
```