# AI Integration Documentation

## Overview

Code of Conquest uses AI models for narrative generation through a unified Replicate API integration. This document covers the AI client architecture, model selection, and usage patterns.

**Key Components:**

- **ReplicateClient** - Low-level API client for all AI models
- **ModelSelector** - Tier-based model routing and configuration
- **NarrativeGenerator** - High-level wrapper for game-specific generation

---

## Architecture

```
┌─────────────────────┐
│ NarrativeGenerator  │  ← High-level game API
├─────────────────────┤
│ ModelSelector       │  ← Tier/context routing
├─────────────────────┤
│ ReplicateClient     │  ← Unified API client
├─────────────────────┤
│ Replicate API       │  ← All models (Llama, Claude)
└─────────────────────┘
```

All AI models are accessed through the Replicate API for unified billing and management.
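
The layering can be sketched end to end using only the public APIs documented below. This is illustrative; the actual wiring lives inside `NarrativeGenerator`:

```python
from app.ai.model_selector import ModelSelector, UserTier, ContextType
from app.ai.replicate_client import ReplicateClient

# 1. Route: pick a model configuration for this user and context
selector = ModelSelector()
config = selector.select_model(
    user_tier=UserTier.FREE,
    context_type=ContextType.SIMPLE_RESPONSE,
)

# 2. Call: hand the configuration to the unified client
client = ReplicateClient(model=config.model_type)
response = client.generate(
    prompt="Describe the village square in two sentences.",
    max_tokens=config.max_tokens,
    temperature=config.temperature,
)
print(response.text)
```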

---

## Replicate Client

**File:** `app/ai/replicate_client.py`

### Supported Models

| Model Type | Identifier | Tier | Use Case |
|------------|-----------|------|----------|
| `LLAMA_3_8B` | `meta/meta-llama-3-8b-instruct` | Free | Cost-effective, good quality |
| `CLAUDE_HAIKU` | `anthropic/claude-3.5-haiku` | Basic | Fast, high quality |
| `CLAUDE_SONNET` | `anthropic/claude-3.5-sonnet` | Premium | Excellent quality |
| `CLAUDE_SONNET_4` | `anthropic/claude-4.5-sonnet` | Elite | Best quality |

### Basic Usage

```python
from app.ai.replicate_client import ReplicateClient, ModelType

# Free tier - Llama (default)
client = ReplicateClient()
response = client.generate(
    prompt="You are a dungeon master...",
    max_tokens=256,
    temperature=0.7
)
print(response.text)
print(f"Tokens: {response.tokens_used}")

# Paid tier - Claude models
client = ReplicateClient(model=ModelType.CLAUDE_HAIKU)
response = client.generate(
    prompt="Describe the tavern",
    system_prompt="You are a dungeon master"
)

# Override model per-call
response = client.generate("Test", model=ModelType.CLAUDE_SONNET)
```

### Response Object

```python
from dataclasses import dataclass

@dataclass
class ReplicateResponse:
    text: str               # Generated text
    tokens_used: int        # Approximate token count
    model: str              # Model identifier
    generation_time: float  # Generation time in seconds
```

### Configuration

```python
# Default parameters
DEFAULT_MAX_TOKENS = 256
DEFAULT_TEMPERATURE = 0.7
DEFAULT_TOP_P = 0.9
DEFAULT_TIMEOUT = 30  # seconds

# Model-specific defaults
MODEL_DEFAULTS = {
    ModelType.LLAMA_3_8B: {"max_tokens": 256, "temperature": 0.7},
    ModelType.CLAUDE_HAIKU: {"max_tokens": 512, "temperature": 0.8},
    ModelType.CLAUDE_SONNET: {"max_tokens": 1024, "temperature": 0.9},
    ModelType.CLAUDE_SONNET_4: {"max_tokens": 2048, "temperature": 0.9},
}
```

### Error Handling

```python
import logging

from app.ai.replicate_client import (
    ReplicateClientError,     # Base error
    ReplicateAPIError,        # API errors
    ReplicateRateLimitError,  # Rate limiting
    ReplicateTimeoutError     # Timeouts
)

logger = logging.getLogger(__name__)

try:
    response = client.generate(prompt)
except ReplicateRateLimitError:
    # Handle rate limiting (client retries automatically 3 times)
    pass
except ReplicateTimeoutError:
    # Handle timeout
    pass
except ReplicateAPIError as e:
    # Handle other API errors
    logger.error(f"API error: {e}")
```

### Features

- **Retry Logic**: Exponential backoff (3 retries) for rate limits (see the sketch below)
- **Model-specific Formatting**: Llama special tokens, Claude system prompts
- **API Key Validation**: `client.validate_api_key()` method
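
The retry behavior can be approximated in a few lines. A minimal sketch, assuming exponential delays with jitter; the client's exact delays are internal details not specified here:

```python
import random
import time

from app.ai.replicate_client import ReplicateRateLimitError

def with_backoff(call, retries=3, base_delay=1.0):
    """Retry `call` on rate limits with exponential backoff (assumed delays)."""
    for attempt in range(retries + 1):
        try:
            return call()
        except ReplicateRateLimitError:
            if attempt == retries:
                raise
            # 1s, 2s, 4s... plus jitter to avoid synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.random())
```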

---

## Model Selector

**File:** `app/ai/model_selector.py`

### User Tiers

```python
from enum import Enum

class UserTier(str, Enum):
    FREE = "free"        # Llama 3 8B
    BASIC = "basic"      # Claude Haiku
    PREMIUM = "premium"  # Claude Sonnet
    ELITE = "elite"      # Claude Sonnet 4
```

### Context Types

```python
class ContextType(str, Enum):
    STORY_PROGRESSION = "story_progression"  # Creative narratives
    COMBAT_NARRATION = "combat_narration"    # Action descriptions
    QUEST_SELECTION = "quest_selection"      # Quest picking
    NPC_DIALOGUE = "npc_dialogue"            # Character conversations
    SIMPLE_RESPONSE = "simple_response"      # Quick responses
```

### Usage

```python
from app.ai.model_selector import ModelSelector, UserTier, ContextType

selector = ModelSelector()

# Select model configuration
config = selector.select_model(
    user_tier=UserTier.PREMIUM,
    context_type=ContextType.STORY_PROGRESSION
)

print(config.model_type)   # ModelType.CLAUDE_SONNET
print(config.max_tokens)   # 1024
print(config.temperature)  # 0.9
```

### Token Limits by Tier

| Tier | Base Tokens | Model |
|------|-------------|-------|
| FREE | 256 | Llama 3 8B |
| BASIC | 512 | Claude Haiku |
| PREMIUM | 1024 | Claude Sonnet |
| ELITE | 2048 | Claude Sonnet 4 |

### Context Adjustments

**Temperature by Context:**

- Story Progression: 0.9 (creative)
- Combat Narration: 0.8 (exciting)
- Quest Selection: 0.5 (deterministic)
- NPC Dialogue: 0.85 (natural)
- Simple Response: 0.7 (balanced)

**Token Multipliers:**

- Story Progression: 1.0× (full allocation)
- Combat Narration: 0.75× (shorter)
- Quest Selection: 0.5× (brief)
- NPC Dialogue: 0.75× (conversational)
- Simple Response: 0.5× (quick)
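
Putting the tier table and the multipliers together: a PREMIUM user narrating combat should get 1024 × 0.75 = 768 tokens at temperature 0.8. A quick check against `select_model` (expected values are derived from the tables above; exact rounding inside `ModelSelector` is assumed):

```python
from app.ai.model_selector import ModelSelector, UserTier, ContextType

selector = ModelSelector()
config = selector.select_model(
    user_tier=UserTier.PREMIUM,                 # base: 1024 tokens
    context_type=ContextType.COMBAT_NARRATION,  # 0.75x tokens, temperature 0.8
)
print(config.max_tokens)   # expected: 768 (1024 * 0.75)
print(config.temperature)  # expected: 0.8
```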

### Cost Estimation

```python
# Get tier information
info = selector.get_tier_info(UserTier.PREMIUM)
# {
#     "tier": "premium",
#     "model": "anthropic/claude-3.5-sonnet",
#     "model_name": "Claude 3.5 Sonnet",
#     "base_tokens": 1024,
#     "quality": "Excellent quality, detailed narratives"
# }

# Estimate cost per request
cost = selector.estimate_cost_per_request(UserTier.PREMIUM)
# ~$0.009 per request
```

---

## Narrative Generator

**File:** `app/ai/narrative_generator.py`

High-level wrapper that coordinates model selection, prompt templates, and AI generation.

### Initialization

```python
from app.ai.narrative_generator import NarrativeGenerator
from app.ai.model_selector import UserTier

generator = NarrativeGenerator()
```

### Story Response Generation

```python
response = generator.generate_story_response(
    character={
        "name": "Aldric",
        "level": 3,
        "player_class": "Fighter",
        "stats": {"strength": 16, "dexterity": 14, ...}
    },
    action="I search the room for hidden doors",
    game_state={
        "current_location": "Ancient Library",
        "location_type": "DUNGEON",
        "active_quests": ["find_artifact"]
    },
    user_tier=UserTier.PREMIUM,
    conversation_history=[
        {"turn": 1, "action": "entered library", "dm_response": "..."},
        {"turn": 2, "action": "examined shelves", "dm_response": "..."}
    ],
    action_instructions="""
    The player searches for supplies. This means:
    - Describe what they FIND, not auto-purchase
    - List items with PRICES if applicable
    - Ask what they want to do with findings
    """  # Optional: from action_prompts.yaml dm_prompt_template
)

print(response.narrative)
print(f"Tokens: {response.tokens_used}")
print(f"Model: {response.model}")
print(f"Time: {response.generation_time:.2f}s")
```

### Action Instructions

The `action_instructions` parameter passes action-specific guidance from `action_prompts.yaml` to the AI. This ensures:

1. **Player agency** - AI presents options rather than making decisions
2. **Action semantics** - "Search" means find options, not auto-buy
3. **Context-aware responses** - Different instructions for different actions

The instructions are injected into the prompt template and include critical player agency rules (see the sketch after this list):

- Never auto-purchase items
- Never complete transactions without consent
- Present choices and ask what they want to do
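
A minimal sketch of wiring this up. The file path, the `search` key, and the nesting under `dm_prompt_template` are assumptions for illustration; the real YAML layout may differ:

```python
import yaml

from app.ai.model_selector import UserTier
from app.ai.narrative_generator import NarrativeGenerator

with open("config/action_prompts.yaml") as f:
    action_prompts = yaml.safe_load(f)

# Hypothetical lookup; key names are assumptions, not the documented schema
instructions = action_prompts["search"]["dm_prompt_template"]

generator = NarrativeGenerator()
response = generator.generate_story_response(
    character={"name": "Aldric", "level": 3, "player_class": "Fighter"},
    action="I search the shop for supplies",
    game_state={"current_location": "General Store", "location_type": "TOWN"},
    user_tier=UserTier.FREE,
    action_instructions=instructions,
)
```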

### Combat Narration

```python
response = generator.generate_combat_narration(
    character={"name": "Aldric", ...},
    combat_state={
        "round_number": 3,
        "enemies": [{"name": "Goblin", "hp": 5, "max_hp": 10}],
        "terrain": "cave"
    },
    action="swings their sword at the goblin",
    action_result={
        "hit": True,
        "damage": 12,
        "effects": ["bleeding"]
    },
    user_tier=UserTier.BASIC,
    is_critical=True,
    is_finishing_blow=True
)
```

### Quest Selection

```python
quest_id = generator.generate_quest_selection(
    character={"name": "Aldric", "level": 3, ...},
    eligible_quests=[
        {"quest_id": "goblin_cave", "name": "Clear the Cave", ...},
        {"quest_id": "herb_gathering", "name": "Gather Herbs", ...}
    ],
    game_context={
        "current_location": "Tavern",
        "recent_events": ["talked to locals"]
    },
    user_tier=UserTier.FREE,
    recent_actions=["asked about rumors", "ordered ale"]
)
print(quest_id)  # "goblin_cave"
```

### NPC Dialogue

```python
response = generator.generate_npc_dialogue(
    character={"name": "Aldric", ...},
    npc={
        "name": "Old Barkeep",
        "role": "Tavern Owner",
        "personality": "gruff but kind"
    },
    conversation_topic="What rumors have you heard lately?",
    game_state={"current_location": "The Rusty Anchor", ...},
    user_tier=UserTier.PREMIUM,
    npc_knowledge=["goblin attacks", "missing merchant"]
)
```

### Response Object

```python
from dataclasses import dataclass

@dataclass
class NarrativeResponse:
    narrative: str          # Generated text
    tokens_used: int        # Token count
    model: str              # Model used
    context_type: str       # Type of generation
    generation_time: float  # Generation time in seconds
```

### Error Handling

```python
import logging

from app.ai.narrative_generator import NarrativeGeneratorError

logger = logging.getLogger(__name__)

try:
    response = generator.generate_story_response(...)
except NarrativeGeneratorError as e:
    logger.error(f"Generation failed: {e}")
    # Handle gracefully (show error to user, use fallback, etc.)
```

---

## Prompt Templates

**File:** `app/ai/prompt_templates.py`
**Templates:** `app/ai/templates/*.j2`

### Available Templates

1. **story_action.j2** - Story progression turns
2. **combat_action.j2** - Combat narration
3. **quest_offering.j2** - Context-aware quest selection
4. **npc_dialogue.j2** - NPC conversations

### Template Filters

- `format_inventory` - Format item lists
- `format_stats` - Format character stats
- `format_skills` - Format skill lists
- `format_effects` - Format active effects
- `truncate_text` - Truncate with ellipsis
- `format_gold` - Format currency
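
For illustration, a hypothetical excerpt of how these filters might appear inside one of the `.j2` templates (this is not the real template's contents):

```jinja
{# story_action.j2 (illustrative excerpt, not the actual template) #}
{{ character.name }} (Level {{ character.level }} {{ character.player_class }})
Stats: {{ character.stats | format_stats }}
Inventory: {{ character.inventory | format_inventory }}
Gold: {{ character.gold | format_gold }}
```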

### Direct Template Usage

```python
from app.ai.prompt_templates import get_prompt_templates

templates = get_prompt_templates()

prompt = templates.render(
    "story_action.j2",
    character={"name": "Aldric", ...},
    action="search for traps",
    game_state={...},
    conversation_history=[...]
)
```

---

## Configuration

### Environment Variables

```bash
# Required
REPLICATE_API_TOKEN=r8_...

# Optional (defaults shown)
REPLICATE_MODEL=meta/meta-llama-3-8b-instruct
```

### Cost Management

Approximate costs per 1K tokens:

| Model | Input | Output |
|-------|-------|--------|
| Llama 3 8B | Free | Free |
| Claude Haiku | $0.001 | $0.005 |
| Claude Sonnet | $0.003 | $0.015 |
| Claude Sonnet 4 | $0.015 | $0.075 |
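
As a sanity check on the PREMIUM figure above, assume a typical request sends about 2,000 prompt tokens and returns about 200. Then 2.0 × $0.003 + 0.2 × $0.015 ≈ $0.009, which lines up with `estimate_cost_per_request`. The token counts here are illustrative assumptions:

```python
# Back-of-envelope cost for one PREMIUM (Claude Sonnet) request
input_tokens = 2000   # assumed prompt size
output_tokens = 200   # assumed response size

cost = (input_tokens / 1000) * 0.003 + (output_tokens / 1000) * 0.015
print(f"${cost:.4f}")  # $0.0090
```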

---

## Integration with Background Jobs

AI generation runs asynchronously via RQ jobs. See `app/tasks/ai_tasks.py`.

```python
from app.tasks.ai_tasks import enqueue_ai_task

# Queue a story action
job = enqueue_ai_task(
    task_type="narrative",
    user_id="user_123",
    context={
        "session_id": "sess_789",
        "character_id": "char_456",
        "action": "I explore the tavern"
    }
)
# Returns: {"job_id": "abc-123", "status": "queued"}
```
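
Callers can then poll the job with RQ's standard API. A sketch, assuming a default local Redis connection; the queue name and result plumbing are implementation details of `ai_tasks.py` not documented here:

```python
from redis import Redis
from rq.job import Job

redis_conn = Redis()

# Look up the queued job by the returned job_id
job = Job.fetch("abc-123", connection=redis_conn)
print(job.get_status())  # "queued", "started", "finished", or "failed"

if job.is_finished:
    print(job.result)    # whatever the task function returned
```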

---

## Usage Tracking

All AI calls are automatically logged for cost monitoring. See `app/services/usage_tracking_service.py`.

```python
from datetime import date

from app.services.usage_tracking_service import UsageTrackingService

tracker = UsageTrackingService()

# Get daily usage
usage = tracker.get_daily_usage("user_123", date.today())
print(f"Requests: {usage.total_requests}")
print(f"Cost: ${usage.estimated_cost:.4f}")

# Get monthly cost
monthly = tracker.get_monthly_cost("user_123", 2025, 11)
```

---

## Rate Limiting

Tier-based daily limits are enforced via `app/services/rate_limiter_service.py`.

### AI Calls (Turns)

| Tier | Daily Limit |
|------|------------|
| FREE | 20 turns |
| BASIC | 50 turns |
| PREMIUM | 100 turns |
| ELITE | 200 turns |

### Custom Actions

Free-text player actions (beyond preset buttons) have separate limits:

| Tier | Custom Actions/Day | Max Characters |
|------|-------------------|----------------|
| FREE | 10 | 150 |
| BASIC | 50 | 300 |
| PREMIUM | Unlimited | 500 |
| ELITE | Unlimited | 500 |

These are configurable in `config/*.yaml` under `rate_limiting.tiers.{tier}.custom_actions_per_day` and `custom_action_char_limit`.
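
A hypothetical excerpt of that YAML shape, with values mirroring the tables above (the real files may organize tiers differently):

```yaml
# Illustrative excerpt only; surrounding file structure is assumed
rate_limiting:
  tiers:
    free:
      custom_actions_per_day: 10
      custom_action_char_limit: 150
    basic:
      custom_actions_per_day: 50
      custom_action_char_limit: 300
```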

```python
from app.ai.model_selector import UserTier
from app.services.rate_limiter_service import (
    RateLimiterService,
    RateLimitExceeded,  # assumed to be exported by the same module
)

limiter = RateLimiterService()

try:
    limiter.check_rate_limit("user_123", UserTier.PREMIUM)
    # Process request...
    limiter.increment_usage("user_123")
except RateLimitExceeded as e:
    # Return error to user
    pass
```

---

## Best Practices

1. **Always specify context type** - Helps optimize token usage and temperature
2. **Provide conversation history** - Improves narrative coherence
3. **Handle errors gracefully** - Show user-friendly messages
4. **Monitor costs** - Use usage tracking service
5. **Test with mocks first** - Use mocked clients during development (see the sketch below)
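
A minimal sketch of point 5, patching the documented `generate` method with `unittest.mock`; the canned `ReplicateResponse` values here are made up for the test:

```python
from unittest.mock import patch

from app.ai.replicate_client import ReplicateClient, ReplicateResponse

fake = ReplicateResponse(
    text="You find a loose floorboard.",
    tokens_used=12,
    model="mock-model",
    generation_time=0.01,
)

with patch.object(ReplicateClient, "generate", return_value=fake):
    client = ReplicateClient()
    response = client.generate(prompt="I search the room")
    assert response.text == "You find a loose floorboard."
```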

---

## Verification Scripts

- `scripts/verify_ai_models.py` - Test model routing and API connectivity
- `scripts/verify_e2e_ai_generation.py` - End-to-end generation flow tests

```bash
# Test model routing (no API key needed)
python scripts/verify_ai_models.py

# Test with real API calls
python scripts/verify_ai_models.py --llama --haiku --sonnet

# Full E2E test
python scripts/verify_e2e_ai_generation.py --real --tier premium
```