first commit
commit 8315fa51c9
2025-11-24 23:10:55 -06:00
279 changed files with 74600 additions and 0 deletions

api/docs/AI_INTEGRATION.md
# AI Integration Documentation
## Overview
Code of Conquest uses AI models for narrative generation through a unified Replicate API integration. This document covers the AI client architecture, model selection, and usage patterns.
**Key Components:**
- **ReplicateClient** - Low-level API client for all AI models
- **ModelSelector** - Tier-based model routing and configuration
- **NarrativeGenerator** - High-level wrapper for game-specific generation
---
## Architecture
```
┌─────────────────────┐
│ NarrativeGenerator │ ← High-level game API
├─────────────────────┤
│ ModelSelector │ ← Tier/context routing
├─────────────────────┤
│ ReplicateClient │ ← Unified API client
├─────────────────────┤
│ Replicate API │ ← All models (Llama, Claude)
└─────────────────────┘
```
All AI models are accessed through Replicate API for unified billing and management.
---
## Replicate Client
**File:** `app/ai/replicate_client.py`
### Supported Models
| Model Type | Identifier | Tier | Use Case |
|------------|-----------|------|----------|
| `LLAMA_3_8B` | `meta/meta-llama-3-8b-instruct` | Free | Cost-effective, good quality |
| `CLAUDE_HAIKU` | `anthropic/claude-3.5-haiku` | Basic | Fast, high quality |
| `CLAUDE_SONNET` | `anthropic/claude-3.5-sonnet` | Premium | Excellent quality |
| `CLAUDE_SONNET_4` | `anthropic/claude-4.5-sonnet` | Elite | Best quality |
### Basic Usage
```python
from app.ai.replicate_client import ReplicateClient, ModelType
# Free tier - Llama (default)
client = ReplicateClient()
response = client.generate(
prompt="You are a dungeon master...",
max_tokens=256,
temperature=0.7
)
print(response.text)
print(f"Tokens: {response.tokens_used}")
# Paid tier - Claude models
client = ReplicateClient(model=ModelType.CLAUDE_HAIKU)
response = client.generate(
prompt="Describe the tavern",
system_prompt="You are a dungeon master"
)
# Override model per-call
response = client.generate("Test", model=ModelType.CLAUDE_SONNET)
```
### Response Object
```python
@dataclass
class ReplicateResponse:
text: str # Generated text
tokens_used: int # Approximate token count
model: str # Model identifier
generation_time: float # Generation time in seconds
```
### Configuration
```python
# Default parameters
DEFAULT_MAX_TOKENS = 256
DEFAULT_TEMPERATURE = 0.7
DEFAULT_TOP_P = 0.9
DEFAULT_TIMEOUT = 30 # seconds
# Model-specific defaults
MODEL_DEFAULTS = {
ModelType.LLAMA_3_8B: {"max_tokens": 256, "temperature": 0.7},
ModelType.CLAUDE_HAIKU: {"max_tokens": 512, "temperature": 0.8},
ModelType.CLAUDE_SONNET: {"max_tokens": 1024, "temperature": 0.9},
ModelType.CLAUDE_SONNET_4: {"max_tokens": 2048, "temperature": 0.9},
}
```
### Error Handling
```python
import logging
from app.ai.replicate_client import (
    ReplicateClientError,     # Base error
    ReplicateAPIError,        # API errors
    ReplicateRateLimitError,  # Rate limiting
    ReplicateTimeoutError     # Timeouts
)
logger = logging.getLogger(__name__)
try:
response = client.generate(prompt)
except ReplicateRateLimitError:
# Handle rate limiting (client retries automatically 3 times)
pass
except ReplicateTimeoutError:
# Handle timeout
pass
except ReplicateAPIError as e:
# Handle other API errors
logger.error(f"API error: {e}")
```
### Features
- **Retry Logic**: Exponential backoff (3 retries) for rate limits
- **Model-specific Formatting**: Llama special tokens, Claude system prompts
- **API Key Validation**: `client.validate_api_key()` method
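For example, a fail-fast check at application startup (a minimal sketch; the boolean return contract of `validate_api_key()` is assumed from its name):
```python
import sys
from app.ai.replicate_client import ReplicateClient
client = ReplicateClient()
# Assumes validate_api_key() returns False for a missing/invalid token
if not client.validate_api_key():
    sys.exit("REPLICATE_API_TOKEN is missing or invalid")
```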
---
## Model Selector
**File:** `app/ai/model_selector.py`
### User Tiers
```python
class UserTier(str, Enum):
FREE = "free" # Llama 3 8B
BASIC = "basic" # Claude Haiku
PREMIUM = "premium" # Claude Sonnet
ELITE = "elite" # Claude Sonnet 4
```
### Context Types
```python
class ContextType(str, Enum):
STORY_PROGRESSION = "story_progression" # Creative narratives
COMBAT_NARRATION = "combat_narration" # Action descriptions
QUEST_SELECTION = "quest_selection" # Quest picking
NPC_DIALOGUE = "npc_dialogue" # Character conversations
SIMPLE_RESPONSE = "simple_response" # Quick responses
```
### Usage
```python
from app.ai.model_selector import ModelSelector, UserTier, ContextType
selector = ModelSelector()
# Select model configuration
config = selector.select_model(
user_tier=UserTier.PREMIUM,
context_type=ContextType.STORY_PROGRESSION
)
print(config.model_type) # ModelType.CLAUDE_SONNET
print(config.max_tokens) # 1024
print(config.temperature) # 0.9
```
### Token Limits by Tier
| Tier | Base Tokens | Model |
|------|-------------|-------|
| FREE | 256 | Llama 3 8B |
| BASIC | 512 | Claude Haiku |
| PREMIUM | 1024 | Claude Sonnet |
| ELITE | 2048 | Claude Sonnet 4 |
### Context Adjustments
**Temperature by Context:**
- Story Progression: 0.9 (creative)
- Combat Narration: 0.8 (exciting)
- Quest Selection: 0.5 (deterministic)
- NPC Dialogue: 0.85 (natural)
- Simple Response: 0.7 (balanced)
**Token Multipliers:**
- Story Progression: 1.0× (full allocation)
- Combat Narration: 0.75× (shorter)
- Quest Selection: 0.5× (brief)
- NPC Dialogue: 0.75× (conversational)
- Simple Response: 0.5× (quick)
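These adjustments compose with the tier's base allocation; for example, a PREMIUM combat turn gets 1024 × 0.75 = 768 tokens at temperature 0.8. A quick check against the `select_model` API shown above:
```python
from app.ai.model_selector import ModelSelector, UserTier, ContextType
selector = ModelSelector()
config = selector.select_model(
    user_tier=UserTier.PREMIUM,                 # base allocation: 1024 tokens
    context_type=ContextType.COMBAT_NARRATION   # 0.75x tokens, temperature 0.8
)
print(config.max_tokens)   # 768
print(config.temperature)  # 0.8
```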
### Cost Estimation
```python
# Get tier information
info = selector.get_tier_info(UserTier.PREMIUM)
# {
# "tier": "premium",
# "model": "anthropic/claude-3.5-sonnet",
# "model_name": "Claude 3.5 Sonnet",
# "base_tokens": 1024,
# "quality": "Excellent quality, detailed narratives"
# }
# Estimate cost per request
cost = selector.estimate_cost_per_request(UserTier.PREMIUM)
# ~$0.009 per request
```
---
## Narrative Generator
**File:** `app/ai/narrative_generator.py`
High-level wrapper that coordinates model selection, prompt templates, and AI generation.
### Initialization
```python
from app.ai.narrative_generator import NarrativeGenerator
from app.ai.model_selector import UserTier
generator = NarrativeGenerator()
```
### Story Response Generation
```python
response = generator.generate_story_response(
character={
"name": "Aldric",
"level": 3,
"player_class": "Fighter",
"stats": {"strength": 16, "dexterity": 14, ...}
},
action="I search the room for hidden doors",
game_state={
"current_location": "Ancient Library",
"location_type": "DUNGEON",
"active_quests": ["find_artifact"]
},
user_tier=UserTier.PREMIUM,
conversation_history=[
{"turn": 1, "action": "entered library", "dm_response": "..."},
{"turn": 2, "action": "examined shelves", "dm_response": "..."}
],
action_instructions="""
The player searches for supplies. This means:
- Describe what they FIND, not auto-purchase
- List items with PRICES if applicable
- Ask what they want to do with findings
""" # Optional: from action_prompts.yaml dm_prompt_template
)
print(response.narrative)
print(f"Tokens: {response.tokens_used}")
print(f"Model: {response.model}")
print(f"Time: {response.generation_time:.2f}s")
```
### Action Instructions
The `action_instructions` parameter passes action-specific guidance from `action_prompts.yaml` to the AI. This ensures:
1. **Player agency** - AI presents options rather than making decisions
2. **Action semantics** - "Search" means find options, not auto-buy
3. **Context-aware responses** - Different instructions for different actions
The instructions are injected into the prompt template and include critical player agency rules:
- Never auto-purchase items
- Never complete transactions without consent
- Present choices and ask what they want to do
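A hedged sketch of wiring this together, assuming `action_prompts.yaml` maps an action ID to a `dm_prompt_template` string (the exact path and schema may differ):
```python
import yaml
from app.ai.model_selector import UserTier
from app.ai.narrative_generator import NarrativeGenerator
# Path and schema are illustrative assumptions
with open("app/ai/action_prompts.yaml") as f:
    action_prompts = yaml.safe_load(f)
instructions = action_prompts["search"]["dm_prompt_template"]
generator = NarrativeGenerator()
response = generator.generate_story_response(
    character={"name": "Aldric", "level": 3, "player_class": "Fighter"},
    action="I search the shop for supplies",
    game_state={"current_location": "General Store", "location_type": "TOWN"},
    user_tier=UserTier.FREE,
    action_instructions=instructions
)
```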
### Combat Narration
```python
response = generator.generate_combat_narration(
character={"name": "Aldric", ...},
combat_state={
"round_number": 3,
"enemies": [{"name": "Goblin", "hp": 5, "max_hp": 10}],
"terrain": "cave"
},
action="swings their sword at the goblin",
action_result={
"hit": True,
"damage": 12,
"effects": ["bleeding"]
},
user_tier=UserTier.BASIC,
is_critical=True,
is_finishing_blow=True
)
```
### Quest Selection
```python
quest_id = generator.generate_quest_selection(
character={"name": "Aldric", "level": 3, ...},
eligible_quests=[
{"quest_id": "goblin_cave", "name": "Clear the Cave", ...},
{"quest_id": "herb_gathering", "name": "Gather Herbs", ...}
],
game_context={
"current_location": "Tavern",
"recent_events": ["talked to locals"]
},
user_tier=UserTier.FREE,
recent_actions=["asked about rumors", "ordered ale"]
)
print(quest_id) # "goblin_cave"
```
### NPC Dialogue
```python
response = generator.generate_npc_dialogue(
character={"name": "Aldric", ...},
npc={
"name": "Old Barkeep",
"role": "Tavern Owner",
"personality": "gruff but kind"
},
conversation_topic="What rumors have you heard lately?",
game_state={"current_location": "The Rusty Anchor", ...},
user_tier=UserTier.PREMIUM,
npc_knowledge=["goblin attacks", "missing merchant"]
)
```
### Response Object
```python
@dataclass
class NarrativeResponse:
narrative: str # Generated text
tokens_used: int # Token count
model: str # Model used
context_type: str # Type of generation
generation_time: float
```
### Error Handling
```python
import logging
from app.ai.narrative_generator import NarrativeGeneratorError
logger = logging.getLogger(__name__)
try:
response = generator.generate_story_response(...)
except NarrativeGeneratorError as e:
logger.error(f"Generation failed: {e}")
# Handle gracefully (show error to user, use fallback, etc.)
```
---
## Prompt Templates
**File:** `app/ai/prompt_templates.py`
**Templates:** `app/ai/templates/*.j2`
### Available Templates
1. **story_action.j2** - Story progression turns
2. **combat_action.j2** - Combat narration
3. **quest_offering.j2** - Context-aware quest selection
4. **npc_dialogue.j2** - NPC conversations
### Template Filters
- `format_inventory` - Format item lists
- `format_stats` - Format character stats
- `format_skills` - Format skill lists
- `format_effects` - Format active effects
- `truncate_text` - Truncate with ellipsis
- `format_gold` - Format currency
### Direct Template Usage
```python
from app.ai.prompt_templates import get_prompt_templates
templates = get_prompt_templates()
prompt = templates.render(
"story_action.j2",
character={"name": "Aldric", ...},
action="search for traps",
game_state={...},
conversation_history=[...]
)
```
---
## Configuration
### Environment Variables
```bash
# Required
REPLICATE_API_TOKEN=r8_...
# Optional (defaults shown)
REPLICATE_MODEL=meta/meta-llama-3-8b-instruct
```
### Cost Management
Approximate costs per 1K tokens:
| Model | Input | Output |
|-------|-------|--------|
| Llama 3 8B | Free | Free |
| Claude Haiku | $0.001 | $0.005 |
| Claude Sonnet | $0.003 | $0.015 |
| Claude Sonnet 4 | $0.015 | $0.075 |
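As a back-of-envelope check, a typical PREMIUM story turn lines up with the ~$0.009 estimate shown earlier (the token counts below are illustrative assumptions):
```python
# Claude Sonnet rates from the table above (USD per 1K tokens)
INPUT_RATE = 0.003
OUTPUT_RATE = 0.015
input_tokens = 1000   # assumed: character sheet + history + template
output_tokens = 400   # assumed: average completion length
cost = (input_tokens / 1000) * INPUT_RATE + (output_tokens / 1000) * OUTPUT_RATE
print(f"~${cost:.4f} per request")  # ~$0.0090
```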
---
## Integration with Background Jobs
AI generation runs asynchronously via RQ jobs. See `app/tasks/ai_tasks.py`.
```python
from app.tasks.ai_tasks import enqueue_ai_task
# Queue a story action
job = enqueue_ai_task(
task_type="narrative",
user_id="user_123",
context={
"session_id": "sess_789",
"character_id": "char_456",
"action": "I explore the tavern"
}
)
# Returns: {"job_id": "abc-123", "status": "queued"}
```
---
## Usage Tracking
All AI calls are automatically logged for cost monitoring. See `app/services/usage_tracking_service.py`.
```python
from datetime import date
from app.services.usage_tracking_service import UsageTrackingService
tracker = UsageTrackingService()
# Get daily usage
usage = tracker.get_daily_usage("user_123", date.today())
print(f"Requests: {usage.total_requests}")
print(f"Cost: ${usage.estimated_cost:.4f}")
# Get monthly cost
monthly = tracker.get_monthly_cost("user_123", 2025, 11)
```
---
## Rate Limiting
Tier-based daily limits are enforced via `app/services/rate_limiter_service.py`.
### AI Calls (Turns)
| Tier | Daily Limit |
|------|------------|
| FREE | 20 turns |
| BASIC | 50 turns |
| PREMIUM | 100 turns |
| ELITE | 200 turns |
### Custom Actions
Free-text player actions (beyond preset buttons) have separate limits:
| Tier | Custom Actions/Day | Max Characters |
|------|-------------------|----------------|
| FREE | 10 | 150 |
| BASIC | 50 | 300 |
| PREMIUM | Unlimited | 500 |
| ELITE | Unlimited | 500 |
These are configurable in `config/*.yaml` under `rate_limiting.tiers.{tier}.custom_actions_per_day` and `custom_action_char_limit`.
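A quick way to inspect those keys (the config filename below is an assumption; adjust to your environment):
```python
import yaml
with open("config/development.yaml") as f:  # filename assumed
    cfg = yaml.safe_load(f)
free_tier = cfg["rate_limiting"]["tiers"]["free"]
print(free_tier["custom_actions_per_day"])    # 10
print(free_tier["custom_action_char_limit"])  # 150
```
Enforcement at request time: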
```python
from app.ai.model_selector import UserTier
# RateLimitExceeded is assumed to be exported by the same service module
from app.services.rate_limiter_service import RateLimiterService, RateLimitExceeded
limiter = RateLimiterService()
try:
limiter.check_rate_limit("user_123", UserTier.PREMIUM)
# Process request...
limiter.increment_usage("user_123")
except RateLimitExceeded as e:
# Return error to user
pass
```
---
## Best Practices
1. **Always specify context type** - Helps optimize token usage and temperature
2. **Provide conversation history** - Improves narrative coherence
3. **Handle errors gracefully** - Show user-friendly messages
4. **Monitor costs** - Use usage tracking service
5. **Test with mocks first** - Use mocked clients during development (see the sketch below)
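For point 5, a minimal mocking sketch using `unittest.mock`; the patch target, and the assumption that higher layers route through `ReplicateClient.generate`, follow from the architecture described above:
```python
from unittest.mock import patch
from app.ai.replicate_client import ReplicateResponse
fake = ReplicateResponse(
    text="You find a hidden door behind the bookshelf.",
    tokens_used=12,
    model="mock",
    generation_time=0.01
)
# Assumes NarrativeGenerator ultimately calls ReplicateClient.generate
with patch("app.ai.replicate_client.ReplicateClient.generate", return_value=fake):
    ...  # exercise NarrativeGenerator / route handlers without real API calls
```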
---
## Verification Scripts
- `scripts/verify_ai_models.py` - Test model routing and API connectivity
- `scripts/verify_e2e_ai_generation.py` - End-to-end generation flow tests
```bash
# Test model routing (no API key needed)
python scripts/verify_ai_models.py
# Test with real API calls
python scripts/verify_ai_models.py --llama --haiku --sonnet
# Full E2E test
python scripts/verify_e2e_ai_generation.py --real --tier premium
```