Code_of_Conquest/api/docs/USAGE_TRACKING.md
2025-11-24 23:10:55 -06:00

# Usage Tracking & Cost Controls
## Overview
Code of Conquest implements comprehensive usage tracking and cost controls for AI operations. This ensures sustainable costs, fair usage across tiers, and visibility into system usage patterns.
**Key Components:**
- **UsageTrackingService** - Logs all AI usage and calculates costs
- **RateLimiterService** - Enforces tier-based daily limits
- **AIUsageLog** - Data model for usage events
---
## Architecture
```
┌─────────────────────┐
│ AI Task Jobs        │
├─────────────────────┤
│ UsageTrackingService│  ← Logs usage, calculates costs
├─────────────────────┤
│ RateLimiterService  │  ← Enforces limits before processing
├─────────────────────┤
│ Redis + Appwrite    │  ← Storage layer
└─────────────────────┘
```
---
## Usage Tracking Service
**File:** `app/services/usage_tracking_service.py`
### Initialization
```python
from app.services.usage_tracking_service import UsageTrackingService
tracker = UsageTrackingService()
```
**Required Environment Variables:**
```bash
APPWRITE_ENDPOINT=https://cloud.appwrite.io/v1
APPWRITE_PROJECT_ID=your-project-id
APPWRITE_API_KEY=your-api-key
APPWRITE_DATABASE_ID=main
```
### Logging Usage
```python
from app.models.ai_usage import TaskType
# Log a usage event
usage_log = tracker.log_usage(
    user_id="user_123",
    model="anthropic/claude-3.5-sonnet",
    tokens_input=150,
    tokens_output=450,
    task_type=TaskType.STORY_PROGRESSION,
    session_id="sess_789",
    character_id="char_456",
    request_duration_ms=2500,
    success=True
)
print(f"Log ID: {usage_log.log_id}")
print(f"Cost: ${usage_log.estimated_cost:.6f}")
```
### Querying Usage
**Daily Usage:**
```python
from datetime import date
# Get today's usage
usage = tracker.get_daily_usage("user_123", date.today())
print(f"Requests: {usage.total_requests}")
print(f"Tokens: {usage.total_tokens}")
print(f"Input tokens: {usage.total_input_tokens}")
print(f"Output tokens: {usage.total_output_tokens}")
print(f"Cost: ${usage.estimated_cost:.4f}")
print(f"By task: {usage.requests_by_task}")
# {"story_progression": 10, "combat_narration": 3, ...}
```
**Monthly Cost:**
```python
# Get November 2025 cost
monthly = tracker.get_monthly_cost("user_123", 2025, 11)
print(f"Monthly requests: {monthly.total_requests}")
print(f"Monthly tokens: {monthly.total_tokens}")
print(f"Monthly cost: ${monthly.estimated_cost:.2f}")
```
**Admin Monitoring:**
```python
# Get total platform cost for a day
total_cost = tracker.get_total_daily_cost(date.today())
print(f"Platform daily cost: ${total_cost:.2f}")
# Get user request count for rate limiting
count = tracker.get_user_request_count_today("user_123")
```
### Cost Estimation
**Static Methods (no instance needed):**
```python
from app.services.usage_tracking_service import UsageTrackingService
# Estimate cost for specific request
cost = UsageTrackingService.estimate_cost_for_model(
    model="anthropic/claude-3.5-sonnet",
    tokens_input=100,
    tokens_output=400
)
print(f"Estimated: ${cost:.6f}")
# Get model pricing
info = UsageTrackingService.get_model_cost_info("anthropic/claude-3.5-sonnet")
print(f"Input: ${info['input']}/1K tokens")
print(f"Output: ${info['output']}/1K tokens")
```
---
## Model Pricing
Costs per 1,000 tokens (USD):
| Model | Input | Output | Tier |
|-------|-------|--------|------|
| `meta/meta-llama-3-8b-instruct` | $0.0001 | $0.0001 | Free |
| `meta/meta-llama-3-70b-instruct` | $0.0006 | $0.0006 | - |
| `anthropic/claude-3.5-haiku` | $0.001 | $0.005 | Basic |
| `anthropic/claude-3.5-sonnet` | $0.003 | $0.015 | Premium |
| `anthropic/claude-4.5-sonnet` | $0.003 | $0.015 | Elite |
| `anthropic/claude-3-opus` | $0.015 | $0.075 | - |
**Default cost for unknown models:** $0.001 input, $0.005 output per 1K tokens
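
The per-request cost follows directly from the table: each token count is divided by 1,000 and multiplied by the matching rate. The following is a minimal sketch, assuming a plain dictionary stands in for the service's internal pricing data. The names `MODEL_PRICING`, `DEFAULT_PRICING`, and `estimate_cost` are illustrative and are not taken from the actual implementation.

```python
# Illustrative pricing data (USD per 1,000 tokens), copied from the table above.
# The dictionary layout and names are assumptions, not the service's real internals.
MODEL_PRICING = {
    "meta/meta-llama-3-8b-instruct": {"input": 0.0001, "output": 0.0001},
    "meta/meta-llama-3-70b-instruct": {"input": 0.0006, "output": 0.0006},
    "anthropic/claude-3.5-haiku": {"input": 0.001, "output": 0.005},
    "anthropic/claude-3.5-sonnet": {"input": 0.003, "output": 0.015},
    "anthropic/claude-4.5-sonnet": {"input": 0.003, "output": 0.015},
    "anthropic/claude-3-opus": {"input": 0.015, "output": 0.075},
}

# Fallback rates for models that are not in the table.
DEFAULT_PRICING = {"input": 0.001, "output": 0.005}


def estimate_cost(model: str, tokens_input: int, tokens_output: int) -> float:
    """Return the estimated cost in USD for one request."""
    pricing = MODEL_PRICING.get(model, DEFAULT_PRICING)
    input_cost = (tokens_input / 1000) * pricing["input"]
    output_cost = (tokens_output / 1000) * pricing["output"]
    return input_cost + output_cost


if __name__ == "__main__":
    cost = estimate_cost("anthropic/claude-3.5-sonnet", 100, 400)
    print(f"Estimated: ${cost:.6f}")
```

With these rates, 100 input tokens and 400 output tokens on `anthropic/claude-3.5-sonnet` come to roughly $0.0063, most of it from the output side.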
---
## Token Estimation
Since the Replicate API doesn't return exact token counts, tokens are estimated based on text length.
### Estimation Formula
```python
# Approximate 4 characters per token
tokens = len(text) // 4
```
### How Tokens Are Calculated
**Input Tokens:**
- Calculated from the full prompt sent to the AI
- Includes: user prompt + system prompt
- Estimated at: `len(prompt + system_prompt) // 4`
**Output Tokens:**
- Calculated from the AI's response text
- Estimated at: `len(response_text) // 4`
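
The two rules above can be sketched as a pair of helpers. This is an illustration only: the function names are hypothetical, and the real logic lives in the Replicate client.

```python
def estimate_tokens(text: str) -> int:
    """Approximate the token count at 4 characters per token (rounds down)."""
    return len(text) // 4


def estimate_request_tokens(prompt: str, system_prompt: str, response_text: str) -> dict:
    """Estimate input, output, and total tokens for one request.

    Hypothetical helper: input tokens come from the user prompt plus the
    system prompt, output tokens from the response text.
    """
    tokens_input = estimate_tokens(prompt + system_prompt)
    tokens_output = estimate_tokens(response_text)
    return {
        "tokens_input": tokens_input,
        "tokens_output": tokens_output,
        "tokens_used": tokens_input + tokens_output,
    }


if __name__ == "__main__":
    print(estimate_request_tokens("a" * 300, "b" * 100, "c" * 200))
```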
### ReplicateResponse Structure
The Replicate client returns both input and output token estimates:
```python
@dataclass
class ReplicateResponse:
    text: str
    tokens_used: int        # Total (input + output)
    tokens_input: int       # Estimated input tokens
    tokens_output: int      # Estimated output tokens
    model: str
    generation_time: float
```
### Example Token Counts
| Content | Characters | Estimated Tokens |
|---------|------------|------------------|
| Short prompt | 400 chars | ~100 tokens |
| Full DM prompt | 4,000 chars | ~1,000 tokens |
| Short response | 200 chars | ~50 tokens |
| Full narrative | 800 chars | ~200 tokens |
### Accuracy Notes
- Estimation is approximate (~75-80% accurate)
- Real tokenization varies by model
- Better to over-estimate for cost budgeting
- Logs use estimates; billing reconciliation may differ
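
Because `len(text) // 4` rounds down, one way to lean toward over-estimation when budgeting is to round up and add a safety margin. The sketch below is an assumption for illustration: the helper name and the 25% default margin are not part of the service.

```python
def budget_tokens(text: str, margin_percent: int = 25) -> int:
    """Return a deliberately pessimistic token estimate for budgeting.

    Hypothetical helper: rounds the 4-characters-per-token estimate up
    instead of down, then adds a safety margin (25% by default, an
    arbitrary illustrative value).
    """
    # Ceiling division without floats: -(-a // b) == ceil(a / b)
    tokens = -(-len(text) // 4)
    return -(-(tokens * (100 + margin_percent)) // 100)


if __name__ == "__main__":
    print(budget_tokens("x" * 4000))
```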
---
## Data Models
**File:** `app/models/ai_usage.py`
### AIUsageLog
```python
@dataclass
class AIUsageLog:
    log_id: str                    # Unique identifier
    user_id: str                   # User who made request
    timestamp: datetime            # When request was made
    model: str                     # Model identifier
    tokens_input: int              # Input/prompt tokens
    tokens_output: int             # Output/response tokens
    tokens_total: int              # Total tokens
    estimated_cost: float          # Cost in USD
    task_type: TaskType            # Type of task
    session_id: Optional[str]      # Game session
    character_id: Optional[str]    # Character
    request_duration_ms: int       # Duration
    success: bool                  # Success status
    error_message: Optional[str]   # Error if failed
```
### TaskType Enum
```python
class TaskType(str, Enum):
    STORY_PROGRESSION = "story_progression"
    COMBAT_NARRATION = "combat_narration"
    QUEST_SELECTION = "quest_selection"
    NPC_DIALOGUE = "npc_dialogue"
    GENERAL = "general"
```
### Summary Objects
```python
@dataclass
class DailyUsageSummary:
    date: date
    user_id: str
    total_requests: int
    total_tokens: int
    total_input_tokens: int
    total_output_tokens: int
    estimated_cost: float
    requests_by_task: Dict[str, int]


@dataclass
class MonthlyUsageSummary:
    year: int
    month: int
    user_id: str
    total_requests: int
    total_tokens: int
    estimated_cost: float
    daily_breakdown: list
```
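
To show how individual log entries roll up into the daily summary fields, here is a minimal sketch. It assumes a trimmed-down stand-in record (`UsageRecord`, hypothetical) with timezone-aware timestamps, and it returns a plain dictionary rather than the real `DailyUsageSummary`; the actual service queries Appwrite instead of filtering an in-memory list.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass
class UsageRecord:
    """Hypothetical, trimmed-down stand-in for AIUsageLog."""
    user_id: str
    timestamp: datetime  # Assumed to be timezone-aware
    tokens_input: int
    tokens_output: int
    estimated_cost: float
    task_type: str


def summarize_day(user_id: str, day: date, records: list) -> dict:
    """Aggregate one user's records for one UTC day into summary fields."""
    matching = [
        r for r in records
        if r.user_id == user_id
        and r.timestamp.astimezone(timezone.utc).date() == day
    ]
    total_in = sum(r.tokens_input for r in matching)
    total_out = sum(r.tokens_output for r in matching)
    return {
        "date": day,
        "user_id": user_id,
        "total_requests": len(matching),
        "total_tokens": total_in + total_out,
        "total_input_tokens": total_in,
        "total_output_tokens": total_out,
        "estimated_cost": sum(r.estimated_cost for r in matching),
        "requests_by_task": dict(Counter(r.task_type for r in matching)),
    }
```

A monthly summary could be built the same way by grouping on year and month and collecting the daily results into `daily_breakdown`.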
---
## Rate Limiter Service
**File:** `app/services/rate_limiter_service.py`
### Daily Turn Limits
| Tier | Limit | Cost Level |
|------|-------|------------|
| FREE | 20 turns/day | Zero |
| BASIC | 50 turns/day | Low |
| PREMIUM | 100 turns/day | Medium |
| ELITE | 200 turns/day | High |
Counters reset at midnight UTC.
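
The daily window can be sketched as follows. This is an illustration under stated assumptions: the limits are copied from the table above, while the counter key format and the helper names are hypothetical and may differ from what `RateLimiterService` stores in Redis.

```python
from datetime import datetime, timedelta, timezone

# Daily turn limits per tier, copied from the table above.
DAILY_TURN_LIMITS = {"free": 20, "basic": 50, "premium": 100, "elite": 200}


def next_reset_time(now: datetime) -> datetime:
    """Return the next midnight UTC after `now` (timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


def daily_key(user_id: str, now: datetime) -> str:
    """Build a per-user, per-UTC-day counter key (hypothetical format)."""
    day = now.astimezone(timezone.utc).date().isoformat()
    return f"rate_limit:{user_id}:{day}"


def is_limited(tier: str, current_usage: int) -> bool:
    """A user is limited once usage reaches the tier's daily limit."""
    return current_usage >= DAILY_TURN_LIMITS[tier]


if __name__ == "__main__":
    now = datetime(2025, 11, 21, 15, 30, tzinfo=timezone.utc)
    print(next_reset_time(now).isoformat())
    print(daily_key("user_123", now))
```

Keying the counter by UTC date means a new day starts with a fresh counter, so no explicit reset job is needed in this sketch.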
### Custom Action Limits
Free-text actions (beyond preset buttons) have additional limits per tier:
| Tier | Custom Actions/Day | Character Limit |
|------|-------------------|-----------------|
| FREE | 10 | 150 chars |
| BASIC | 50 | 300 chars |
| PREMIUM | Unlimited | 500 chars |
| ELITE | Unlimited | 500 chars |
**Configuration:** These values are defined in `config/*.yaml` under `rate_limiting.tiers`:
```yaml
tiers:
  free:
    custom_actions_per_day: 10
    custom_action_char_limit: 150
```
**Access in code:**
```python
from app.config import get_config
config = get_config()
tier_config = config.rate_limiting.tiers['free']
print(tier_config.custom_actions_per_day) # 10
print(tier_config.custom_action_char_limit) # 150
```
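
A check against these limits might look like the sketch below. It is an assumption for illustration: the limits are hard-coded from the table rather than read from configuration, "Unlimited" is represented as `None`, and `validate_custom_action` is a hypothetical helper.

```python
# Custom action limits per tier, copied from the table above.
# None stands for "Unlimited" (a representation chosen for this sketch).
CUSTOM_ACTION_LIMITS = {
    "free": {"per_day": 10, "char_limit": 150},
    "basic": {"per_day": 50, "char_limit": 300},
    "premium": {"per_day": None, "char_limit": 500},
    "elite": {"per_day": None, "char_limit": 500},
}


def validate_custom_action(tier: str, text: str, used_today: int) -> tuple:
    """Return (allowed, reason) for a free-text action.

    Hypothetical helper: checks the character limit first, then the
    daily custom action count.
    """
    limits = CUSTOM_ACTION_LIMITS[tier]
    if len(text) > limits["char_limit"]:
        return False, f"Action exceeds {limits['char_limit']} characters"
    if limits["per_day"] is not None and used_today >= limits["per_day"]:
        return False, "Daily custom action limit reached"
    return True, ""


if __name__ == "__main__":
    print(validate_custom_action("free", "I search the room for traps.", 3))
```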
### Basic Usage
```python
from app.services.rate_limiter_service import RateLimiterService, RateLimitExceeded
from app.ai.model_selector import UserTier
limiter = RateLimiterService()
# Check and increment (typical flow)
try:
    limiter.check_rate_limit("user_123", UserTier.PREMIUM)
    # Process AI request...
    limiter.increment_usage("user_123")
except RateLimitExceeded as e:
    print(f"Limit reached: {e.current_usage}/{e.limit}")
    print(f"Resets at: {e.reset_time}")
```
### Query Methods
```python
# Get current usage
current = limiter.get_current_usage("user_123")
# Get remaining turns
remaining = limiter.get_remaining_turns("user_123", UserTier.PREMIUM)
print(f"Remaining: {remaining} turns")
# Get comprehensive info
info = limiter.get_usage_info("user_123", UserTier.PREMIUM)
# {
# "user_id": "user_123",
# "user_tier": "premium",
# "current_usage": 45,
# "daily_limit": 100,
# "remaining": 55,
# "reset_time": "2025-11-22T00:00:00+00:00",
# "is_limited": False
# }
# Get limit for tier
limit = limiter.get_limit_for_tier(UserTier.ELITE) # 200
```
### Admin Functions
```python
# Reset user's daily counter (testing/admin)
limiter.reset_usage("user_123")
```
### RateLimitExceeded Exception
```python
class RateLimitExceeded(Exception):
    user_id: str
    user_tier: UserTier
    limit: int
    current_usage: int
    reset_time: datetime
```
The exception carries all the information needed to build a user-friendly error message.
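
As an illustration of how these attributes can be populated and consumed, here is a minimal sketch. The `UserTier` enum is a local stand-in for `app.ai.model_selector.UserTier`, and the constructor signature and the `to_error_payload` helper are assumptions, not the actual implementation.

```python
from datetime import datetime, timezone
from enum import Enum


class UserTier(str, Enum):
    """Local stand-in for app.ai.model_selector.UserTier (values assumed)."""
    FREE = "free"
    BASIC = "basic"
    PREMIUM = "premium"
    ELITE = "elite"


class RateLimitExceeded(Exception):
    """Sketch of an exception that stores the documented attributes."""

    def __init__(self, user_id: str, user_tier: UserTier, limit: int,
                 current_usage: int, reset_time: datetime):
        self.user_id = user_id
        self.user_tier = user_tier
        self.limit = limit
        self.current_usage = current_usage
        self.reset_time = reset_time
        super().__init__(
            f"Rate limit exceeded for user {user_id}: "
            f"{current_usage}/{limit} turns used"
        )


def to_error_payload(exc: RateLimitExceeded) -> dict:
    """Turn the exception into an API-style error body (hypothetical helper)."""
    return {
        "code": "RATE_LIMIT_EXCEEDED",
        "message": "Daily turn limit reached",
        "details": {
            "limit": exc.limit,
            "current": exc.current_usage,
            "reset_time": exc.reset_time.isoformat(),
        },
    }
```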
---
## Integration Pattern
### In AI Task Jobs
```python
import time

from app.ai.model_selector import UserTier
from app.ai.narrative_generator import NarrativeGenerator
from app.models.ai_usage import TaskType
from app.services.rate_limiter_service import RateLimiterService, RateLimitExceeded
from app.services.usage_tracking_service import UsageTrackingService


def process_ai_request(user_id: str, user_tier: UserTier, action: str, session_id: str):
    # Further parameters omitted for brevity
    limiter = RateLimiterService()
    tracker = UsageTrackingService()
    generator = NarrativeGenerator()

    # 1. Check rate limit BEFORE processing
    try:
        limiter.check_rate_limit(user_id, user_tier)
    except RateLimitExceeded as e:
        return {
            "error": "rate_limit_exceeded",
            "message": f"Daily limit reached ({e.limit} turns). Resets at {e.reset_time}",
            "remaining": 0,
            "reset_time": e.reset_time.isoformat()
        }

    # 2. Generate AI response
    start_time = time.time()
    response = generator.generate_story_response(...)
    duration_ms = int((time.time() - start_time) * 1000)

    # 3. Log usage (tokens are estimated in ReplicateClient)
    tracker.log_usage(
        user_id=user_id,
        model=response.model,
        tokens_input=response.tokens_input,    # From prompt length
        tokens_output=response.tokens_output,  # From response length
        task_type=TaskType.STORY_PROGRESSION,
        session_id=session_id,
        request_duration_ms=duration_ms,
        success=True
    )

    # 4. Increment rate limit counter
    limiter.increment_usage(user_id)

    return {"narrative": response.narrative}  # Plus any other response fields
```
### API Endpoint Pattern
```python
@bp.route('/sessions/<session_id>/action', methods=['POST'])
@require_auth
def take_action(session_id):
    user = get_current_user()
    limiter = RateLimiterService()

    # Check limit and return remaining info
    try:
        limiter.check_rate_limit(user.id, user.tier)
    except RateLimitExceeded as e:
        return api_response(
            status=429,
            error={
                "code": "RATE_LIMIT_EXCEEDED",
                "message": "Daily turn limit reached",
                "details": {
                    "limit": e.limit,
                    "current": e.current_usage,
                    "reset_time": e.reset_time.isoformat()
                }
            }
        )

    # Queue AI job...
    remaining = limiter.get_remaining_turns(user.id, user.tier)
    return api_response(
        status=202,
        result={
            "job_id": job.id,
            "remaining_turns": remaining
        }
    )
```
---
## Appwrite Collection Schema
**Collection:** `ai_usage_logs`
| Field | Type | Description |
|-------|------|-------------|
| `log_id` | string | Primary key |
| `user_id` | string | User identifier |
| `timestamp` | datetime | Request time (UTC) |
| `model` | string | Model identifier |
| `tokens_input` | integer | Input tokens |
| `tokens_output` | integer | Output tokens |
| `tokens_total` | integer | Total tokens |
| `estimated_cost` | double | Cost in USD |
| `task_type` | string | Task type enum |
| `session_id` | string | Optional session |
| `character_id` | string | Optional character |
| `request_duration_ms` | integer | Duration |
| `success` | boolean | Success status |
| `error_message` | string | Error if failed |
**Indexes:**
- `user_id` + `timestamp` (for daily queries)
- `timestamp` (for admin monitoring)
---
## Cost Management Best Practices
### 1. Pre-request Validation
Always check rate limits before processing:
```python
limiter.check_rate_limit(user_id, user_tier)
```
### 2. Log All Requests
Log both successful and failed requests:
```python
tracker.log_usage(
    ...,
    success=False,
    error_message="Model timeout"
)
```
### 3. Monitor Platform Costs
```python
# Daily monitoring
daily_cost = tracker.get_total_daily_cost(date.today())
if daily_cost > 50:
    send_alert("WARNING: Daily AI cost exceeded $50")
if daily_cost > 100:
    send_alert("CRITICAL: Daily AI cost exceeded $100")
```
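
Note that the two independent `if` checks send both alerts once the cost passes $100. If only the most severe alert is wanted, the thresholds can be checked from highest to lowest. This is a minimal sketch; the constant names and the `alert_level` helper are hypothetical.

```python
from typing import Optional

# Thresholds in USD, taken from the monitoring example above.
WARNING_THRESHOLD_USD = 50.0
CRITICAL_THRESHOLD_USD = 100.0


def alert_level(daily_cost: float) -> Optional[str]:
    """Return the single most severe alert level for a daily cost, or None."""
    if daily_cost > CRITICAL_THRESHOLD_USD:
        return "CRITICAL"
    if daily_cost > WARNING_THRESHOLD_USD:
        return "WARNING"
    return None


if __name__ == "__main__":
    for cost in (12.5, 75.0, 140.0):
        print(cost, alert_level(cost))
```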
### 4. Cost Estimation for UI
Show users estimated costs before actions:
```python
cost_info = UsageTrackingService.get_model_cost_info(model)
estimated = (base_tokens * 1.5 / 1000) * (cost_info['input'] + cost_info['output'])
```
### 5. Tier Upgrade Prompts
When rate limited, prompt upgrades:
```python
if e.user_tier == UserTier.FREE:
    message = "Upgrade to Basic for 50 turns/day!"
elif e.user_tier == UserTier.BASIC:
    message = "Upgrade to Premium for 100 turns/day!"
```
---
## Target Cost Goals
- **Development:** < $50/day
- **Production target:** < $500/month total
- **Cost per user:** ~$0.10/day (premium tier average)
### Cost Breakdown by Tier (estimated daily)
| Tier | Avg Requests | Avg Cost/Request | Daily Cost |
|------|-------------|-----------------|------------|
| FREE | 10 | $0.00 | $0.00 |
| BASIC | 30 | $0.003 | $0.09 |
| PREMIUM | 60 | $0.01 | $0.60 |
| ELITE | 100 | $0.02 | $2.00 |
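
Each daily cost in the table is the average request count multiplied by the average cost per request. The sketch below reproduces that arithmetic and projects a monthly total for a user mix. The helper names, the 30-day month, and the example user counts are assumptions chosen purely for illustration.

```python
# (average requests per day, average cost per request in USD), from the table above.
TIER_ESTIMATES = {
    "free": (10, 0.00),
    "basic": (30, 0.003),
    "premium": (60, 0.01),
    "elite": (100, 0.02),
}


def daily_cost(tier: str) -> float:
    """Estimated daily cost for one user on the given tier."""
    requests, cost_per_request = TIER_ESTIMATES[tier]
    return round(requests * cost_per_request, 2)


def monthly_cost(user_counts: dict, days: int = 30) -> float:
    """Projected monthly cost for a mix of users per tier."""
    per_day = sum(daily_cost(tier) * count for tier, count in user_counts.items())
    return round(per_day * days, 2)


if __name__ == "__main__":
    # Hypothetical user mix, not real usage data.
    mix = {"free": 100, "basic": 10, "premium": 5}
    print(f"Projected monthly cost: ${monthly_cost(mix):.2f}")
```

A projection like this can be compared against the monthly production target to see how many paying users a given budget supports.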
---
## Testing
### Unit Tests
```python
# test_usage_tracking_service.py
from app.models.ai_usage import TaskType
from app.services.usage_tracking_service import UsageTrackingService


def test_log_usage():
    tracker = UsageTrackingService()
    log = tracker.log_usage(
        user_id="test_user",
        model="meta/meta-llama-3-8b-instruct",
        tokens_input=100,
        tokens_output=200,
        task_type=TaskType.STORY_PROGRESSION
    )
    assert log.tokens_total == 300
    assert log.estimated_cost > 0


# test_rate_limiter_service.py
import pytest

from app.ai.model_selector import UserTier
from app.services.rate_limiter_service import RateLimiterService, RateLimitExceeded


def test_rate_limit_exceeded():
    limiter = RateLimiterService()
    # Exceed free tier limit
    for _ in range(20):
        limiter.increment_usage("test_user")
    with pytest.raises(RateLimitExceeded):
        limiter.check_rate_limit("test_user", UserTier.FREE)
```
### Integration Testing
```bash
# Check Redis connection
redis-cli ping
# Check Appwrite connection
python -c "from app.services.usage_tracking_service import UsageTrackingService; UsageTrackingService()"
```
---
## Future Enhancements (Deferred)
- **Task 7.15:** Cost monitoring and alerts (daily job, email alerts)
- Billing integration
- Usage quotas per session
- Real-time cost dashboard
- Cost projections