# Usage Tracking & Cost Controls

## Overview

Code of Conquest tracks usage and enforces cost controls for all AI operations. This keeps costs sustainable, keeps usage fair across tiers, and gives visibility into system usage patterns.

**Key Components:**
- **UsageTrackingService** - Logs all AI usage and calculates costs
- **RateLimiterService** - Enforces tier-based daily limits
- **AIUsageLog** - Data model for usage events

---

## Architecture

```
┌─────────────────────┐
│   AI Task Jobs      │
├─────────────────────┤
│ UsageTrackingService│  ← Logs usage, calculates costs
├─────────────────────┤
│  RateLimiterService │  ← Enforces limits before processing
├─────────────────────┤
│   Redis + Appwrite  │  ← Storage layer
└─────────────────────┘
```

---

## Usage Tracking Service

**File:** `app/services/usage_tracking_service.py`

### Initialization

```python
from app.services.usage_tracking_service import UsageTrackingService

tracker = UsageTrackingService()
```

**Required Environment Variables:**
```bash
APPWRITE_ENDPOINT=https://cloud.appwrite.io/v1
APPWRITE_PROJECT_ID=your-project-id
APPWRITE_API_KEY=your-api-key
APPWRITE_DATABASE_ID=main
```

### Logging Usage

```python
from app.models.ai_usage import TaskType

# Log a usage event
usage_log = tracker.log_usage(
    user_id="user_123",
    model="anthropic/claude-3.5-sonnet",
    tokens_input=150,
    tokens_output=450,
    task_type=TaskType.STORY_PROGRESSION,
    session_id="sess_789",
    character_id="char_456",
    request_duration_ms=2500,
    success=True
)

print(f"Log ID: {usage_log.log_id}")
print(f"Cost: ${usage_log.estimated_cost:.6f}")
```

### Querying Usage

**Daily Usage:**
```python
from datetime import date

# Get today's usage
usage = tracker.get_daily_usage("user_123", date.today())

print(f"Requests: {usage.total_requests}")
print(f"Tokens: {usage.total_tokens}")
print(f"Input tokens: {usage.total_input_tokens}")
print(f"Output tokens: {usage.total_output_tokens}")
print(f"Cost: ${usage.estimated_cost:.4f}")
print(f"By task: {usage.requests_by_task}")
# {"story_progression": 10, "combat_narration": 3, ...}
```

**Monthly Cost:**
```python
# Get November 2025 cost
monthly = tracker.get_monthly_cost("user_123", 2025, 11)

print(f"Monthly requests: {monthly.total_requests}")
print(f"Monthly tokens: {monthly.total_tokens}")
print(f"Monthly cost: ${monthly.estimated_cost:.2f}")
```

**Admin Monitoring:**
```python
# Get total platform cost for a day
total_cost = tracker.get_total_daily_cost(date.today())
print(f"Platform daily cost: ${total_cost:.2f}")

# Get user request count for rate limiting
count = tracker.get_user_request_count_today("user_123")
```

### Cost Estimation

**Static Methods (no instance needed):**
```python
from app.services.usage_tracking_service import UsageTrackingService

# Estimate cost for specific request
cost = UsageTrackingService.estimate_cost_for_model(
    model="anthropic/claude-3.5-sonnet",
    tokens_input=100,
    tokens_output=400
)
print(f"Estimated: ${cost:.6f}")

# Get model pricing
info = UsageTrackingService.get_model_cost_info("anthropic/claude-3.5-sonnet")
print(f"Input: ${info['input']}/1K tokens")
print(f"Output: ${info['output']}/1K tokens")
```

---

## Model Pricing

Costs per 1,000 tokens (USD):

| Model | Input | Output | Tier |
|-------|-------|--------|------|
| `meta/meta-llama-3-8b-instruct` | $0.0001 | $0.0001 | Free |
| `meta/meta-llama-3-70b-instruct` | $0.0006 | $0.0006 | - |
| `anthropic/claude-3.5-haiku` | $0.001 | $0.005 | Basic |
| `anthropic/claude-3.5-sonnet` | $0.003 | $0.015 | Premium |
| `anthropic/claude-4.5-sonnet` | $0.003 | $0.015 | Elite |
| `anthropic/claude-3-opus` | $0.015 | $0.075 | - |

**Default cost for unknown models:** $0.001 input, $0.005 output per 1K tokens
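
To make the pricing concrete, the sketch below shows how per-1K prices could translate into the cost of a single request. It is an illustration only, not the service's implementation: `MODEL_COSTS`, `DEFAULT_COST`, and `estimate_cost` are hypothetical names, with prices copied from the table above and the documented default applied to unknown models.

```python
# Prices in USD per 1,000 tokens, copied from the pricing table.
# MODEL_COSTS, DEFAULT_COST and estimate_cost are illustrative names.
MODEL_COSTS = {
    "meta/meta-llama-3-8b-instruct": {"input": 0.0001, "output": 0.0001},
    "meta/meta-llama-3-70b-instruct": {"input": 0.0006, "output": 0.0006},
    "anthropic/claude-3.5-haiku": {"input": 0.001, "output": 0.005},
    "anthropic/claude-3.5-sonnet": {"input": 0.003, "output": 0.015},
    "anthropic/claude-4.5-sonnet": {"input": 0.003, "output": 0.015},
    "anthropic/claude-3-opus": {"input": 0.015, "output": 0.075},
}

# Fallback pricing for models that are not in the table.
DEFAULT_COST = {"input": 0.001, "output": 0.005}


def estimate_cost(model: str, tokens_input: int, tokens_output: int) -> float:
    """Return the estimated cost in USD for one request."""
    pricing = MODEL_COSTS.get(model, DEFAULT_COST)
    input_cost = (tokens_input / 1000) * pricing["input"]
    output_cost = (tokens_output / 1000) * pricing["output"]
    return input_cost + output_cost


# 150 input + 450 output tokens on claude-3.5-sonnet
print(f"${estimate_cost('anthropic/claude-3.5-sonnet', 150, 450):.6f}")  # $0.007200
```

Input and output tokens are priced separately, so a long response costs more than a prompt of the same length on the Anthropic models.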

---

## Token Estimation

Since the Replicate API doesn't return exact token counts, tokens are estimated based on text length.

### Estimation Formula

```python
# Approximate 4 characters per token
tokens = len(text) // 4
```

### How Tokens Are Calculated

**Input Tokens:**
- Calculated from the full prompt sent to the AI
- Includes: user prompt + system prompt
- Estimated at: `len(prompt + system_prompt) // 4`

**Output Tokens:**
- Calculated from the AI's response text
- Estimated at: `len(response_text) // 4`
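
The two rules above can be combined into one helper. This is a minimal sketch, assuming the client simply applies the 4-characters-per-token rule to the concatenated prompt and to the response; the function names are hypothetical and are not taken from the Replicate client.

```python
def estimate_tokens(text: str) -> int:
    """Approximate the token count at about 4 characters per token."""
    return len(text) // 4


def estimate_request_tokens(prompt: str, system_prompt: str, response_text: str) -> dict:
    """Estimate input, output and total tokens for one request."""
    tokens_input = estimate_tokens(prompt + system_prompt)
    tokens_output = estimate_tokens(response_text)
    return {
        "tokens_input": tokens_input,
        "tokens_output": tokens_output,
        "tokens_used": tokens_input + tokens_output,
    }


# 300-char prompt + 100-char system prompt, 200-char response
print(estimate_request_tokens("a" * 300, "b" * 100, "c" * 200))
# {'tokens_input': 100, 'tokens_output': 50, 'tokens_used': 150}
```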

### ReplicateResponse Structure

The Replicate client returns both input and output token estimates:

```python
@dataclass
class ReplicateResponse:
    text: str
    tokens_used: int      # Total (input + output)
    tokens_input: int     # Estimated input tokens
    tokens_output: int    # Estimated output tokens
    model: str
    generation_time: float
```

### Example Token Counts

| Content | Characters | Estimated Tokens |
|---------|------------|------------------|
| Short prompt | 400 chars | ~100 tokens |
| Full DM prompt | 4,000 chars | ~1,000 tokens |
| Short response | 200 chars | ~50 tokens |
| Full narrative | 800 chars | ~200 tokens |

### Accuracy Notes

- Estimates are approximate (roughly 75-80% accurate).
- Actual tokenization varies by model.
- For cost budgeting, err on the side of over-estimating.
- Logged costs are based on these estimates, so they may differ from the provider's billing.
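
Note that `len(text) // 4` rounds down, which slightly under-estimates. If over-estimation is preferred for budgeting, one option is to round up and apply a safety margin. The helper below is a hypothetical sketch of that idea; neither the function nor the `margin` parameter exists in the codebase.

```python
import math


def estimate_tokens_conservative(text: str, chars_per_token: int = 4, margin: float = 0.0) -> int:
    """Round up instead of down, then inflate by an optional safety margin."""
    base = math.ceil(len(text) / chars_per_token)
    return math.ceil(base * (1 + margin))


print(estimate_tokens_conservative("a" * 401))               # 101 (floor division gives 100)
print(estimate_tokens_conservative("a" * 400, margin=0.25))  # 125
```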

---

## Data Models

**File:** `app/models/ai_usage.py`

### AIUsageLog

```python
@dataclass
class AIUsageLog:
    log_id: str                    # Unique identifier
    user_id: str                   # User who made request
    timestamp: datetime            # When request was made
    model: str                     # Model identifier
    tokens_input: int              # Input/prompt tokens
    tokens_output: int             # Output/response tokens
    tokens_total: int              # Total tokens
    estimated_cost: float          # Cost in USD
    task_type: TaskType            # Type of task
    session_id: Optional[str]      # Game session
    character_id: Optional[str]    # Character
    request_duration_ms: int       # Duration
    success: bool                  # Success status
    error_message: Optional[str]   # Error if failed
```

### TaskType Enum

```python
class TaskType(str, Enum):
    STORY_PROGRESSION = "story_progression"
    COMBAT_NARRATION = "combat_narration"
    QUEST_SELECTION = "quest_selection"
    NPC_DIALOGUE = "npc_dialogue"
    GENERAL = "general"
```

### Summary Objects

```python
@dataclass
class DailyUsageSummary:
    date: date
    user_id: str
    total_requests: int
    total_tokens: int
    total_input_tokens: int
    total_output_tokens: int
    estimated_cost: float
    requests_by_task: Dict[str, int]

@dataclass
class MonthlyUsageSummary:
    year: int
    month: int
    user_id: str
    total_requests: int
    total_tokens: int
    estimated_cost: float
    daily_breakdown: list
```
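
As an illustration of how the summary fields relate to individual log entries, the sketch below aggregates a list of entries into the totals that `DailyUsageSummary` exposes. `LogEntry` is a trimmed-down, hypothetical stand-in for `AIUsageLog`, and `summarize` is not the service's actual aggregation code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LogEntry:
    """Hypothetical, reduced stand-in for AIUsageLog."""
    tokens_input: int
    tokens_output: int
    estimated_cost: float
    task_type: str


def summarize(logs: List[LogEntry]) -> Dict[str, object]:
    """Aggregate log entries into DailyUsageSummary-style totals."""
    total_input = sum(log.tokens_input for log in logs)
    total_output = sum(log.tokens_output for log in logs)
    return {
        "total_requests": len(logs),
        "total_tokens": total_input + total_output,
        "total_input_tokens": total_input,
        "total_output_tokens": total_output,
        "estimated_cost": sum(log.estimated_cost for log in logs),
        "requests_by_task": dict(Counter(log.task_type for log in logs)),
    }


logs = [
    LogEntry(150, 450, 0.0072, "story_progression"),
    LogEntry(100, 200, 0.0033, "combat_narration"),
    LogEntry(50, 50, 0.001, "story_progression"),
]
print(summarize(logs)["requests_by_task"])
# {'story_progression': 2, 'combat_narration': 1}
```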

---

## Rate Limiter Service

**File:** `app/services/rate_limiter_service.py`

### Daily Turn Limits

| Tier | Limit | Cost Level |
|------|-------|------------|
| FREE | 20 turns/day | Zero |
| BASIC | 50 turns/day | Low |
| PREMIUM | 100 turns/day | Medium |
| ELITE | 200 turns/day | High |

Counters reset at midnight UTC.
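
The reset time can be derived from the current time alone. The following is a minimal sketch using only the standard library; `next_reset_time` and `seconds_until_reset` are hypothetical helper names, and using the remaining seconds as an expiry on a per-day counter is an assumption about how such a counter might be stored, not a description of the service.

```python
from datetime import datetime, timedelta, timezone


def next_reset_time(now: datetime) -> datetime:
    """Return the next midnight UTC after `now` (which must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    tomorrow = now_utc.date() + timedelta(days=1)
    return datetime(tomorrow.year, tomorrow.month, tomorrow.day, tzinfo=timezone.utc)


def seconds_until_reset(now: datetime) -> int:
    """Seconds remaining until the daily counter resets."""
    return int((next_reset_time(now) - now).total_seconds())


now = datetime(2025, 11, 21, 15, 30, tzinfo=timezone.utc)
print(next_reset_time(now).isoformat())  # 2025-11-22T00:00:00+00:00
print(seconds_until_reset(now))          # 30600
```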

### Custom Action Limits

Free-text actions (beyond preset buttons) have additional limits per tier:

| Tier | Custom Actions/Day | Character Limit |
|------|-------------------|-----------------|
| FREE | 10 | 150 chars |
| BASIC | 50 | 300 chars |
| PREMIUM | Unlimited | 500 chars |
| ELITE | Unlimited | 500 chars |

**Configuration:** These values are defined in `config/*.yaml` under `rate_limiting.tiers`:
```yaml
tiers:
  free:
    custom_actions_per_day: 10
    custom_action_char_limit: 150
```

**Access in code:**
```python
from app.config import get_config

config = get_config()
tier_config = config.rate_limiting.tiers['free']
print(tier_config.custom_actions_per_day)      # 10
print(tier_config.custom_action_char_limit)    # 150
```
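
A check against these limits might look like the sketch below. It is self-contained for illustration: `CUSTOM_ACTION_LIMITS` copies the values from the table above (with `None` standing for "Unlimited"), and `validate_custom_action` is a hypothetical helper rather than part of `RateLimiterService`.

```python
from typing import Optional, Tuple

# Values copied from the Custom Action Limits table; None means unlimited.
CUSTOM_ACTION_LIMITS = {
    "free": {"per_day": 10, "char_limit": 150},
    "basic": {"per_day": 50, "char_limit": 300},
    "premium": {"per_day": None, "char_limit": 500},
    "elite": {"per_day": None, "char_limit": 500},
}


def validate_custom_action(tier: str, text: str, used_today: int) -> Tuple[bool, Optional[str]]:
    """Return (allowed, reason); reason is None when the action is allowed."""
    limits = CUSTOM_ACTION_LIMITS[tier]
    if len(text) > limits["char_limit"]:
        return False, f"Action exceeds {limits['char_limit']} characters"
    if limits["per_day"] is not None and used_today >= limits["per_day"]:
        return False, f"Daily custom action limit reached ({limits['per_day']})"
    return True, None


print(validate_custom_action("free", "I search the room", used_today=10))
# (False, 'Daily custom action limit reached (10)')
```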

### Basic Usage

```python
from app.services.rate_limiter_service import RateLimiterService, RateLimitExceeded
from app.ai.model_selector import UserTier

limiter = RateLimiterService()

# Check and increment (typical flow)
try:
    limiter.check_rate_limit("user_123", UserTier.PREMIUM)
    # Process AI request...
    limiter.increment_usage("user_123")
except RateLimitExceeded as e:
    print(f"Limit reached: {e.current_usage}/{e.limit}")
    print(f"Resets at: {e.reset_time}")
```

### Query Methods

```python
# Get current usage
current = limiter.get_current_usage("user_123")

# Get remaining turns
remaining = limiter.get_remaining_turns("user_123", UserTier.PREMIUM)
print(f"Remaining: {remaining} turns")

# Get comprehensive info
info = limiter.get_usage_info("user_123", UserTier.PREMIUM)
# {
#     "user_id": "user_123",
#     "user_tier": "premium",
#     "current_usage": 45,
#     "daily_limit": 100,
#     "remaining": 55,
#     "reset_time": "2025-11-22T00:00:00+00:00",
#     "is_limited": False
# }

# Get limit for tier
limit = limiter.get_limit_for_tier(UserTier.ELITE)  # 200
```

### Admin Functions

```python
# Reset user's daily counter (testing/admin)
limiter.reset_usage("user_123")
```

### RateLimitExceeded Exception

```python
class RateLimitExceeded(Exception):
    user_id: str
    user_tier: UserTier
    limit: int
    current_usage: int
    reset_time: datetime
```

The exception carries all the information needed to build a user-friendly error message.
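
For local experiments or unit tests without Redis, the check-then-increment flow can be imitated in memory. The sketch below is a stand-in, not the real service: `InMemoryRateLimiter` and `DemoRateLimitExceeded` are hypothetical names, tiers are plain strings instead of `UserTier`, and the limits are copied from the Daily Turn Limits table.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Dict, Optional, Tuple

# Daily turn limits copied from the table above.
DAILY_LIMITS = {"free": 20, "basic": 50, "premium": 100, "elite": 200}


class DemoRateLimitExceeded(Exception):
    """Illustrative exception carrying the same fields as RateLimitExceeded."""

    def __init__(self, user_id: str, user_tier: str, limit: int,
                 current_usage: int, reset_time: datetime) -> None:
        super().__init__(f"{user_id}: {current_usage}/{limit} turns used")
        self.user_id = user_id
        self.user_tier = user_tier
        self.limit = limit
        self.current_usage = current_usage
        self.reset_time = reset_time


class InMemoryRateLimiter:
    """Per-user, per-UTC-day counters held in a dict (lost on restart)."""

    def __init__(self) -> None:
        self._counts: Dict[Tuple[str, date], int] = {}

    def get_current_usage(self, user_id: str, now: Optional[datetime] = None) -> int:
        now = now or datetime.now(timezone.utc)
        return self._counts.get((user_id, now.date()), 0)

    def check_rate_limit(self, user_id: str, user_tier: str,
                         now: Optional[datetime] = None) -> None:
        """Raise if the user has already used up today's allowance."""
        now = now or datetime.now(timezone.utc)
        limit = DAILY_LIMITS[user_tier]
        used = self.get_current_usage(user_id, now)
        if used >= limit:
            reset = datetime.combine(now.date() + timedelta(days=1), time.min,
                                     tzinfo=timezone.utc)
            raise DemoRateLimitExceeded(user_id, user_tier, limit, used, reset)

    def increment_usage(self, user_id: str, now: Optional[datetime] = None) -> int:
        """Count one turn and return the new total for the day."""
        now = now or datetime.now(timezone.utc)
        key = (user_id, now.date())
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]
```

Because the counter key includes the UTC date, a new day starts from zero without an explicit reset. Any `now` passed in is expected to be a UTC datetime.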

---

## Integration Pattern

### In AI Task Jobs

```python
import time

from app.ai.model_selector import UserTier
from app.ai.narrative_generator import NarrativeGenerator
from app.models.ai_usage import TaskType
from app.services.rate_limiter_service import RateLimiterService, RateLimitExceeded
from app.services.usage_tracking_service import UsageTrackingService

# Remaining parameters are omitted for brevity
def process_ai_request(user_id: str, user_tier: UserTier, action: str, session_id: str):
    limiter = RateLimiterService()
    tracker = UsageTrackingService()
    generator = NarrativeGenerator()

    # 1. Check rate limit BEFORE processing
    try:
        limiter.check_rate_limit(user_id, user_tier)
    except RateLimitExceeded as e:
        return {
            "error": "rate_limit_exceeded",
            "message": f"Daily limit reached ({e.limit} turns). Resets at {e.reset_time}",
            "remaining": 0,
            "reset_time": e.reset_time.isoformat()
        }

    # 2. Generate AI response
    start_time = time.time()
    response = generator.generate_story_response(...)
    duration_ms = int((time.time() - start_time) * 1000)

    # 3. Log usage (tokens are estimated in ReplicateClient)
    tracker.log_usage(
        user_id=user_id,
        model=response.model,
        tokens_input=response.tokens_input,   # From prompt length
        tokens_output=response.tokens_output, # From response length
        task_type=TaskType.STORY_PROGRESSION,
        session_id=session_id,
        request_duration_ms=duration_ms,
        success=True
    )

    # 4. Increment rate limit counter
    limiter.increment_usage(user_id)

    return {"narrative": response.narrative}  # plus any other response fields
```

### API Endpoint Pattern

```python
@bp.route('/sessions/<session_id>/action', methods=['POST'])
@require_auth
def take_action(session_id):
    user = get_current_user()
    limiter = RateLimiterService()

    # Check limit and return remaining info
    try:
        limiter.check_rate_limit(user.id, user.tier)
    except RateLimitExceeded as e:
        return api_response(
            status=429,
            error={
                "code": "RATE_LIMIT_EXCEEDED",
                "message": "Daily turn limit reached",
                "details": {
                    "limit": e.limit,
                    "current": e.current_usage,
                    "reset_time": e.reset_time.isoformat()
                }
            }
        )

    # Queue AI job...
    remaining = limiter.get_remaining_turns(user.id, user.tier)

    return api_response(
        status=202,
        result={
            "job_id": job.id,
            "remaining_turns": remaining
        }
    )
```

---

## Appwrite Collection Schema

**Collection:** `ai_usage_logs`

| Field | Type | Description |
|-------|------|-------------|
| `log_id` | string | Primary key |
| `user_id` | string | User identifier |
| `timestamp` | datetime | Request time (UTC) |
| `model` | string | Model identifier |
| `tokens_input` | integer | Input tokens |
| `tokens_output` | integer | Output tokens |
| `tokens_total` | integer | Total tokens |
| `estimated_cost` | double | Cost in USD |
| `task_type` | string | Task type enum |
| `session_id` | string | Optional session |
| `character_id` | string | Optional character |
| `request_duration_ms` | integer | Duration |
| `success` | boolean | Success status |
| `error_message` | string | Error if failed |

**Indexes:**
- `user_id` + `timestamp` (for daily queries)
- `timestamp` (for admin monitoring)
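
Daily and monthly queries against these indexes come down to a half-open timestamp range in UTC. The helpers below sketch how such bounds could be computed; the function names are hypothetical, and how the bounds are passed to Appwrite is deliberately left out.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple


def utc_day_bounds(day: date) -> Tuple[str, str]:
    """Return ISO timestamps [start, end) covering one UTC day."""
    start = datetime.combine(day, time.min, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return start.isoformat(), end.isoformat()


def utc_month_bounds(year: int, month: int) -> Tuple[str, str]:
    """Return ISO timestamps [start, end) covering one UTC calendar month."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    return start.isoformat(), end.isoformat()


print(utc_day_bounds(date(2025, 11, 21)))
# ('2025-11-21T00:00:00+00:00', '2025-11-22T00:00:00+00:00')
```

A record belongs to the period when `start <= timestamp < end`, which avoids counting a request at exactly midnight twice.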

---

## Cost Management Best Practices

### 1. Pre-request Validation

Always check rate limits before processing:

```python
limiter.check_rate_limit(user_id, user_tier)
```

### 2. Log All Requests

Log both successful and failed requests:

```python
tracker.log_usage(
    ...,
    success=False,
    error_message="Model timeout"
)
```

### 3. Monitor Platform Costs

```python
from datetime import date

# Daily monitoring
daily_cost = tracker.get_total_daily_cost(date.today())

# Check the higher threshold first so only one alert fires
if daily_cost > 100:
    send_alert("CRITICAL: Daily AI cost exceeded $100")
elif daily_cost > 50:
    send_alert("WARNING: Daily AI cost exceeded $50")
```

### 4. Cost Estimation for UI

Show users estimated costs before actions:

```python
cost_info = UsageTrackingService.get_model_cost_info(model)
estimated = (base_tokens * 1.5 / 1000) * (cost_info['input'] + cost_info['output'])
```

### 5. Tier Upgrade Prompts

When rate limited, prompt upgrades:

```python
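# `e` is the RateLimitExceeded caught in the except block
if e.user_tier == UserTier.FREE:
    message = "Upgrade to Basic for 50 turns/day!"
elif e.user_tier == UserTier.BASIC:
    message = "Upgrade to Premium for 100 turns/day!"
elif e.user_tier == UserTier.PREMIUM:
    message = "Upgrade to Elite for 200 turns/day!"
else:
    message = None  # Elite is the highest tier; nothing to upgrade to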

---

## Target Cost Goals

- **Development:** < $50/day
- **Production target:** < $500/month total
- **Cost per user:** ~$0.10/day (premium tier average)

### Cost Breakdown by Tier (estimated daily)

| Tier | Avg Requests | Avg Cost/Request | Daily Cost |
|------|-------------|-----------------|------------|
| FREE | 10 | $0.00 | $0.00 |
| BASIC | 30 | $0.003 | $0.09 |
| PREMIUM | 60 | $0.01 | $0.60 |
| ELITE | 100 | $0.02 | $2.00 |
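
The daily cost column is simply average requests multiplied by average cost per request. The sketch below reproduces that arithmetic and extends it to a rough monthly projection. The per-tier figures come from the table; the user counts in the example are invented purely for illustration, and `projected_monthly_cost` is a hypothetical helper.

```python
# (avg requests per day, avg cost per request in USD), from the table above.
TIER_ESTIMATES = {
    "FREE": (10, 0.00),
    "BASIC": (30, 0.003),
    "PREMIUM": (60, 0.01),
    "ELITE": (100, 0.02),
}


def daily_cost(avg_requests: int, avg_cost_per_request: float) -> float:
    """Estimated daily cost for one user."""
    return avg_requests * avg_cost_per_request


def projected_monthly_cost(user_counts: dict, days: int = 30) -> float:
    """Project the monthly platform cost for a given number of users per tier."""
    per_day = sum(
        daily_cost(*TIER_ESTIMATES[tier]) * count
        for tier, count in user_counts.items()
    )
    return per_day * days


for tier, (requests, cost) in TIER_ESTIMATES.items():
    print(f"{tier}: ${daily_cost(requests, cost):.2f}/day")

# Hypothetical user mix, for illustration only
print(f"${projected_monthly_cost({'FREE': 100, 'BASIC': 10, 'PREMIUM': 5}):.2f}/month")
```

A projection like this can be compared against the monthly production target to see how many paying users a given budget supports.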

---

## Testing

### Unit Tests

```python
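# test_usage_tracking_service.py
from app.models.ai_usage import TaskType
from app.services.usage_tracking_service import UsageTrackingService
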
def test_log_usage():
    tracker = UsageTrackingService()
    log = tracker.log_usage(
        user_id="test_user",
        model="meta/meta-llama-3-8b-instruct",
        tokens_input=100,
        tokens_output=200,
        task_type=TaskType.STORY_PROGRESSION
    )
    assert log.tokens_total == 300
    assert log.estimated_cost > 0

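# test_rate_limiter_service.py
import pytest

from app.ai.model_selector import UserTier
from app.services.rate_limiter_service import RateLimiterService, RateLimitExceeded

def test_rate_limit_exceeded():
    limiter = RateLimiterService()

    # Use up the full free-tier allowance (20 turns)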
    for _ in range(20):
        limiter.increment_usage("test_user")

    with pytest.raises(RateLimitExceeded):
        limiter.check_rate_limit("test_user", UserTier.FREE)
```

### Integration Testing

```bash
# Check Redis connection
redis-cli ping

# Check Appwrite connection
python -c "from app.services.usage_tracking_service import UsageTrackingService; UsageTrackingService()"
```

---

## Future Enhancements (Deferred)

- **Task 7.15:** Cost monitoring and alerts (daily job, email alerts)
- Billing integration
- Usage quotas per session
- Real-time cost dashboard
- Cost projections