Code_of_Conquest/api/docs/USAGE_TRACKING.md
Phillip Tarrant 19808dd44c docs: update rate limit values to match config-based system
- Update USAGE_TRACKING.md with new tier limits (50, 200, 1000, unlimited)
- Update AI_INTEGRATION.md with new tier limits
- Add note that limits are loaded from config (ai_calls_per_day)
- Document GET /api/v1/usage endpoint
- Update examples to show is_unlimited field
- Fix test examples with correct limit values

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 10:02:30 -06:00

Usage Tracking & Cost Controls

Overview

Code of Conquest implements comprehensive usage tracking and cost controls for AI operations. This ensures sustainable costs, fair usage across tiers, and visibility into system usage patterns.

Key Components:

  • UsageTrackingService - Logs all AI usage and calculates costs
  • RateLimiterService - Enforces tier-based daily limits
  • AIUsageLog - Data model for usage events

Architecture

┌─────────────────────┐
│   AI Task Jobs      │
├─────────────────────┤
│ UsageTrackingService│  ← Logs usage, calculates costs
├─────────────────────┤
│  RateLimiterService │  ← Enforces limits before processing
├─────────────────────┤
│   Redis + Appwrite  │  ← Storage layer
└─────────────────────┘

Usage Tracking Service

File: app/services/usage_tracking_service.py

Initialization

from app.services.usage_tracking_service import UsageTrackingService

tracker = UsageTrackingService()

Required Environment Variables:

APPWRITE_ENDPOINT=https://cloud.appwrite.io/v1
APPWRITE_PROJECT_ID=your-project-id
APPWRITE_API_KEY=your-api-key
APPWRITE_DATABASE_ID=main

Logging Usage

from app.models.ai_usage import TaskType

# Log a usage event
usage_log = tracker.log_usage(
    user_id="user_123",
    model="anthropic/claude-3.5-sonnet",
    tokens_input=150,
    tokens_output=450,
    task_type=TaskType.STORY_PROGRESSION,
    session_id="sess_789",
    character_id="char_456",
    request_duration_ms=2500,
    success=True
)

print(f"Log ID: {usage_log.log_id}")
print(f"Cost: ${usage_log.estimated_cost:.6f}")

Querying Usage

Daily Usage:

from datetime import date

# Get today's usage
usage = tracker.get_daily_usage("user_123", date.today())

print(f"Requests: {usage.total_requests}")
print(f"Tokens: {usage.total_tokens}")
print(f"Input tokens: {usage.total_input_tokens}")
print(f"Output tokens: {usage.total_output_tokens}")
print(f"Cost: ${usage.estimated_cost:.4f}")
print(f"By task: {usage.requests_by_task}")
# {"story_progression": 10, "combat_narration": 3, ...}

Monthly Cost:

# Get November 2025 cost
monthly = tracker.get_monthly_cost("user_123", 2025, 11)

print(f"Monthly requests: {monthly.total_requests}")
print(f"Monthly tokens: {monthly.total_tokens}")
print(f"Monthly cost: ${monthly.estimated_cost:.2f}")

Admin Monitoring:

# Get total platform cost for a day
total_cost = tracker.get_total_daily_cost(date.today())
print(f"Platform daily cost: ${total_cost:.2f}")

# Get user request count for rate limiting
count = tracker.get_user_request_count_today("user_123")

Cost Estimation

Static Methods (no instance needed):

from app.services.usage_tracking_service import UsageTrackingService

# Estimate cost for specific request
cost = UsageTrackingService.estimate_cost_for_model(
    model="anthropic/claude-3.5-sonnet",
    tokens_input=100,
    tokens_output=400
)
print(f"Estimated: ${cost:.6f}")

# Get model pricing
info = UsageTrackingService.get_model_cost_info("anthropic/claude-3.5-sonnet")
print(f"Input: ${info['input']}/1K tokens")
print(f"Output: ${info['output']}/1K tokens")

Model Pricing

Costs per 1,000 tokens (USD):

| Model | Input | Output | Tier |
|-------|-------|--------|------|
| meta/meta-llama-3-8b-instruct | $0.0001 | $0.0001 | Free |
| meta/meta-llama-3-70b-instruct | $0.0006 | $0.0006 | - |
| anthropic/claude-3.5-haiku | $0.001 | $0.005 | Basic |
| anthropic/claude-3.5-sonnet | $0.003 | $0.015 | Premium |
| anthropic/claude-4.5-sonnet | $0.003 | $0.015 | Elite |
| anthropic/claude-3-opus | $0.015 | $0.075 | - |

Default cost for unknown models: $0.001 input, $0.005 output per 1K tokens
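As a worked example of how per-1K pricing turns into a request cost, the sketch below redoes the arithmetic with a few values from the pricing table. The `PRICING` dict and the `estimate_cost` helper are illustrative stand-ins, not the actual internals of `UsageTrackingService.estimate_cost_for_model`.

```python
# Illustrative only: prices copied from the pricing table (USD per 1,000 tokens).
PRICING = {
    "meta/meta-llama-3-8b-instruct": {"input": 0.0001, "output": 0.0001},
    "anthropic/claude-3.5-haiku": {"input": 0.001, "output": 0.005},
    "anthropic/claude-3.5-sonnet": {"input": 0.003, "output": 0.015},
}

# Fallback rates for models that are not in the table.
DEFAULT_PRICING = {"input": 0.001, "output": 0.005}


def estimate_cost(model: str, tokens_input: int, tokens_output: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    rates = PRICING.get(model, DEFAULT_PRICING)
    input_cost = (tokens_input / 1000) * rates["input"]
    output_cost = (tokens_output / 1000) * rates["output"]
    return input_cost + output_cost


# 150 input tokens and 450 output tokens on claude-3.5-sonnet:
# 0.15 * $0.003 + 0.45 * $0.015 = $0.00045 + $0.00675
cost = estimate_cost("anthropic/claude-3.5-sonnet", 150, 450)
print(f"${cost:.6f}")
```

Output tokens dominate the total here because the output rate is five times the input rate.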


Token Estimation

Since the Replicate API doesn't return exact token counts, tokens are estimated based on text length.

Estimation Formula

# Approximate 4 characters per token
tokens = len(text) // 4

How Tokens Are Calculated

Input Tokens:

  • Calculated from the full prompt sent to the AI
  • Includes: user prompt + system prompt
  • Estimated at: len(prompt + system_prompt) // 4

Output Tokens:

  • Calculated from the AI's response text
  • Estimated at: len(response_text) // 4
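The two rules above can be sketched as a pair of small functions. The names `estimate_tokens` and `estimate_request_tokens` are hypothetical; the real estimation lives in the Replicate client and may differ in detail.

```python
from typing import Tuple


def estimate_tokens(text: str) -> int:
    """Approximate token count at roughly 4 characters per token."""
    return len(text) // 4


def estimate_request_tokens(
    prompt: str, system_prompt: str, response_text: str
) -> Tuple[int, int, int]:
    """Return (input, output, total) token estimates for one request."""
    tokens_input = estimate_tokens(prompt + system_prompt)
    tokens_output = estimate_tokens(response_text)
    return tokens_input, tokens_output, tokens_input + tokens_output


# A 300-character user prompt plus a 100-character system prompt,
# answered with a 200-character response.
print(estimate_request_tokens("a" * 300, "b" * 100, "c" * 200))
```

Integer division rounds down, so very short strings (under 4 characters) estimate to zero tokens.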

ReplicateResponse Structure

The Replicate client returns both input and output token estimates:

@dataclass
class ReplicateResponse:
    text: str
    tokens_used: int      # Total (input + output)
    tokens_input: int     # Estimated input tokens
    tokens_output: int    # Estimated output tokens
    model: str
    generation_time: float

Example Token Counts

| Content | Characters | Estimated Tokens |
|---------|------------|------------------|
| Short prompt | 400 chars | ~100 tokens |
| Full DM prompt | 4,000 chars | ~1,000 tokens |
| Short response | 200 chars | ~50 tokens |
| Full narrative | 800 chars | ~200 tokens |

Accuracy Notes

  • Estimation is approximate (~75-80% accurate)
  • Real tokenization varies by model
  • Better to over-estimate for cost budgeting
  • Logs use estimates; billing reconciliation may differ

Data Models

File: app/models/ai_usage.py

AIUsageLog

@dataclass
class AIUsageLog:
    log_id: str                    # Unique identifier
    user_id: str                   # User who made request
    timestamp: datetime            # When request was made
    model: str                     # Model identifier
    tokens_input: int              # Input/prompt tokens
    tokens_output: int             # Output/response tokens
    tokens_total: int              # Total tokens
    estimated_cost: float          # Cost in USD
    task_type: TaskType            # Type of task
    session_id: Optional[str]      # Game session
    character_id: Optional[str]    # Character
    request_duration_ms: int       # Duration
    success: bool                  # Success status
    error_message: Optional[str]   # Error if failed

TaskType Enum

class TaskType(str, Enum):
    STORY_PROGRESSION = "story_progression"
    COMBAT_NARRATION = "combat_narration"
    QUEST_SELECTION = "quest_selection"
    NPC_DIALOGUE = "npc_dialogue"
    GENERAL = "general"

Summary Objects

@dataclass
class DailyUsageSummary:
    date: date
    user_id: str
    total_requests: int
    total_tokens: int
    total_input_tokens: int
    total_output_tokens: int
    estimated_cost: float
    requests_by_task: Dict[str, int]

@dataclass
class MonthlyUsageSummary:
    year: int
    month: int
    user_id: str
    total_requests: int
    total_tokens: int
    estimated_cost: float
    daily_breakdown: list

Rate Limiter Service

File: app/services/rate_limiter_service.py

Daily Turn Limits

Limits are loaded from config (rate_limiting.tiers.{tier}.ai_calls_per_day):

| Tier | Limit | Cost Level |
|------|-------|------------|
| FREE | 50 turns/day | Zero |
| BASIC | 200 turns/day | Low |
| PREMIUM | 1000 turns/day | Medium |
| ELITE | Unlimited | High |

Counters reset at midnight UTC. A value of -1 in config means unlimited.
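The reset and unlimited rules can be sketched in a few lines. `next_reset_time` and `remaining_turns` are hypothetical helpers, and returning `-1` for an unlimited tier is an assumption that simply mirrors the config convention; the real `RateLimiterService` may report it differently.

```python
from datetime import datetime, timedelta, timezone

UNLIMITED = -1  # config value meaning "no daily limit"


def next_reset_time(now: datetime) -> datetime:
    """Return the next midnight UTC after `now` (which must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    tomorrow = now_utc.date() + timedelta(days=1)
    return datetime(tomorrow.year, tomorrow.month, tomorrow.day, tzinfo=timezone.utc)


def remaining_turns(daily_limit: int, current_usage: int) -> int:
    """Return turns left today, or UNLIMITED when the tier has no cap."""
    if daily_limit == UNLIMITED:
        return UNLIMITED
    return max(0, daily_limit - current_usage)


now = datetime(2025, 11, 26, 16, 2, 30, tzinfo=timezone.utc)
print(next_reset_time(now).isoformat())
print(remaining_turns(50, 15))
print(remaining_turns(UNLIMITED, 999))
```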

Usage API Endpoint

Get current usage info via GET /api/v1/usage:

{
    "user_id": "user_123",
    "user_tier": "free",
    "current_usage": 15,
    "daily_limit": 50,
    "remaining": 35,
    "reset_time": "2025-11-27T00:00:00+00:00",
    "is_limited": false,
    "is_unlimited": false
}
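A client might turn this payload into a short status line along the following lines. `describe_usage` is a hypothetical helper, and reading `is_limited` as "the user has hit today's limit" is an assumption based on the field name.

```python
import json


def describe_usage(payload: str) -> str:
    """Build a one-line status message from a /api/v1/usage response body."""
    info = json.loads(payload)
    if info["is_unlimited"]:
        return f"{info['current_usage']} turns used today (no daily limit)"
    if info["is_limited"]:  # assumed to mean the daily limit has been reached
        return f"Daily limit of {info['daily_limit']} reached; resets at {info['reset_time']}"
    return f"{info['remaining']} of {info['daily_limit']} turns remaining"


sample = """
{
    "user_id": "user_123",
    "user_tier": "free",
    "current_usage": 15,
    "daily_limit": 50,
    "remaining": 35,
    "reset_time": "2025-11-27T00:00:00+00:00",
    "is_limited": false,
    "is_unlimited": false
}
"""
print(describe_usage(sample))
```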

Custom Action Limits

Free-text actions (beyond preset buttons) have additional limits per tier:

| Tier | Custom Actions/Day | Character Limit |
|------|--------------------|-----------------|
| FREE | 10 | 150 chars |
| BASIC | 50 | 300 chars |
| PREMIUM | Unlimited | 500 chars |
| ELITE | Unlimited | 500 chars |

Configuration: These values are defined in config/*.yaml under rate_limiting.tiers:

tiers:
  free:
    custom_actions_per_day: 10
    custom_action_char_limit: 150

Access in code:

from app.config import get_config

config = get_config()
tier_config = config.rate_limiting.tiers['free']
print(tier_config.custom_actions_per_day)      # 10
print(tier_config.custom_action_char_limit)    # 150
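A validation step built on these values might look like the sketch below. The `CUSTOM_ACTION_LIMITS` dict hard-codes the per-tier limits listed above for illustration, `check_custom_action` is a hypothetical helper, and using `None` for "unlimited" is an assumption (the config may encode it differently, for example as `-1`).

```python
from typing import Optional

# Illustrative copy of the per-tier custom action limits; None means unlimited here.
CUSTOM_ACTION_LIMITS = {
    "free": {"per_day": 10, "char_limit": 150},
    "basic": {"per_day": 50, "char_limit": 300},
    "premium": {"per_day": None, "char_limit": 500},
    "elite": {"per_day": None, "char_limit": 500},
}


def check_custom_action(tier: str, text: str, used_today: int) -> Optional[str]:
    """Return an error message if the action is not allowed, else None."""
    limits = CUSTOM_ACTION_LIMITS[tier]
    if len(text) > limits["char_limit"]:
        return f"Action too long: {len(text)} chars (max {limits['char_limit']})"
    if limits["per_day"] is not None and used_today >= limits["per_day"]:
        return f"Daily custom action limit reached ({limits['per_day']})"
    return None


print(check_custom_action("free", "I search the room for hidden doors.", 3))
print(check_custom_action("free", "x" * 151, 3))
print(check_custom_action("free", "I open the chest.", 10))
```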

Basic Usage

from app.services.rate_limiter_service import RateLimiterService, RateLimitExceeded
from app.ai.model_selector import UserTier

limiter = RateLimiterService()

# Check and increment (typical flow)
try:
    limiter.check_rate_limit("user_123", UserTier.PREMIUM)
    # Process AI request...
    limiter.increment_usage("user_123")
except RateLimitExceeded as e:
    print(f"Limit reached: {e.current_usage}/{e.limit}")
    print(f"Resets at: {e.reset_time}")

Query Methods

# Get current usage
current = limiter.get_current_usage("user_123")

# Get remaining turns
remaining = limiter.get_remaining_turns("user_123", UserTier.PREMIUM)
print(f"Remaining: {remaining} turns")

# Get comprehensive info
info = limiter.get_usage_info("user_123", UserTier.PREMIUM)
# {
#     "user_id": "user_123",
#     "user_tier": "premium",
#     "current_usage": 45,
#     "daily_limit": 1000,
#     "remaining": 955,
#     "reset_time": "2025-11-22T00:00:00+00:00",
#     "is_limited": False,
#     "is_unlimited": False
# }

# Get limit for tier (-1 means unlimited)
limit = limiter.get_limit_for_tier(UserTier.ELITE)  # -1 (unlimited)

Admin Functions

# Reset user's daily counter (testing/admin)
limiter.reset_usage("user_123")

RateLimitExceeded Exception

class RateLimitExceeded(Exception):
    user_id: str
    user_tier: UserTier
    limit: int
    current_usage: int
    reset_time: datetime

Provides all information needed for user-friendly error messages.
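The listing above only declares the attributes. One way such an exception could populate them is sketched below; the constructor signature and message format are assumptions, and `UserTier` is redefined locally as a stand-in for `app.ai.model_selector.UserTier`.

```python
from datetime import datetime, timezone
from enum import Enum


class UserTier(str, Enum):
    """Local stand-in for app.ai.model_selector.UserTier."""
    FREE = "free"
    BASIC = "basic"
    PREMIUM = "premium"
    ELITE = "elite"


class RateLimitExceeded(Exception):
    """Sketch of an exception that carries rate limit context."""

    def __init__(self, user_id: str, user_tier: UserTier, limit: int,
                 current_usage: int, reset_time: datetime) -> None:
        self.user_id = user_id
        self.user_tier = user_tier
        self.limit = limit
        self.current_usage = current_usage
        self.reset_time = reset_time
        super().__init__(
            f"Daily limit reached for {user_id}: {current_usage}/{limit}, "
            f"resets at {reset_time.isoformat()}"
        )


try:
    raise RateLimitExceeded(
        "user_123", UserTier.FREE, 50, 50,
        datetime(2025, 11, 27, tzinfo=timezone.utc),
    )
except RateLimitExceeded as exc:
    print(exc)
    print(exc.user_tier.value, exc.limit)
```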


Integration Pattern

In AI Task Jobs

import time

from app.ai.model_selector import UserTier
from app.ai.narrative_generator import NarrativeGenerator
from app.models.ai_usage import TaskType
from app.services.rate_limiter_service import RateLimiterService, RateLimitExceeded
from app.services.usage_tracking_service import UsageTrackingService

def process_ai_request(user_id: str, user_tier: UserTier, action: str, ...):
    limiter = RateLimiterService()
    tracker = UsageTrackingService()
    generator = NarrativeGenerator()

    # 1. Check rate limit BEFORE processing
    try:
        limiter.check_rate_limit(user_id, user_tier)
    except RateLimitExceeded as e:
        return {
            "error": "rate_limit_exceeded",
            "message": f"Daily limit reached ({e.limit} turns). Resets at {e.reset_time}",
            "remaining": 0,
            "reset_time": e.reset_time.isoformat()
        }

    # 2. Generate AI response
    start_time = time.time()
    response = generator.generate_story_response(...)
    duration_ms = int((time.time() - start_time) * 1000)

    # 3. Log usage (tokens are estimated in ReplicateClient)
    tracker.log_usage(
        user_id=user_id,
        model=response.model,
        tokens_input=response.tokens_input,   # From prompt length
        tokens_output=response.tokens_output, # From response length
        task_type=TaskType.STORY_PROGRESSION,
        session_id=session_id,
        request_duration_ms=duration_ms,
        success=True
    )

    # 4. Increment rate limit counter
    limiter.increment_usage(user_id)

    return {"narrative": response.narrative, ...}

API Endpoint Pattern

@bp.route('/sessions/<session_id>/action', methods=['POST'])
@require_auth
def take_action(session_id):
    user = get_current_user()
    limiter = RateLimiterService()

    # Check limit and return remaining info
    try:
        limiter.check_rate_limit(user.id, user.tier)
    except RateLimitExceeded as e:
        return api_response(
            status=429,
            error={
                "code": "RATE_LIMIT_EXCEEDED",
                "message": "Daily turn limit reached",
                "details": {
                    "limit": e.limit,
                    "current": e.current_usage,
                    "reset_time": e.reset_time.isoformat()
                }
            }
        )

    # Queue AI job...
    remaining = limiter.get_remaining_turns(user.id, user.tier)

    return api_response(
        status=202,
        result={
            "job_id": job.id,
            "remaining_turns": remaining
        }
    )

Appwrite Collection Schema

Collection: ai_usage_logs

| Field | Type | Description |
|-------|------|-------------|
| log_id | string | Primary key |
| user_id | string | User identifier |
| timestamp | datetime | Request time (UTC) |
| model | string | Model identifier |
| tokens_input | integer | Input tokens |
| tokens_output | integer | Output tokens |
| tokens_total | integer | Total tokens |
| estimated_cost | double | Cost in USD |
| task_type | string | Task type enum |
| session_id | string | Optional session |
| character_id | string | Optional character |
| request_duration_ms | integer | Duration |
| success | boolean | Success status |
| error_message | string | Error if failed |

Indexes:

  • user_id + timestamp (for daily queries)
  • timestamp (for admin monitoring)
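A daily query against the `user_id + timestamp` index needs the UTC bounds of the day in question. The sketch below computes a half-open range `[start, end)`; `utc_day_bounds` is a hypothetical helper, and how the bounds are passed to Appwrite is left out because the service's query code is not shown here.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple


def utc_day_bounds(day: date) -> Tuple[datetime, datetime]:
    """Return (start, end) of `day` in UTC; end is exclusive."""
    start = datetime.combine(day, time.min, tzinfo=timezone.utc)
    return start, start + timedelta(days=1)


start, end = utc_day_bounds(date(2025, 11, 26))
# Filter logs where user_id matches and start <= timestamp < end.
print(start.isoformat(), end.isoformat())
```

A half-open range avoids double-counting a request logged exactly at midnight.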

Cost Management Best Practices

1. Pre-request Validation

Always check rate limits before processing:

limiter.check_rate_limit(user_id, user_tier)

2. Log All Requests

Log both successful and failed requests:

tracker.log_usage(
    ...,
    success=False,
    error_message="Model timeout"
)

3. Monitor Platform Costs

# Daily monitoring
daily_cost = tracker.get_total_daily_cost(date.today())

# Check the higher threshold first so only one alert fires per check
if daily_cost > 100:
    send_alert("CRITICAL: Daily AI cost exceeded $100")
elif daily_cost > 50:
    send_alert("WARNING: Daily AI cost exceeded $50")

4. Cost Estimation for UI

Show users estimated costs before actions:

cost_info = UsageTrackingService.get_model_cost_info(model)
estimated = (base_tokens * 1.5 / 1000) * (cost_info['input'] + cost_info['output'])

5. Tier Upgrade Prompts

When rate limited, prompt upgrades:

if e.user_tier == UserTier.FREE:
    message = "Upgrade to Basic for 200 turns/day!"
elif e.user_tier == UserTier.BASIC:
    message = "Upgrade to Premium for 1000 turns/day!"
elif e.user_tier == UserTier.PREMIUM:
    message = "Upgrade to Elite for unlimited turns!"

Target Cost Goals

  • Development: < $50/day
  • Production target: < $500/month total
  • Cost per user: ~$0.10/day (premium tier average)

Cost Breakdown by Tier (estimated daily)

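| Tier | Avg Requests | Avg Cost/Request | Daily Cost |
|------|--------------|------------------|------------|
| FREE | 10 | $0.00 | $0.00 |
| BASIC | 30 | $0.003 | $0.09 |
| PREMIUM | 60 | $0.01 | $0.60 |
| ELITE | 100 | $0.02 | $2.00 |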
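The daily figures are simply average requests multiplied by average cost per request. The sketch below reproduces them and extends each to a 30-day figure; the 30-day month and the helper names are assumptions for illustration, not part of the service.

```python
# (average requests per day, average USD cost per request), from the breakdown above
TIER_ESTIMATES = {
    "FREE": (10, 0.00),
    "BASIC": (30, 0.003),
    "PREMIUM": (60, 0.01),
    "ELITE": (100, 0.02),
}


def daily_cost(tier: str) -> float:
    """Estimated daily cost for one user on the given tier."""
    requests, cost_per_request = TIER_ESTIMATES[tier]
    return requests * cost_per_request


def monthly_cost(tier: str, days: int = 30) -> float:
    """Estimated cost per user over `days` days (30 is an assumed month length)."""
    return daily_cost(tier) * days


for tier in TIER_ESTIMATES:
    print(f"{tier}: ${daily_cost(tier):.2f}/day, ${monthly_cost(tier):.2f}/month")
```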

Testing

Unit Tests

# test_usage_tracking_service.py
def test_log_usage():
    tracker = UsageTrackingService()
    log = tracker.log_usage(
        user_id="test_user",
        model="meta/meta-llama-3-8b-instruct",
        tokens_input=100,
        tokens_output=200,
        task_type=TaskType.STORY_PROGRESSION
    )
    assert log.tokens_total == 300
    assert log.estimated_cost > 0

# test_rate_limiter_service.py
def test_rate_limit_exceeded():
    limiter = RateLimiterService()

    # Use up the free tier limit (50 from config)
    for _ in range(50):
        limiter.increment_usage("test_user")

    with pytest.raises(RateLimitExceeded):
        limiter.check_rate_limit("test_user", UserTier.FREE)

Integration Testing

# Check Redis connection
redis-cli ping

# Check Appwrite connection
python -c "from app.services.usage_tracking_service import UsageTrackingService; UsageTrackingService()"

Future Enhancements (Deferred)

  • Task 7.15: Cost monitoring and alerts (daily job, email alerts)
  • Billing integration
  • Usage quotas per session
  • Real-time cost dashboard
  • Cost projections