Phillip Tarrant 19808dd44c docs: update rate limit values to match config-based system

- Update USAGE_TRACKING.md with new tier limits (50, 200, 1000, unlimited)
- Update AI_INTEGRATION.md with new tier limits
- Add note that limits are loaded from config (ai_calls_per_day)
- Document GET /api/v1/usage endpoint
- Update examples to show is_unlimited field
- Fix test examples with correct limit values

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-26 10:02:30 -06:00

Usage Tracking & Cost Controls

Overview

Code of Conquest logs every AI operation and enforces per-tier cost controls. This keeps costs sustainable, keeps usage fair across tiers, and gives visibility into system usage patterns.

Key Components:

UsageTrackingService - Logs all AI usage and calculates costs
RateLimiterService - Enforces tier-based daily limits
AIUsageLog - Data model for usage events

Architecture

┌─────────────────────┐
│   AI Task Jobs      │
├─────────────────────┤
│ UsageTrackingService│  ← Logs usage, calculates costs
├─────────────────────┤
│  RateLimiterService │  ← Enforces limits before processing
├─────────────────────┤
│   Redis + Appwrite  │  ← Storage layer
└─────────────────────┘

Usage Tracking Service

File: app/services/usage_tracking_service.py

Initialization

from app.services.usage_tracking_service import UsageTrackingService

tracker = UsageTrackingService()

Required Environment Variables:

APPWRITE_ENDPOINT=https://cloud.appwrite.io/v1
APPWRITE_PROJECT_ID=your-project-id
APPWRITE_API_KEY=your-api-key
APPWRITE_DATABASE_ID=main

Logging Usage

from app.models.ai_usage import TaskType

# Log a usage event
usage_log = tracker.log_usage(
    user_id="user_123",
    model="anthropic/claude-3.5-sonnet",
    tokens_input=150,
    tokens_output=450,
    task_type=TaskType.STORY_PROGRESSION,
    session_id="sess_789",
    character_id="char_456",
    request_duration_ms=2500,
    success=True
)

print(f"Log ID: {usage_log.log_id}")
print(f"Cost: ${usage_log.estimated_cost:.6f}")

Querying Usage

Daily Usage:

from datetime import date

# Get today's usage
usage = tracker.get_daily_usage("user_123", date.today())

print(f"Requests: {usage.total_requests}")
print(f"Tokens: {usage.total_tokens}")
print(f"Input tokens: {usage.total_input_tokens}")
print(f"Output tokens: {usage.total_output_tokens}")
print(f"Cost: ${usage.estimated_cost:.4f}")
print(f"By task: {usage.requests_by_task}")
# {"story_progression": 10, "combat_narration": 3, ...}

Monthly Cost:

# Get November 2025 cost
monthly = tracker.get_monthly_cost("user_123", 2025, 11)

print(f"Monthly requests: {monthly.total_requests}")
print(f"Monthly tokens: {monthly.total_tokens}")
print(f"Monthly cost: ${monthly.estimated_cost:.2f}")

Admin Monitoring:

# Get total platform cost for a day
total_cost = tracker.get_total_daily_cost(date.today())
print(f"Platform daily cost: ${total_cost:.2f}")

# Get user request count for rate limiting
count = tracker.get_user_request_count_today("user_123")

Cost Estimation

Static Methods (no instance needed):

from app.services.usage_tracking_service import UsageTrackingService

# Estimate cost for specific request
cost = UsageTrackingService.estimate_cost_for_model(
    model="anthropic/claude-3.5-sonnet",
    tokens_input=100,
    tokens_output=400
)
print(f"Estimated: ${cost:.6f}")

# Get model pricing
info = UsageTrackingService.get_model_cost_info("anthropic/claude-3.5-sonnet")
print(f"Input: ${info['input']}/1K tokens")
print(f"Output: ${info['output']}/1K tokens")

Model Pricing

Costs per 1,000 tokens (USD):

Model	Input	Output	Tier
`meta/meta-llama-3-8b-instruct`	$0.0001	$0.0001	Free
`meta/meta-llama-3-70b-instruct`	$0.0006	$0.0006	-
`anthropic/claude-3.5-haiku`	$0.001	$0.005	Basic
`anthropic/claude-3.5-sonnet`	$0.003	$0.015	Premium
`anthropic/claude-4.5-sonnet`	$0.003	$0.015	Elite
`anthropic/claude-3-opus`	$0.015	$0.075	-

Default cost for unknown models: $0.001 input, $0.005 output per 1K tokens
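
As a rough illustration of how the table above turns into a per-request cost, here is a minimal sketch. The names `MODEL_PRICING`, `DEFAULT_PRICING`, and `estimate_cost` are hypothetical stand-ins, not the service's actual internals; only the prices and the unknown-model fallback come from this document.

```python
# Prices per 1,000 tokens (USD), copied from the pricing table.
# MODEL_PRICING / DEFAULT_PRICING are illustrative names (assumptions).
MODEL_PRICING = {
    "meta/meta-llama-3-8b-instruct": {"input": 0.0001, "output": 0.0001},
    "meta/meta-llama-3-70b-instruct": {"input": 0.0006, "output": 0.0006},
    "anthropic/claude-3.5-haiku": {"input": 0.001, "output": 0.005},
    "anthropic/claude-3.5-sonnet": {"input": 0.003, "output": 0.015},
    "anthropic/claude-4.5-sonnet": {"input": 0.003, "output": 0.015},
    "anthropic/claude-3-opus": {"input": 0.015, "output": 0.075},
}
DEFAULT_PRICING = {"input": 0.001, "output": 0.005}


def estimate_cost(model: str, tokens_input: int, tokens_output: int) -> float:
    """Return the estimated USD cost for one request."""
    # Unknown models fall back to the documented default pricing.
    pricing = MODEL_PRICING.get(model, DEFAULT_PRICING)
    input_cost = (tokens_input / 1000) * pricing["input"]
    output_cost = (tokens_output / 1000) * pricing["output"]
    return input_cost + output_cost
```

For example, 100 input tokens and 400 output tokens on `anthropic/claude-3.5-sonnet` come to 0.0003 + 0.006 = $0.0063.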

Token Estimation

Since the Replicate API doesn't return exact token counts, tokens are estimated based on text length.

Estimation Formula

# Approximate 4 characters per token
tokens = len(text) // 4

How Tokens Are Calculated

Input Tokens:

Calculated from the full prompt sent to the AI
Includes: user prompt + system prompt
Estimated at: len(prompt + system_prompt) // 4

Output Tokens:

Calculated from the AI's response text
Estimated at: len(response_text) // 4
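
The two rules above can be sketched as a pair of small helpers. The function names and the returned dictionary are assumptions made for illustration; the real logic lives in the Replicate client and may differ in detail.

```python
CHARS_PER_TOKEN = 4  # the approximation used throughout this document


def estimate_tokens(text: str) -> int:
    """Estimate a token count from text length (integer division)."""
    return len(text) // CHARS_PER_TOKEN


def estimate_request_tokens(prompt: str, system_prompt: str, response_text: str) -> dict:
    """Estimate input, output, and total tokens for one request."""
    # Input covers the user prompt and the system prompt together.
    tokens_input = estimate_tokens(prompt + system_prompt)
    # Output covers only the model's response text.
    tokens_output = estimate_tokens(response_text)
    return {
        "tokens_input": tokens_input,
        "tokens_output": tokens_output,
        "tokens_used": tokens_input + tokens_output,
    }
```

The keys mirror the `tokens_input`, `tokens_output`, and `tokens_used` fields of `ReplicateResponse`.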

ReplicateResponse Structure

The Replicate client returns both input and output token estimates:

@dataclass
class ReplicateResponse:
    text: str
    tokens_used: int      # Total (input + output)
    tokens_input: int     # Estimated input tokens
    tokens_output: int    # Estimated output tokens
    model: str
    generation_time: float

Example Token Counts

Content	Characters	Estimated Tokens
Short prompt	400 chars	~100 tokens
Full DM prompt	4,000 chars	~1,000 tokens
Short response	200 chars	~50 tokens
Full narrative	800 chars	~200 tokens

Accuracy Notes

Estimation is approximate (~75-80% accurate)
Real tokenization varies by model
Better to over-estimate for cost budgeting
Logs use estimates; billing reconciliation may differ
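
One way to lean toward over-estimation when budgeting is to round up instead of down and apply a safety margin. This is only a sketch: `budget_tokens` and the 1.25 margin are hypothetical and are not part of the tracking service.

```python
import math

CHARS_PER_TOKEN = 4


def budget_tokens(text: str, margin: float = 1.25) -> int:
    """Return a deliberately high token estimate for cost budgeting.

    The margin is an illustrative value, not a measured correction factor.
    """
    # Divide first, apply the margin, then round up so the result
    # never falls below the plain length-based estimate.
    return math.ceil(len(text) / CHARS_PER_TOKEN * margin)
```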

Data Models

File: app/models/ai_usage.py

AIUsageLog

@dataclass
class AIUsageLog:
    log_id: str                    # Unique identifier
    user_id: str                   # User who made request
    timestamp: datetime            # When request was made
    model: str                     # Model identifier
    tokens_input: int              # Input/prompt tokens
    tokens_output: int             # Output/response tokens
    tokens_total: int              # Total tokens
    estimated_cost: float          # Cost in USD
    task_type: TaskType            # Type of task
    session_id: Optional[str]      # Game session
    character_id: Optional[str]    # Character
    request_duration_ms: int       # Duration
    success: bool                  # Success status
    error_message: Optional[str]   # Error if failed

TaskType Enum

class TaskType(str, Enum):
    STORY_PROGRESSION = "story_progression"
    COMBAT_NARRATION = "combat_narration"
    QUEST_SELECTION = "quest_selection"
    NPC_DIALOGUE = "npc_dialogue"
    GENERAL = "general"

Summary Objects

@dataclass
class DailyUsageSummary:
    date: date
    user_id: str
    total_requests: int
    total_tokens: int
    total_input_tokens: int
    total_output_tokens: int
    estimated_cost: float
    requests_by_task: Dict[str, int]

@dataclass
class MonthlyUsageSummary:
    year: int
    month: int
    user_id: str
    total_requests: int
    total_tokens: int
    estimated_cost: float
    daily_breakdown: list
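
To show how a daily summary relates to individual log entries, the following sketch aggregates a list of logs in memory. `LogEntry`, `DailySummary`, and `summarize_day` are simplified, hypothetical stand-ins; the actual service queries Appwrite and may compute these fields differently.

```python
import datetime as dt
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LogEntry:
    """Stand-in for AIUsageLog with only the fields used here."""
    tokens_input: int
    tokens_output: int
    estimated_cost: float
    task_type: str


@dataclass
class DailySummary:
    """Mirrors the fields of DailyUsageSummary."""
    date: dt.date
    user_id: str
    total_requests: int
    total_tokens: int
    total_input_tokens: int
    total_output_tokens: int
    estimated_cost: float
    requests_by_task: Dict[str, int]


def summarize_day(user_id: str, day: dt.date, logs: List[LogEntry]) -> DailySummary:
    """Aggregate one user's logs for a single day."""
    total_input = sum(log.tokens_input for log in logs)
    total_output = sum(log.tokens_output for log in logs)
    return DailySummary(
        date=day,
        user_id=user_id,
        total_requests=len(logs),
        total_tokens=total_input + total_output,
        total_input_tokens=total_input,
        total_output_tokens=total_output,
        estimated_cost=sum(log.estimated_cost for log in logs),
        # Count requests per task type, e.g. {"story_progression": 2}.
        requests_by_task=dict(Counter(log.task_type for log in logs)),
    )
```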

Rate Limiter Service

File: app/services/rate_limiter_service.py

Daily Turn Limits

Limits are loaded from config (rate_limiting.tiers.{tier}.ai_calls_per_day):

Tier	Limit	Cost Level
FREE	50 turns/day	Zero
BASIC	200 turns/day	Low
PREMIUM	1000 turns/day	Medium
ELITE	Unlimited	High

Counters reset at midnight UTC. A value of -1 in config means unlimited.
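
The reset and unlimited rules can be sketched as follows. The limits are copied from the table above; the helper names, and the choice to return -1 as the remaining count for an unlimited tier, are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

UNLIMITED = -1  # config value meaning "no daily limit"
DAILY_LIMITS = {"free": 50, "basic": 200, "premium": 1000, "elite": UNLIMITED}


def next_reset_time(now: datetime) -> datetime:
    """Return the next midnight UTC after `now` (expects an aware datetime)."""
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


def remaining_turns(tier: str, current_usage: int) -> int:
    """Return turns left today, or UNLIMITED for tiers without a limit."""
    limit = DAILY_LIMITS[tier]
    if limit == UNLIMITED:
        return UNLIMITED
    # Never report a negative remainder if usage overshoots the limit.
    return max(0, limit - current_usage)
```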

Usage API Endpoint

Get current usage info via GET /api/v1/usage:

{
    "user_id": "user_123",
    "user_tier": "free",
    "current_usage": 15,
    "daily_limit": 50,
    "remaining": 35,
    "reset_time": "2025-11-27T00:00:00+00:00",
    "is_limited": false,
    "is_unlimited": false
}

Custom Action Limits

Free-text actions (beyond preset buttons) have additional limits per tier:

Tier	Custom Actions/Day	Character Limit
FREE	10	150 chars
BASIC	50	300 chars
PREMIUM	Unlimited	500 chars
ELITE	Unlimited	500 chars

Configuration: These values are defined in config/*.yaml under rate_limiting.tiers:

tiers:
  free:
    custom_actions_per_day: 10
    custom_action_char_limit: 150

Access in code:

from app.config import get_config

config = get_config()
tier_config = config.rate_limiting.tiers['free']
print(tier_config.custom_actions_per_day)      # 10
print(tier_config.custom_action_char_limit)    # 150
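
A custom action check against these limits might look like the sketch below. The limits come from the table above; `validate_custom_action`, the in-memory `CUSTOM_ACTION_LIMITS` dictionary, and the use of -1 for "Unlimited" are assumptions (the real values are read from config).

```python
UNLIMITED = -1  # assumed encoding for "Unlimited" custom actions

CUSTOM_ACTION_LIMITS = {
    "free": {"per_day": 10, "char_limit": 150},
    "basic": {"per_day": 50, "char_limit": 300},
    "premium": {"per_day": UNLIMITED, "char_limit": 500},
    "elite": {"per_day": UNLIMITED, "char_limit": 500},
}


def validate_custom_action(tier: str, text: str, used_today: int):
    """Return (allowed, reason) for a free-text action."""
    limits = CUSTOM_ACTION_LIMITS[tier]
    # The character limit applies to every tier.
    if len(text) > limits["char_limit"]:
        return False, f"Action exceeds {limits['char_limit']} characters"
    # The daily count only applies to tiers with a finite limit.
    if limits["per_day"] != UNLIMITED and used_today >= limits["per_day"]:
        return False, f"Daily custom action limit reached ({limits['per_day']})"
    return True, "ok"
```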

Basic Usage

from app.services.rate_limiter_service import RateLimiterService, RateLimitExceeded
from app.ai.model_selector import UserTier

limiter = RateLimiterService()

# Check and increment (typical flow)
try:
    limiter.check_rate_limit("user_123", UserTier.PREMIUM)
    # Process AI request...
    limiter.increment_usage("user_123")
except RateLimitExceeded as e:
    print(f"Limit reached: {e.current_usage}/{e.limit}")
    print(f"Resets at: {e.reset_time}")

Query Methods

# Get current usage
current = limiter.get_current_usage("user_123")

# Get remaining turns
remaining = limiter.get_remaining_turns("user_123", UserTier.PREMIUM)
print(f"Remaining: {remaining} turns")

# Get comprehensive info
info = limiter.get_usage_info("user_123", UserTier.PREMIUM)
# {
#     "user_id": "user_123",
#     "user_tier": "premium",
#     "current_usage": 45,
#     "daily_limit": 1000,
#     "remaining": 955,
#     "reset_time": "2025-11-22T00:00:00+00:00",
#     "is_limited": False,
#     "is_unlimited": False
# }

# Get limit for tier (-1 means unlimited)
limit = limiter.get_limit_for_tier(UserTier.ELITE)  # -1 (unlimited)

Admin Functions

# Reset user's daily counter (testing/admin)
limiter.reset_usage("user_123")

RateLimitExceeded Exception

class RateLimitExceeded(Exception):
    user_id: str
    user_tier: UserTier
    limit: int
    current_usage: int
    reset_time: datetime

Provides all information needed for user-friendly error messages.
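
For readers who want to see how those attributes could be populated, here is a minimal stand-in. It is not the project's actual class: the constructor signature and message format are assumptions, and `user_tier` is a plain string here rather than a `UserTier` member.

```python
from datetime import datetime, timezone


class RateLimitExceeded(Exception):
    """Illustrative stand-in carrying the documented attributes."""

    def __init__(self, user_id: str, user_tier: str, limit: int,
                 current_usage: int, reset_time: datetime):
        self.user_id = user_id
        self.user_tier = user_tier
        self.limit = limit
        self.current_usage = current_usage
        self.reset_time = reset_time
        # Build a message that can be shown to the user directly.
        super().__init__(
            f"Daily limit reached ({current_usage}/{limit}); "
            f"resets at {reset_time.isoformat()}"
        )
```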

Integration Pattern

In AI Task Jobs

import time

from app.services.rate_limiter_service import RateLimiterService, RateLimitExceeded
from app.services.usage_tracking_service import UsageTrackingService
from app.ai.model_selector import UserTier
from app.ai.narrative_generator import NarrativeGenerator
from app.models.ai_usage import TaskType

def process_ai_request(user_id: str, user_tier: UserTier, action: str, ...):
    limiter = RateLimiterService()
    tracker = UsageTrackingService()
    generator = NarrativeGenerator()

    # 1. Check rate limit BEFORE processing
    try:
        limiter.check_rate_limit(user_id, user_tier)
    except RateLimitExceeded as e:
        return {
            "error": "rate_limit_exceeded",
            "message": f"Daily limit reached ({e.limit} turns). Resets at {e.reset_time}",
            "remaining": 0,
            "reset_time": e.reset_time.isoformat()
        }

    # 2. Generate AI response
    start_time = time.time()
    response = generator.generate_story_response(...)
    duration_ms = int((time.time() - start_time) * 1000)

    # 3. Log usage (tokens are estimated in ReplicateClient)
    tracker.log_usage(
        user_id=user_id,
        model=response.model,
        tokens_input=response.tokens_input,   # From prompt length
        tokens_output=response.tokens_output, # From response length
        task_type=TaskType.STORY_PROGRESSION,
        session_id=session_id,
        request_duration_ms=duration_ms,
        success=True
    )

    # 4. Increment rate limit counter
    limiter.increment_usage(user_id)

    return {"narrative": response.narrative, ...}

API Endpoint Pattern

@bp.route('/sessions/<session_id>/action', methods=['POST'])
@require_auth
def take_action(session_id):
    user = get_current_user()
    limiter = RateLimiterService()

    # Check limit and return remaining info
    try:
        limiter.check_rate_limit(user.id, user.tier)
    except RateLimitExceeded as e:
        return api_response(
            status=429,
            error={
                "code": "RATE_LIMIT_EXCEEDED",
                "message": "Daily turn limit reached",
                "details": {
                    "limit": e.limit,
                    "current": e.current_usage,
                    "reset_time": e.reset_time.isoformat()
                }
            }
        )

    # Queue AI job...
    remaining = limiter.get_remaining_turns(user.id, user.tier)

    return api_response(
        status=202,
        result={
            "job_id": job.id,
            "remaining_turns": remaining
        }
    )

Appwrite Collection Schema

Collection: ai_usage_logs

Field	Type	Description
`log_id`	string	Primary key
`user_id`	string	User identifier
`timestamp`	datetime	Request time (UTC)
`model`	string	Model identifier
`tokens_input`	integer	Input tokens
`tokens_output`	integer	Output tokens
`tokens_total`	integer	Total tokens
`estimated_cost`	double	Cost in USD
`task_type`	string	Task type enum
`session_id`	string	Optional session
`character_id`	string	Optional character
`request_duration_ms`	integer	Duration
`success`	boolean	Success status
`error_message`	string	Error if failed

Indexes:

user_id + timestamp (for daily queries)
timestamp (for admin monitoring)

Cost Management Best Practices

1. Pre-request Validation

Always check rate limits before processing:

limiter.check_rate_limit(user_id, user_tier)

2. Log All Requests

Log both successful and failed requests:

tracker.log_usage(
    ...,
    success=False,
    error_message="Model timeout"
)

3. Monitor Platform Costs

# Daily monitoring
daily_cost = tracker.get_total_daily_cost(date.today())

# Check the higher threshold first so only one alert fires
if daily_cost > 100:
    send_alert("CRITICAL: Daily AI cost exceeded $100")
elif daily_cost > 50:
    send_alert("WARNING: Daily AI cost exceeded $50")

4. Cost Estimation for UI

Show users estimated costs before actions:

cost_info = UsageTrackingService.get_model_cost_info(model)
estimated = (base_tokens * 1.5 / 1000) * (cost_info['input'] + cost_info['output'])

5. Tier Upgrade Prompts

When rate limited, prompt upgrades:

if e.user_tier == UserTier.FREE:
    message = "Upgrade to Basic for 200 turns/day!"
elif e.user_tier == UserTier.BASIC:
    message = "Upgrade to Premium for 1000 turns/day!"
elif e.user_tier == UserTier.PREMIUM:
    message = "Upgrade to Elite for unlimited turns!"

Target Cost Goals

Development: < $50/day
Production target: < $500/month total
Cost per user: ~$0.10/day (premium tier average)

Cost Breakdown by Tier (estimated daily)

Tier	Avg Requests	Avg Cost/Request	Daily Cost
FREE	10	$0.00	$0.00
BASIC	30	$0.003	$0.09
PREMIUM	60	$0.01	$0.60
ELITE	100	$0.02	$2.00
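
The Daily Cost column is the product of the two columns before it. A short sketch of that arithmetic, using only the figures from the table (the `TIER_ESTIMATES` and `daily_cost` names are illustrative):

```python
# (average requests per day, average cost per request in USD), from the table.
TIER_ESTIMATES = {
    "free": (10, 0.00),
    "basic": (30, 0.003),
    "premium": (60, 0.01),
    "elite": (100, 0.02),
}


def daily_cost(tier: str) -> float:
    """Return the estimated daily cost per user for a tier, rounded to cents."""
    avg_requests, avg_cost_per_request = TIER_ESTIMATES[tier]
    return round(avg_requests * avg_cost_per_request, 2)
```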

Testing

Unit Tests

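# test_usage_tracking_service.py
from app.models.ai_usage import TaskType
from app.services.usage_tracking_service import UsageTrackingService
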
def test_log_usage():
    tracker = UsageTrackingService()
    log = tracker.log_usage(
        user_id="test_user",
        model="meta/meta-llama-3-8b-instruct",
        tokens_input=100,
        tokens_output=200,
        task_type=TaskType.STORY_PROGRESSION
    )
    assert log.tokens_total == 300
    assert log.estimated_cost > 0

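# test_rate_limiter_service.py
import pytest

from app.ai.model_selector import UserTier
from app.services.rate_limiter_service import RateLimiterService, RateLimitExceeded

def test_rate_limit_exceeded():
    limiter = RateLimiterService()
    limiter.reset_usage("test_user")  # Start from a clean counter

    # Reach the free tier limit (50 from config)
    for _ in range(50):
        limiter.increment_usage("test_user")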

    with pytest.raises(RateLimitExceeded):
        limiter.check_rate_limit("test_user", UserTier.FREE)

Integration Testing

# Check Redis connection
redis-cli ping

# Check Appwrite connection
python -c "from app.services.usage_tracking_service import UsageTrackingService; UsageTrackingService()"

Future Enhancements (Deferred)

Task 7.15: Cost monitoring and alerts (daily job, email alerts)
Billing integration
Usage quotas per session
Real-time cost dashboard
Cost projections
