Code_of_Conquest/docs/DEPLOYMENT.md
2025-11-24 23:10:55 -06:00

Deployment & Operations

Local Development Setup

Prerequisites

| Tool | Version | Purpose |
|------|---------|---------|
| Python | 3.11+ | Backend runtime |
| Docker | Latest | Local services |
| Redis | 7.0+ | Job queue & caching |
| Git | Latest | Version control |

Setup Steps

# 1. Clone repository
git clone <repo-url>
cd Code_of_Conquest  # directory name follows the repository name (case-sensitive on Linux/macOS)

# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment
cp .env.example .env
# Edit .env with your API keys and settings

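# 5. Start local services (Redis and the rq-worker container from docker-compose.yml)
docker-compose up -d

# 6. Optional: run RQ workers on the host instead of the rq-worker container
rq worker ai_tasks combat_tasks marketplace_tasks --url redis://localhost:6379 &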

# 7. Run Flask development server
flask run --debug

Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| FLASK_ENV | development/production | Yes |
| SECRET_KEY | Flask secret key | Yes |
| REPLICATE_API_KEY | Replicate API key | Yes |
| ANTHROPIC_API_KEY | Anthropic API key | Yes |
| APPWRITE_ENDPOINT | Appwrite server URL | Yes |
| APPWRITE_PROJECT_ID | Appwrite project ID | Yes |
| APPWRITE_API_KEY | Appwrite API key | Yes |
| REDIS_URL | Redis connection URL | Yes |
| LOG_LEVEL | Logging level (DEBUG/INFO/WARNING/ERROR) | No |
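
A startup check can fail fast when a required variable is missing. The following is a minimal sketch that assumes only the variable names from the table above; the helper name `missing_env_vars` is hypothetical and not part of the project.

```python
import os

# Required variables, taken from the table above.
REQUIRED_VARS = (
    "FLASK_ENV",
    "SECRET_KEY",
    "REPLICATE_API_KEY",
    "ANTHROPIC_API_KEY",
    "APPWRITE_ENDPOINT",
    "APPWRITE_PROJECT_ID",
    "APPWRITE_API_KEY",
    "REDIS_URL",
)


def missing_env_vars(environ=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Report what is missing without stopping the process.
missing = missing_env_vars()
if missing:
    print("Missing required environment variables: " + ", ".join(missing))
```

In an application factory the same list could be turned into a hard error instead of a printed warning.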

Docker Compose (Local Development)

docker-compose.yml:

version: '3.8'
services:
  redis:
    image: redis:alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

  rq-worker:
    build: .
    command: rq worker ai_tasks combat_tasks marketplace_tasks --url redis://redis:6379
    depends_on:
      - redis
    env_file:
      - .env
    environment:
      - REDIS_URL=redis://redis:6379

volumes:
  redis_data:

Testing Strategy

Manual Testing (Preferred)

API Testing Document: docs/API_TESTING.md

Contains:

  • Endpoint examples
  • Sample curl/httpie commands
  • Expected responses
  • Authentication setup

Example API Test:

# Login
curl -X POST http://localhost:5000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com", "password": "password123"}'

# Create character (with auth token)
curl -X POST http://localhost:5000/api/v1/characters \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"name": "Aragorn", "class_id": "vanguard"}'
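
The same requests can be scripted with the Python standard library. This is a sketch only: the helper name `json_post` is hypothetical, and the shape of the login response (where the token lives) is not documented here, so the token is passed in by hand.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000/api/v1"


def json_post(path, payload, token=None):
    """Build (but do not send) a JSON POST request for the given API path."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


login = json_post("/auth/login", {"email": "test@example.com", "password": "password123"})
create = json_post("/characters", {"name": "Aragorn", "class_id": "vanguard"}, token="<token>")
```

Passing either request to `urllib.request.urlopen()` sends it to the running development server.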

Unit Tests (Optional)

Framework: pytest

Test Categories:

| Category | Location | Focus |
|----------|----------|-------|
| Combat | tests/test_combat.py | Damage calculations, effect processing |
| Skills | tests/test_skills.py | Skill unlock logic, prerequisites |
| Marketplace | tests/test_marketplace.py | Bidding logic, auction processing |
| Character | tests/test_character.py | Character creation, stats |

Run Tests:

# All tests
pytest

# Specific test file
pytest tests/test_combat.py

# With coverage
pytest --cov=app tests/

Load Testing

Tool: Locust or Apache Bench

Test Scenarios:

| Scenario | Target | Success Criteria |
|----------|--------|------------------|
| Concurrent AI requests | 50 concurrent users | < 5s response time |
| Marketplace browsing | 100 concurrent users | < 1s response time |
| Session realtime updates | 10 players per session | < 100ms update latency |
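
Whichever tool produces the measurements, the results still have to be compared against the criteria above. The sketch below is one way to do that. It assumes the targets apply to the 95th percentile, which the table does not specify, and the scenario keys and function names are hypothetical.

```python
import statistics

# Success criteria from the table above, in seconds.
TARGETS = {
    "ai_requests": 5.0,
    "marketplace_browsing": 1.0,
    "session_updates": 0.1,
}


def p95(latencies):
    """Return the 95th percentile of a list of latencies (seconds)."""
    if len(latencies) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(latencies, n=20)[-1]


def meets_target(scenario, latencies):
    """True when the p95 latency is below the scenario's target."""
    return p95(latencies) < TARGETS[scenario]
```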

Production Deployment

Deployment Checklist

Pre-Deployment:

  • All environment variables configured
  • Appwrite collections created with proper permissions
  • Redis configured and accessible
  • RQ workers running
  • SSL certificates installed
  • Rate limiting configured
  • Error logging/monitoring set up (Sentry recommended)
  • Backup strategy for Appwrite data

Production Configuration:

  • DEBUG = False in Flask
  • Secure session keys (random, long)
  • CORS restricted to production domain
  • Rate limits appropriate for production
  • AI cost alerts configured
  • CDN for static assets (optional)

Dockerfile

FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Create non-root user
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser

# Expose port
EXPOSE 5000

# Run application
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "wsgi:app"]

Build & Push Script

scripts/build_and_push.sh:

#!/bin/bash
# Stop on the first error, on unset variables, and on failed pipes
set -euo pipefail

IMAGE="ai-dungeon-master"

# Get current git branch; replace "/" so names like feature/x are valid Docker tags
BRANCH=$(git rev-parse --abbrev-ref HEAD)
TAG=${BRANCH//\//-}

# Ask for tag options
read -r -p "Tag as :latest? (y/n) " TAG_LATEST
read -r -p "Push to registry? (y/n) " PUSH_IMAGE

# Build image
docker build -t "$IMAGE:$TAG" .

if [ "$TAG_LATEST" = "y" ]; then
    docker tag "$IMAGE:$TAG" "$IMAGE:latest"
fi

# Pushing requires a prior "docker login" and an image name that includes
# your registry/namespace
if [ "$PUSH_IMAGE" = "y" ]; then
    docker push "$IMAGE:$TAG"
    if [ "$TAG_LATEST" = "y" ]; then
        docker push "$IMAGE:latest"
    fi
fi

Production Environment

Recommended Stack:

  • Web Server: Nginx (reverse proxy)
  • WSGI Server: Gunicorn (4+ workers)
  • Process Manager: Supervisor or systemd
  • Redis: Standalone or Redis Cluster
  • RQ Workers: Separate instances for each queue

Scaling Strategy:

| Component | Scaling Method | Trigger |
|-----------|----------------|---------|
| Flask API | Horizontal (add workers) | CPU > 70% |
| RQ Workers | Horizontal (add workers) | Queue length > 100 |
| Redis | Vertical (upgrade instance) | Memory > 80% |
| Appwrite | Managed by Appwrite | N/A |

Monitoring & Logging

Application Logging

Logging Configuration:

| Level | Use Case | Examples |
|-------|----------|----------|
| DEBUG | Development only | Variable values, function calls |
| INFO | Normal operations | User actions, API calls |
| WARNING | Potential issues | Rate limit approaching, slow queries |
| ERROR | Errors (recoverable) | Failed AI calls, validation errors |
| CRITICAL | Critical failures | Database connection lost, service down |
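
The level can be selected from the `LOG_LEVEL` environment variable. The following sketch uses the standard `logging` module. The `INFO` fallback and the helper name `resolve_log_level` are assumptions, not documented project behavior.

```python
import logging
import os

VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def resolve_log_level(environ=None, default="INFO"):
    """Map LOG_LEVEL to a logging constant, falling back to the default."""
    env = os.environ if environ is None else environ
    name = env.get("LOG_LEVEL", default).upper()
    if name not in VALID_LEVELS:
        name = default  # unknown values fall back rather than crash
    return getattr(logging, name)


# Apply the resolved level to the root logger.
logging.basicConfig(level=resolve_log_level())
```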

Structured Logging with Structlog:

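import structlog

logger = structlog.get_logger(__name__)

# Example values; in the application these come from the current request
session_id = "sess_123"
character_id = "char_456"

# Keyword arguments become structured fields on the log event
logger.info(
    "Combat action executed",
    session_id=session_id,
    character_id=character_id,
    action_type="attack",
    damage=15,
)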

Monitoring Tools

Recommended Tools:

| Tool | Purpose | Priority |
|------|---------|----------|
| Sentry | Error tracking and alerting | High |
| Prometheus | Metrics collection | Medium |
| Grafana | Metrics visualization | Medium |
| Uptime Robot | Uptime monitoring | High |
| CloudWatch | AWS logs/metrics (if using AWS) | Medium |

Key Metrics to Monitor

| Metric | Alert Threshold | Action |
|--------|-----------------|--------|
| API response time | > 3s average | Scale workers |
| Error rate | > 5% | Investigate logs |
| AI API errors | > 10% | Check API status |
| Queue length | > 500 | Add workers |
| Redis memory | > 80% | Upgrade instance |
| CPU usage | > 80% | Scale horizontally |
| AI cost per day | > budget × 1.2 | Investigate usage |
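
These thresholds can be written down as data and checked in one place. The sketch below illustrates the idea. The metric keys and the function name `triggered_alerts` are hypothetical, and percentages are expressed as fractions.

```python
# Alert thresholds and actions, copied from the table above.
# Ratios are expressed as fractions (0.05 == 5%).
THRESHOLDS = {
    "api_response_time_s": (3.0, "Scale workers"),
    "error_rate": (0.05, "Investigate logs"),
    "ai_api_error_rate": (0.10, "Check API status"),
    "queue_length": (500, "Add workers"),
    "redis_memory_ratio": (0.80, "Upgrade instance"),
    "cpu_usage_ratio": (0.80, "Scale horizontally"),
}


def triggered_alerts(metrics, daily_ai_cost=None, daily_budget=None):
    """Return (metric, action) pairs for every value above its threshold."""
    alerts = [
        (name, action)
        for name, (limit, action) in THRESHOLDS.items()
        if name in metrics and metrics[name] > limit
    ]
    # AI cost alert: fire when spend exceeds 120% of the daily budget.
    if daily_ai_cost is not None and daily_budget is not None:
        if daily_ai_cost > daily_budget * 1.2:
            alerts.append(("ai_cost_per_day", "Investigate usage"))
    return alerts
```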

AI Cost Tracking

Log Structure:

| Field | Type | Purpose |
|-------|------|---------|
| user_id | str | Track per-user usage |
| model | str | Which model used |
| tier | str | FREE/STANDARD/PREMIUM |
| tokens_used | int | Token count |
| cost_estimate | float | Estimated cost |
| timestamp | datetime | When called |
| context_type | str | What prompted the call |

Daily Report:

  • Total AI calls per tier
  • Total tokens used
  • Estimated cost
  • Top users by usage
  • Anomaly detection (unusual spikes)
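
The log structure and the report totals above can be sketched together as follows. The class and function names are hypothetical, "top users" is ranked by token count (an assumption, since the ranking metric is not specified), and anomaly detection is left out.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AICallLog:
    """One AI call, with the fields listed in the log structure table."""
    user_id: str
    model: str
    tier: str
    tokens_used: int
    cost_estimate: float
    timestamp: datetime
    context_type: str


def daily_report(logs, top_n=3):
    """Aggregate one day's logs into the totals named in the report list."""
    calls_per_tier = Counter(log.tier for log in logs)
    tokens_per_user = defaultdict(int)
    for log in logs:
        tokens_per_user[log.user_id] += log.tokens_used
    # Rank users by total tokens, highest first.
    top_users = sorted(tokens_per_user.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return {
        "calls_per_tier": dict(calls_per_tier),
        "total_tokens": sum(log.tokens_used for log in logs),
        "estimated_cost": sum(log.cost_estimate for log in logs),
        "top_users": top_users,
    }
```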

Security

Authentication & Authorization

Implementation:

| Layer | Method | Details |
|-------|--------|---------|
| User Auth | Appwrite Auth | Email/password, OAuth providers |
| API Auth | JWT tokens | Bearer token in Authorization header |
| Session Validation | Every API call | Verify token, check expiry |
| Resource Access | User ID check | Users can only access their own data |

Input Validation

Validation Strategy:

| Input Type | Validation | Tools |
|------------|------------|-------|
| JSON payloads | Schema validation | Marshmallow or Pydantic |
| Character names | Sanitize, length limits | Bleach library |
| Chat messages | Sanitize, profanity filter | Custom validators |
| AI prompts | Template-based only | Jinja2 (no direct user input) |

Example Validation:

| Field | Rules |
|-------|-------|
| Character name | 3-20 chars, alphanumeric + spaces only |
| Gold amount | Positive integer, max 999,999,999 |
| Action text | Max 500 chars, sanitized HTML |
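
The rules above translate directly into small validators. This sketch uses only the standard library and treats "alphanumeric" as ASCII letters and digits, which is an assumption. The function names are hypothetical, and HTML sanitizing is left to the sanitizer the project chooses.

```python
import re

NAME_PATTERN = re.compile(r"[A-Za-z0-9 ]{3,20}")
MAX_GOLD = 999_999_999
MAX_ACTION_LENGTH = 500


def is_valid_character_name(name):
    """3-20 characters; letters, digits and spaces only."""
    return isinstance(name, str) and NAME_PATTERN.fullmatch(name) is not None


def is_valid_gold_amount(amount):
    """Positive integer no larger than 999,999,999 (bool is rejected)."""
    if isinstance(amount, bool) or not isinstance(amount, int):
        return False
    return 0 < amount <= MAX_GOLD


def is_valid_action_text(text):
    """At most 500 characters; sanitizing happens in a separate step."""
    return isinstance(text, str) and len(text) <= MAX_ACTION_LENGTH
```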

Rate Limiting

Implementation: Flask-Limiter with Redis backend

Limits by Tier:

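| Tier | API Calls/Min | AI Calls/Day | Marketplace Actions/Day |
|------|---------------|--------------|-------------------------|
| FREE | 30 | 50 | N/A |
| BASIC | 60 | 200 | N/A |
| PREMIUM | 120 | 1000 | 50 |
| ELITE | 300 | Unlimited | 100 |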

Rate Limit Bypass:

  • Admin accounts
  • Health check endpoints
  • Static assets
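
The tier table and the admin bypass can be expressed as a plain lookup. This is not Flask-Limiter code, only a sketch of the policy. The key names and the function `is_allowed` are hypothetical, and "N/A" is interpreted as "feature not available", which is an assumption.

```python
import math

# Limits from the tier table. None stands for "N/A", math.inf for "Unlimited".
TIER_LIMITS = {
    "FREE": {"api_per_min": 30, "ai_per_day": 50, "marketplace_per_day": None},
    "BASIC": {"api_per_min": 60, "ai_per_day": 200, "marketplace_per_day": None},
    "PREMIUM": {"api_per_min": 120, "ai_per_day": 1000, "marketplace_per_day": 50},
    "ELITE": {"api_per_min": 300, "ai_per_day": math.inf, "marketplace_per_day": 100},
}


def is_allowed(tier, kind, used_so_far, is_admin=False):
    """Return True if one more request of this kind fits within the tier's limit."""
    if is_admin:  # admin accounts bypass rate limits
        return True
    limit = TIER_LIMITS[tier][kind]
    if limit is None:  # "N/A" in the table: treated here as not available
        return False
    return used_so_far < limit
```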

API Security

Configuration:

| Setting | Value | Reason |
|---------|-------|--------|
| CORS | Production domain only | Prevent unauthorized access |
| HTTPS | Required | Encrypt data in transit |
| API Keys | Environment variables | Never in code |
| Appwrite Permissions | Least privilege | Collection-level security |
| SQL Injection | N/A | All data access goes through the Appwrite API; the app issues no raw SQL |
| XSS | Sanitize all inputs | Prevent script injection |
| CSRF | CSRF tokens | For form submissions |

Data Protection

Access Control Matrix:

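| Resource | Owner | Party Member | Public | System |
|----------|-------|--------------|--------|--------|
| Characters | RW | R | - | RW |
| Sessions | R | RW (turn) | - | RW |
| Marketplace Listings | RW (own) | - | R | RW |
| Transactions | R (own) | - | - | RW |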

RW = Read/Write, R = Read only, - = No access
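
The matrix can be encoded as a lookup table for permission checks. In this sketch the role and resource keys and the function `can` are hypothetical, and the "(own)" and "(turn)" qualifiers are not modeled and would need separate checks.

```python
# Access matrix from the table above; an empty string means no access.
ACCESS = {
    "characters": {"owner": "RW", "party_member": "R", "public": "", "system": "RW"},
    "sessions": {"owner": "R", "party_member": "RW", "public": "", "system": "RW"},
    "marketplace_listings": {"owner": "RW", "party_member": "", "public": "R", "system": "RW"},
    "transactions": {"owner": "R", "party_member": "", "public": "", "system": "RW"},
}


def can(role, operation, resource):
    """operation is "R" or "W"; unknown roles or resources get no access."""
    if operation not in ("R", "W"):
        return False
    return operation in ACCESS.get(resource, {}).get(role, "")
```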

Secrets Management

Never Commit:

  • API keys
  • Database credentials
  • Secret keys
  • Tokens

Best Practices:

  • Use .env for local development
  • Use environment variables in production
  • Use secrets manager (AWS Secrets Manager, HashiCorp Vault) in production
  • Rotate keys regularly
  • Different keys for dev/staging/prod

Backup & Recovery

Appwrite Data Backup

Strategy:

| Data Type | Backup Frequency | Retention | Method |
|-----------|------------------|-----------|--------|
| Characters | Daily | 30 days | Appwrite export |
| Sessions (active) | Hourly | 7 days | Appwrite export |
| Marketplace | Daily | 30 days | Appwrite export |
| Transactions | Daily | 90 days | Appwrite export |

Backup Script:

  • Export collections to JSON
  • Compress and encrypt
  • Upload to S3 or object storage
  • Verify backup integrity
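
The compress and verify steps above can be sketched with the standard library. Exporting from Appwrite, encryption, and the upload are deliberately left out because they depend on tooling this document does not specify. The function names are hypothetical.

```python
import gzip
import hashlib
import json
import zlib


def pack_backup(documents):
    """Serialize documents to JSON, gzip them, and return (blob, sha256 hex digest)."""
    raw = json.dumps(documents, sort_keys=True).encode("utf-8")
    blob = gzip.compress(raw)
    return blob, hashlib.sha256(blob).hexdigest()


def verify_backup(blob, expected_digest):
    """Check the digest and confirm the archive decompresses to valid JSON."""
    if hashlib.sha256(blob).hexdigest() != expected_digest:
        return False
    try:
        json.loads(gzip.decompress(blob).decode("utf-8"))
    except (OSError, EOFError, ValueError, zlib.error):
        return False
    return True
```

Storing the digest alongside the uploaded archive lets a later restore job run `verify_backup` before importing anything.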

Disaster Recovery Plan

| Scenario | RTO | RPO | Steps |
|----------|-----|-----|-------|
| Database corruption | 4 hours | 24 hours | Restore from latest backup |
| API server down | 15 minutes | 0 | Restart/failover to standby |
| Redis failure | 5 minutes | Session data loss | Restart, users re-login |
| Complete infrastructure loss | 24 hours | 24 hours | Restore from backups to new infrastructure |

RTO = Recovery Time Objective, RPO = Recovery Point Objective


CI/CD Pipeline

| Stage | Actions | Tools |
|-------|---------|-------|
| 1. Commit | Developer pushes to dev branch | Git |
| 2. Build | Run tests, lint code | GitHub Actions, pytest, flake8 |
| 3. Test | Unit tests, integration tests | pytest |
| 4. Build Image | Create Docker image | Docker |
| 5. Deploy to Staging | Deploy to staging environment | Docker, SSH |
| 6. Manual Test | QA testing on staging | Manual |
| 7. Merge to Beta | Promote to beta branch | Git |
| 8. Deploy to Beta | Deploy to beta environment | Docker, SSH |
| 9. Merge to Master | Production promotion | Git |
| 10. Deploy to Prod | Deploy to production | Docker, SSH |
| 11. Tag Release | Create version tag | Git |

GitHub Actions Example

name: CI/CD

on:
  push:
    branches: [ dev, beta, master ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          # Quoted so YAML does not parse the version as a float
          python-version: '3.11'
      - name: Install dependencies
        # pytest and flake8 must be listed in requirements.txt for the steps below
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest
      - name: Lint
        run: flake8 app/

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker build -t ai-dungeon-master:${{ github.ref_name }} .
      - name: Push to registry
        # Requires a registry login step and a registry-qualified image name
        run: docker push ai-dungeon-master:${{ github.ref_name }}

Performance Optimization

Caching Strategy

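| Cache Type | What to Cache | TTL |
|------------|---------------|-----|
| Redis Cache | Session data | 30 minutes |
| Redis Cache | Character data (read-heavy) | 5 minutes |
| Redis Cache | Marketplace listings | 1 minute |
| Redis Cache | NPC shop items | 1 hour |
| Browser Cache | Static assets | 1 year |
| Browser Cache | API responses (GET) | 30 seconds |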
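
The Redis rows above map onto a cache-aside read. In the sketch below, `client` is any object exposing `get(name)` and `setex(name, time, value)`, which a redis-py client provides. The key format, the kind names, and `cached_fetch` are hypothetical.

```python
import json

# TTLs in seconds, from the Redis rows of the table above.
REDIS_TTLS = {
    "session": 30 * 60,
    "character": 5 * 60,
    "marketplace_listings": 60,
    "npc_shop_items": 60 * 60,
}


def cached_fetch(client, kind, key, loader):
    """Cache-aside read: return the cached value, or load it and store it with the TTL."""
    cache_key = f"{kind}:{key}"
    cached = client.get(cache_key)
    if cached is not None:
        return json.loads(cached)
    value = loader()  # cache miss: fetch from the source of truth
    client.setex(cache_key, REDIS_TTLS[kind], json.dumps(value))
    return value
```

Writes to the underlying data should delete or overwrite the matching key, otherwise readers can see stale values until the TTL expires.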

Database Optimization

Appwrite Indexing:

  • Index userId on characters collection
  • Index status on game_sessions collection
  • Index listing_type + status on marketplace_listings
  • Index created_at for time-based queries

AI Call Optimization

Strategies:

| Strategy | Impact | Implementation |
|----------|--------|----------------|
| Batch requests | Reduce API calls | Combine multiple actions |
| Cache common responses | Reduce cost | Cache item descriptions |
| Prompt optimization | Reduce tokens | Shorter, more efficient prompts |
| Model selection | Reduce cost | Use cheaper models when appropriate |

Troubleshooting

Common Issues

| Issue | Symptoms | Solution |
|-------|----------|----------|
| RQ workers not processing | Jobs stuck in queue | Check Redis connection, restart workers |
| AI calls failing | 401/403 errors | Verify API keys, check rate limits |
| Appwrite connection errors | Database errors | Check Appwrite status, verify credentials |
| Session not updating | Stale data in UI | Check Appwrite Realtime connection |
| High latency | Slow API responses | Check RQ queue length, scale workers |

Debug Mode

Enable Debug Logging:

export LOG_LEVEL=DEBUG
flask run --debug

Debug Endpoints (development only):

  • GET /debug/health - Health check
  • GET /debug/redis - Redis connection status
  • GET /debug/queues - RQ queue status

Resources

| Resource | URL |
|----------|-----|
| Appwrite Docs | https://appwrite.io/docs |
| RQ Docs | https://python-rq.org/ |
| Flask Docs | https://flask.palletsprojects.com/ |
| Structlog Docs | https://www.structlog.org/ |
| HTMX Docs | https://htmx.org/docs/ |
| Anthropic API | https://docs.anthropic.com/ |
| Replicate API | https://replicate.com/docs |