# Deployment & Operations

## Local Development Setup

### Prerequisites

| Tool | Version | Purpose |
|------|---------|---------|
| Python | 3.11+ | Backend runtime |
| Docker | Latest | Local services |
| Redis | 7.0+ | Job queue & caching |
| Git | Latest | Version control |

### Setup Steps

```bash
# 1. Clone repository
git clone
cd code_of_conquest

# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment
cp .env.example .env
# Edit .env with your API keys and settings

# 5. Start local services
docker-compose up -d

# 6. Start RQ workers
rq worker ai_tasks combat_tasks marketplace_tasks &

# 7. Run Flask development server
flask run --debug
```

### Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `FLASK_ENV` | development/production | Yes |
| `SECRET_KEY` | Flask secret key | Yes |
| `REPLICATE_API_KEY` | Replicate API key | Yes |
| `ANTHROPIC_API_KEY` | Anthropic API key | Yes |
| `APPWRITE_ENDPOINT` | Appwrite server URL | Yes |
| `APPWRITE_PROJECT_ID` | Appwrite project ID | Yes |
| `APPWRITE_API_KEY` | Appwrite API key | Yes |
| `REDIS_URL` | Redis connection URL | Yes |
| `LOG_LEVEL` | Logging level (DEBUG/INFO/WARNING/ERROR) | No |

---

## Docker Compose (Local Development)

**docker-compose.yml:**

```yaml
version: '3.8'

services:
  redis:
    image: redis:alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

  rq-worker:
    build: .
    command: rq worker ai_tasks combat_tasks marketplace_tasks --url redis://redis:6379
    depends_on:
      - redis
    env_file:
      - .env
    environment:
      - REDIS_URL=redis://redis:6379

volumes:
  redis_data:
```

---

## Testing Strategy

### Manual Testing (Preferred)

**API Testing Document:** `docs/API_TESTING.md`

Contains:
- Endpoint examples
- Sample curl/httpie commands
- Expected responses
- Authentication setup

**Example API Test:**

```bash
# Login
curl -X POST http://localhost:5000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com", "password": "password123"}'

# Create character (with auth token)
curl -X POST http://localhost:5000/api/v1/characters \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer " \
  -d '{"name": "Aragorn", "class_id": "vanguard"}'
```

### Unit Tests (Optional)

**Framework:** pytest

**Test Categories:**

| Category | Location | Focus |
|----------|----------|-------|
| Combat | `tests/test_combat.py` | Damage calculations, effect processing |
| Skills | `tests/test_skills.py` | Skill unlock logic, prerequisites |
| Marketplace | `tests/test_marketplace.py` | Bidding logic, auction processing |
| Character | `tests/test_character.py` | Character creation, stats |

**Run Tests:**

```bash
# All tests
pytest

# Specific test file
pytest tests/test_combat.py

# With coverage
pytest --cov=app tests/
```

### Load Testing

**Tool:** Locust or Apache Bench

**Test Scenarios:**

| Scenario | Target | Success Criteria |
|----------|--------|------------------|
| Concurrent AI requests | 50 concurrent users | < 5s response time |
| Marketplace browsing | 100 concurrent users | < 1s response time |
| Session realtime updates | 10 players per session | < 100ms update latency |

---

## Production Deployment

### Deployment Checklist

**Pre-Deployment:**
- [ ] All environment variables configured
- [ ] Appwrite collections created with proper permissions
- [ ] Redis configured and accessible
- [ ] RQ workers running
- [ ] SSL certificates installed
- [ ] Rate limiting configured
- [ ] Error logging/monitoring set up (Sentry recommended)
- [ ] Backup strategy for Appwrite data

**Production Configuration:**
- [ ] `DEBUG = False` in Flask
- [ ] Secure session keys (random, long)
- [ ] CORS restricted to production domain
- [ ] Rate limits appropriate for production
- [ ] AI cost alerts configured
- [ ] CDN for static assets (optional)

### Dockerfile

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Create non-root user
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser

# Expose port
EXPOSE 5000

# Run application
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "wsgi:app"]
```

### Build & Push Script

**scripts/build_and_push.sh:**

```bash
#!/bin/bash

# Get current git branch
BRANCH=$(git rev-parse --abbrev-ref HEAD)

# Ask for tag options
read -p "Tag as :latest? (y/n) " TAG_LATEST
read -p "Push to registry? (y/n) " PUSH_IMAGE

# Build image
docker build -t ai-dungeon-master:$BRANCH .

if [ "$TAG_LATEST" = "y" ]; then
    docker tag ai-dungeon-master:$BRANCH ai-dungeon-master:latest
fi

if [ "$PUSH_IMAGE" = "y" ]; then
    docker push ai-dungeon-master:$BRANCH
    if [ "$TAG_LATEST" = "y" ]; then
        docker push ai-dungeon-master:latest
    fi
fi
```

### Production Environment

**Recommended Stack:**
- **Web Server:** Nginx (reverse proxy)
- **WSGI Server:** Gunicorn (4+ workers)
- **Process Manager:** Supervisor or systemd
- **Redis:** Standalone or Redis Cluster
- **RQ Workers:** Separate instances for each queue

**Scaling Strategy:**

| Component | Scaling Method | Trigger |
|-----------|----------------|---------|
| Flask API | Horizontal (add workers) | CPU > 70% |
| RQ Workers | Horizontal (add workers) | Queue length > 100 |
| Redis | Vertical (upgrade instance) | Memory > 80% |
| Appwrite | Managed by Appwrite | N/A |

---

## Monitoring & Logging

### Application Logging

**Logging Configuration:**

| Level | Use Case | Examples |
|-------|----------|----------|
| DEBUG | Development only | Variable values, function calls |
| INFO | Normal operations | User actions, API calls |
| WARNING | Potential issues | Rate limit approaching, slow queries |
| ERROR | Errors (recoverable) | Failed AI calls, validation errors |
| CRITICAL | Critical failures | Database connection lost, service down |

**Structured Logging with Structlog:**

```python
import structlog

logger = structlog.get_logger(__name__)

logger.info(
    "Combat action executed",
    session_id=session_id,
    character_id=character_id,
    action_type="attack",
    damage=15,
)
```

### Monitoring Tools

**Recommended Tools:**

| Tool | Purpose | Priority |
|------|---------|----------|
| **Sentry** | Error tracking and alerting | High |
| **Prometheus** | Metrics collection | Medium |
| **Grafana** | Metrics visualization | Medium |
| **Uptime Robot** | Uptime monitoring | High |
| **CloudWatch** | AWS logs/metrics (if using AWS) | Medium |

### Key Metrics to Monitor

| Metric | Alert Threshold | Action |
|--------|----------------|--------|
| API response time | > 3s average | Scale workers |
| Error rate | > 5% | Investigate logs |
| AI API errors | > 10% | Check API status |
| Queue length | > 500 | Add workers |
| Redis memory | > 80% | Upgrade instance |
| CPU usage | > 80% | Scale horizontally |
| AI cost per day | > budget × 1.2 | Investigate usage |

### AI Cost Tracking

**Log Structure:**

| Field | Type | Purpose |
|-------|------|---------|
| `user_id` | str | Track per-user usage |
| `model` | str | Which model used |
| `tier` | str | FREE/STANDARD/PREMIUM |
| `tokens_used` | int | Token count |
| `cost_estimate` | float | Estimated cost |
| `timestamp` | datetime | When called |
| `context_type` | str | What prompted the call |

**Daily Report:**
- Total AI calls per tier
- Total tokens used
- Estimated cost
- Top users by usage
- Anomaly detection (unusual spikes)

---

## Security

### Authentication & Authorization

**Implementation:**

| Layer | Method | Details |
|-------|--------|---------|
| **User Auth** | Appwrite Auth | Email/password, OAuth providers |
| **API Auth** | JWT tokens | Bearer token in Authorization header |
| **Session Validation** | Every API call | Verify token, check expiry |
| **Resource Access** | User ID check | Users can only access their own data |

### Input Validation

**Validation Strategy:**

| Input Type | Validation | Tools |
|------------|------------|-------|
| JSON payloads | Schema validation | Marshmallow or Pydantic |
| Character names | Sanitize, length limits | Bleach library |
| Chat messages | Sanitize, profanity filter | Custom validators |
| AI prompts | Template-based only | Jinja2 (no direct user input) |

**Example Validation:**

| Field | Rules |
|-------|-------|
| Character name | 3-20 chars, alphanumeric + spaces only |
| Gold amount | Positive integer, max 999,999,999 |
| Action text | Max 500 chars, sanitized HTML |

### Rate Limiting

**Implementation:** Flask-Limiter with Redis backend

**Limits by Tier:**

| Tier | API Calls/Min | AI Calls/Day | Marketplace Actions/Day |
|------|---------------|--------------|------------------------|
| FREE | 30 | 50 | N/A |
| BASIC | 60 | 200 | N/A |
| PREMIUM | 120 | 1000 | 50 |
| ELITE | 300 | Unlimited | 100 |

**Rate Limit Bypass:**
- Admin accounts
- Health check endpoints
- Static assets

### API Security

**Configuration:**

| Setting | Value | Reason |
|---------|-------|--------|
| **CORS** | Production domain only | Prevent unauthorized access |
| **HTTPS** | Required | Encrypt data in transit |
| **API Keys** | Environment variables | Never in code |
| **Appwrite Permissions** | Least privilege | Collection-level security |
| **SQL Injection** | N/A | Using Appwrite (NoSQL) |
| **XSS** | Sanitize all inputs | Prevent script injection |
| **CSRF** | CSRF tokens | For form submissions |

### Data Protection

**Access Control Matrix:**

| Resource | Owner | Party Member | Public | System |
|----------|-------|--------------|--------|--------|
| Characters | RW | R | - | RW |
| Sessions | R | RW (turn) | - | RW |
| Marketplace Listings | RW (own) | - | R | RW |
| Transactions | R (own) | - | - | RW |

**RW = Read/Write, R = Read only, - = No access**

### Secrets Management

**Never Commit:**
- API keys
- Database credentials
- Secret keys
- Tokens

**Best Practices:**
- Use `.env` for local development
- Use environment variables in production
- Use secrets manager (AWS Secrets Manager, HashiCorp Vault) in production
- Rotate keys regularly
- Different keys for dev/staging/prod

---

## Backup & Recovery

### Appwrite Data Backup

**Strategy:**

| Data Type | Backup Frequency | Retention | Method |
|-----------|------------------|-----------|--------|
| Characters | Daily | 30 days | Appwrite export |
| Sessions (active) | Hourly | 7 days | Appwrite export |
| Marketplace | Daily | 30 days | Appwrite export |
| Transactions | Daily | 90 days | Appwrite export |

**Backup Script:**
- Export collections to JSON
- Compress and encrypt
- Upload to S3 or object storage
- Verify backup integrity

### Disaster Recovery Plan

| Scenario | RTO | RPO | Steps |
|----------|-----|-----|-------|
| **Database corruption** | 4 hours | 24 hours | Restore from latest backup |
| **API server down** | 15 minutes | 0 | Restart/failover to standby |
| **Redis failure** | 5 minutes | Session data loss | Restart, users re-login |
| **Complete infrastructure loss** | 24 hours | 24 hours | Restore from backups to new infrastructure |

**RTO = Recovery Time Objective, RPO = Recovery Point Objective**

---

## CI/CD Pipeline

### Recommended Workflow

| Stage | Actions | Tools |
|-------|---------|-------|
| **1. Commit** | Developer pushes to `dev` branch | Git |
| **2. Build** | Run tests, lint code | GitHub Actions, pytest, flake8 |
| **3. Test** | Unit tests, integration tests | pytest |
| **4. Build Image** | Create Docker image | Docker |
| **5. Deploy to Staging** | Deploy to staging environment | Docker, SSH |
| **6. Manual Test** | QA testing on staging | Manual |
| **7. Merge to Beta** | Promote to beta branch | Git |
| **8. Deploy to Beta** | Deploy to beta environment | Docker, SSH |
| **9. Merge to Master** | Production promotion | Git |
| **10. Deploy to Prod** | Deploy to production | Docker, SSH |
| **11. Tag Release** | Create version tag | Git |

### GitHub Actions Example

```yaml
name: CI/CD

on:
  push:
    branches: [ dev, beta, master ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.11
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest
      - name: Lint
        run: flake8 app/

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Build Docker image
        run: docker build -t ai-dungeon-master:${{ github.ref_name }} .
      - name: Push to registry
        run: docker push ai-dungeon-master:${{ github.ref_name }}
```

---

## Performance Optimization

### Caching Strategy

| Cache Type | What to Cache | TTL |
|------------|---------------|-----|
| **Redis Cache** | Session data | 30 minutes |
| | Character data (read-heavy) | 5 minutes |
| | Marketplace listings | 1 minute |
| | NPC shop items | 1 hour |
| **Browser Cache** | Static assets | 1 year |
| | API responses (GET) | 30 seconds |

### Database Optimization

**Appwrite Indexing:**
- Index `userId` on characters collection
- Index `status` on game_sessions collection
- Index `listing_type` + `status` on marketplace_listings
- Index `created_at` for time-based queries

### AI Call Optimization

**Strategies:**

| Strategy | Impact | Implementation |
|----------|--------|----------------|
| **Batch requests** | Reduce API calls | Combine multiple actions |
| **Cache common responses** | Reduce cost | Cache item descriptions |
| **Prompt optimization** | Reduce tokens | Shorter, more efficient prompts |
| **Model selection** | Reduce cost | Use cheaper models when appropriate |

---

## Troubleshooting

### Common Issues

| Issue | Symptoms | Solution |
|-------|----------|----------|
| **RQ workers not processing** | Jobs stuck in queue | Check Redis connection, restart workers |
| **AI calls failing** | 401/403 errors | Verify API keys, check rate limits |
| **Appwrite connection errors** | Database errors | Check Appwrite status, verify credentials |
| **Session not updating** | Stale data in UI | Check Appwrite Realtime connection |
| **High latency** | Slow API responses | Check RQ queue length, scale workers |

### Debug Mode

**Enable Debug Logging:**

```bash
export LOG_LEVEL=DEBUG
flask run --debug
```

**Debug Endpoints (development only):**
- `GET /debug/health` - Health check
- `GET /debug/redis` - Redis connection status
- `GET /debug/queues` - RQ queue status

---

## Resources

| Resource | URL |
|----------|-----|
| **Appwrite Docs** | https://appwrite.io/docs |
| **RQ Docs** | https://python-rq.org/ |
| **Flask Docs** | https://flask.palletsprojects.com/ |
| **Structlog Docs** | https://www.structlog.org/ |
| **HTMX Docs** | https://htmx.org/docs/ |
| **Anthropic API** | https://docs.anthropic.com/ |
| **Replicate API** | https://replicate.com/docs |
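
The field rules listed under Input Validation (character name, gold amount, action text) could be enforced with a few small helpers. The following is a minimal sketch using only the standard library, not the project's actual validators: the function names are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and `html.escape` stands in for the Bleach-based sanitizing the table mentions.

```python
import html
import re

# Limits copied from the "Example Validation" table. The helper names below
# are hypothetical and not part of the project's codebase.
NAME_PATTERN = re.compile(r"[A-Za-z0-9 ]{3,20}")  # assumption: ASCII only
MAX_GOLD = 999_999_999
MAX_ACTION_LENGTH = 500


def validate_character_name(name: str) -> str:
    """Return the name unchanged if it is 3-20 letters, digits, or spaces."""
    if not isinstance(name, str) or NAME_PATTERN.fullmatch(name) is None:
        raise ValueError("name must be 3-20 letters, digits, or spaces")
    return name


def validate_gold(amount: int) -> int:
    """Return the amount if it is a positive integer no larger than MAX_GOLD."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, int):
        raise ValueError("gold must be an integer")
    if not 0 < amount <= MAX_GOLD:
        raise ValueError(f"gold must be between 1 and {MAX_GOLD}")
    return amount


def validate_action_text(text: str) -> str:
    """Check the length limit, then escape HTML special characters."""
    if not isinstance(text, str) or len(text) > MAX_ACTION_LENGTH:
        raise ValueError(f"action text must be at most {MAX_ACTION_LENGTH} chars")
    # Assumption: escaping is an acceptable stand-in for a real HTML sanitizer.
    return html.escape(text)
```

The length check runs before escaping, so the 500-character limit applies to what the player typed rather than to the escaped output. A name made only of spaces passes this literal reading of the rule; a real validator would probably trim whitespace first.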
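
The AI Cost Tracking log structure and daily report could be modeled roughly as below. This is a sketch under assumptions: the record fields mirror the Log Structure table, while `AICallRecord`, `daily_report`, and `exceeds_budget` are hypothetical names, "top users" is ranked by tokens used, and the anomaly detection item is left out. The budget check follows the "AI cost per day > budget × 1.2" threshold from Key Metrics to Monitor.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AICallRecord:
    """One logged AI call; fields follow the Log Structure table."""
    user_id: str
    model: str
    tier: str
    tokens_used: int
    cost_estimate: float
    timestamp: datetime
    context_type: str


def daily_report(records: list[AICallRecord], top_n: int = 3) -> dict:
    """Aggregate one day's records into the report items listed above."""
    tokens_per_user: Counter = Counter()
    for record in records:
        tokens_per_user[record.user_id] += record.tokens_used
    return {
        "calls_per_tier": dict(Counter(r.tier for r in records)),
        "total_tokens": sum(r.tokens_used for r in records),
        "estimated_cost": sum(r.cost_estimate for r in records),
        # Assumption: "top users by usage" means ranked by tokens used.
        "top_users": tokens_per_user.most_common(top_n),
    }


def exceeds_budget(daily_cost: float, daily_budget: float) -> bool:
    """True when the day's cost is above 120% of the budget."""
    return daily_cost > daily_budget * 1.2
```

Feeding `daily_report(records)["estimated_cost"]` into `exceeds_budget` gives a simple hook for the "AI cost alerts configured" checklist item.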
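
Two of the Backup Script steps, compression and integrity verification, can be illustrated with the standard library alone. This is only a partial sketch: `pack_backup` and `verify_backup` are hypothetical helpers, the documents are assumed to be already exported from Appwrite as plain dictionaries, and the encryption and S3 upload steps are deliberately omitted because they depend on tooling the document does not specify.

```python
import gzip
import hashlib
import json


def pack_backup(documents: list[dict]) -> tuple[bytes, str]:
    """Serialize exported documents to JSON, gzip them, and return the
    archive bytes together with their SHA-256 checksum."""
    payload = json.dumps(documents, sort_keys=True).encode("utf-8")
    archive = gzip.compress(payload)
    return archive, hashlib.sha256(archive).hexdigest()


def verify_backup(archive: bytes, expected_sha256: str) -> list[dict]:
    """Recompute the checksum and, if it matches, return the decoded documents."""
    if hashlib.sha256(archive).hexdigest() != expected_sha256:
        raise ValueError("backup checksum mismatch")
    return json.loads(gzip.decompress(archive).decode("utf-8"))
```

Storing the checksum alongside the uploaded archive lets a restore job confirm the file is intact before loading it.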