first commit

2025-11-24 23:10:55 -06:00
commit 8315fa51c9
279 changed files with 74600 additions and 0 deletions
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -0,0 +1,580 @@
+# Deployment & Operations
+
+## Local Development Setup
+
+### Prerequisites
+
+| Tool | Version | Purpose |
+|------|---------|---------|
+| Python | 3.11+ | Backend runtime |
+| Docker | Latest | Local services |
+| Redis | 7.0+ | Job queue & caching |
+| Git | Latest | Version control |
+
+### Setup Steps
+
+```bash
+# 1. Clone repository
+git clone <repo-url>
+cd code_of_conquest
+
+# 2. Create virtual environment
+python3 -m venv venv
+source venv/bin/activate  # Windows: venv\Scripts\activate
+
+# 3. Install dependencies
+pip install -r requirements.txt
+
+# 4. Configure environment
+cp .env.example .env
+# Edit .env with your API keys and settings
+
+# 5. Start local services
+docker-compose up -d
+
+# 6. Start RQ workers (skip if the rq-worker container from step 5 is already running)
+rq worker ai_tasks combat_tasks marketplace_tasks &
+
+# 7. Run Flask development server
+flask run --debug
+```
+
+### Environment Variables
+
+| Variable | Description | Required |
+|----------|-------------|----------|
+| `FLASK_ENV` | development/production (removed from Flask itself in 2.3, so the app must read it) | Yes |
+| `SECRET_KEY` | Flask secret key | Yes |
+| `REPLICATE_API_KEY` | Replicate API key | Yes |
+| `ANTHROPIC_API_KEY` | Anthropic API key | Yes |
+| `APPWRITE_ENDPOINT` | Appwrite server URL | Yes |
+| `APPWRITE_PROJECT_ID` | Appwrite project ID | Yes |
+| `APPWRITE_API_KEY` | Appwrite API key | Yes |
+| `REDIS_URL` | Redis connection URL | Yes |
+| `LOG_LEVEL` | Logging level (DEBUG/INFO/WARNING/ERROR) | No |
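+
+**Example Startup Check:**
+
+A minimal sketch of failing fast when a required variable from the table above is missing. The names `REQUIRED_VARS`, `missing_env_vars`, and `check_env` are hypothetical helpers, not part of the application; where such a check would be called from is also an assumption.
+
+```python
+import os
+
+# Required names are taken from the table above; LOG_LEVEL is optional.
+REQUIRED_VARS = [
+    "FLASK_ENV", "SECRET_KEY", "REPLICATE_API_KEY", "ANTHROPIC_API_KEY",
+    "APPWRITE_ENDPOINT", "APPWRITE_PROJECT_ID", "APPWRITE_API_KEY", "REDIS_URL",
+]
+
+
+def missing_env_vars(environ=None):
+    """Return the required variable names that are unset or empty."""
+    env = os.environ if environ is None else environ
+    return [name for name in REQUIRED_VARS if not env.get(name)]
+
+
+def check_env(environ=None):
+    """Raise RuntimeError listing every missing required variable."""
+    missing = missing_env_vars(environ)
+    if missing:
+        raise RuntimeError(
+            "Missing required environment variables: " + ", ".join(missing)
+        )
+```
+
+Calling `check_env()` before the app starts would surface a misconfigured `.env` as one clear error instead of a failure on first use.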
+
+---
+
+## Docker Compose (Local Development)
+
+**docker-compose.yml:**
+
+```yaml
+version: '3.8'
+services:
+  redis:
+    image: redis:7-alpine
+    ports:
+      - "6379:6379"
+    volumes:
+      - redis_data:/data
+
+  rq-worker:
+    build: .
+    command: rq worker ai_tasks combat_tasks marketplace_tasks --url redis://redis:6379
+    depends_on:
+      - redis
+    env_file:
+      - .env
+    environment:
+      - REDIS_URL=redis://redis:6379
+
+volumes:
+  redis_data:
+```
+
+---
+
+## Testing Strategy
+
+### Manual Testing (Preferred)
+
+**API Testing Document:** `docs/API_TESTING.md`
+
+Contains:
+- Endpoint examples
+- Sample curl/httpie commands
+- Expected responses
+- Authentication setup
+
+**Example API Test:**
+
+```bash
+# Login
+curl -X POST http://localhost:5000/api/v1/auth/login \
+  -H "Content-Type: application/json" \
+  -d '{"email": "test@example.com", "password": "password123"}'
+
+# Create character (with auth token)
+curl -X POST http://localhost:5000/api/v1/characters \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer <token>" \
+  -d '{"name": "Aragorn", "class_id": "vanguard"}'
+```
+
+### Unit Tests (Optional)
+
+**Framework:** pytest
+
+**Test Categories:**
+
+| Category | Location | Focus |
+|----------|----------|-------|
+| Combat | `tests/test_combat.py` | Damage calculations, effect processing |
+| Skills | `tests/test_skills.py` | Skill unlock logic, prerequisites |
+| Marketplace | `tests/test_marketplace.py` | Bidding logic, auction processing |
+| Character | `tests/test_character.py` | Character creation, stats |
+
+**Run Tests:**
+```bash
+# All tests
+pytest
+
+# Specific test file
+pytest tests/test_combat.py
+
+# With coverage (requires the pytest-cov plugin)
+pytest --cov=app tests/
+```
+
+### Load Testing
+
+**Tool:** Locust or ApacheBench (`ab`)
+
+**Test Scenarios:**
+
+| Scenario | Target | Success Criteria |
+|----------|--------|------------------|
+| Concurrent AI requests | 50 concurrent users | < 5s response time |
+| Marketplace browsing | 100 concurrent users | < 1s response time |
+| Session realtime updates | 10 players per session | < 100ms update latency |
+
+---
+
+## Production Deployment
+
+### Deployment Checklist
+
+**Pre-Deployment:**
+- [ ] All environment variables configured
+- [ ] Appwrite collections created with proper permissions
+- [ ] Redis configured and accessible
+- [ ] RQ workers running
+- [ ] SSL certificates installed
+- [ ] Rate limiting configured
+- [ ] Error logging/monitoring set up (Sentry recommended)
+- [ ] Backup strategy for Appwrite data
+
+**Production Configuration:**
+- [ ] `DEBUG = False` in Flask
+- [ ] Secure session keys (random, long)
+- [ ] CORS restricted to production domain
+- [ ] Rate limits appropriate for production
+- [ ] AI cost alerts configured
+- [ ] CDN for static assets (optional)
+
+### Dockerfile
+
+```dockerfile
+FROM python:3.11-slim
+
+WORKDIR /app
+
+# Install dependencies
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Copy application
+COPY . .
+
+# Create non-root user
+RUN useradd -m appuser && chown -R appuser:appuser /app
+USER appuser
+
+# Expose port
+EXPOSE 5000
+
+# Run application
+CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "wsgi:app"]
+```
+
+### Build & Push Script
+
+**scripts/build_and_push.sh:**
+
+```bash
+#!/bin/bash
+# Stop on the first failed command or unset variable
+set -euo pipefail
+
+# Get current git branch; "/" is not valid in a Docker tag, so replace it with "-"
+BRANCH=$(git rev-parse --abbrev-ref HEAD | tr '/' '-')
+
+# Ask for tag options
+read -r -p "Tag as :latest? (y/n) " TAG_LATEST
+read -r -p "Push to registry? (y/n) " PUSH_IMAGE
+
+# Build image
+docker build -t "ai-dungeon-master:$BRANCH" .
+
+if [ "$TAG_LATEST" = "y" ]; then
+    docker tag "ai-dungeon-master:$BRANCH" ai-dungeon-master:latest
+fi
+
+# Note: pushing only succeeds if the image name includes your registry/namespace prefix
+if [ "$PUSH_IMAGE" = "y" ]; then
+    docker push "ai-dungeon-master:$BRANCH"
+    if [ "$TAG_LATEST" = "y" ]; then
+        docker push ai-dungeon-master:latest
+    fi
+fi
+
+### Production Environment
+
+**Recommended Stack:**
+- **Web Server:** Nginx (reverse proxy)
+- **WSGI Server:** Gunicorn (4+ workers)
+- **Process Manager:** Supervisor or systemd
+- **Redis:** Standalone or Redis Cluster
+- **RQ Workers:** Separate instances for each queue
+
+**Scaling Strategy:**
+
+| Component | Scaling Method | Trigger |
+|-----------|----------------|---------|
+| Flask API | Horizontal (add workers) | CPU > 70% |
+| RQ Workers | Horizontal (add workers) | Queue length > 100 |
+| Redis | Vertical (upgrade instance) | Memory > 80% |
+| Appwrite | Managed by Appwrite | N/A |
+
+---
+
+## Monitoring & Logging
+
+### Application Logging
+
+**Logging Configuration:**
+
+| Level | Use Case | Examples |
+|-------|----------|----------|
+| DEBUG | Development only | Variable values, function calls |
+| INFO | Normal operations | User actions, API calls |
+| WARNING | Potential issues | Rate limit approaching, slow queries |
+| ERROR | Errors (recoverable) | Failed AI calls, validation errors |
+| CRITICAL | Critical failures | Database connection lost, service down |
+
+**Structured Logging with Structlog:**
+
+```python
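+import structlog
+
+logger = structlog.get_logger(__name__)
+
+# Placeholder IDs for illustration; real values come from the request context
+session_id = "session_123"
+character_id = "char_456"
+
+# Event message first, then structured key/value context
+logger.info(
+    "Combat action executed",
+    session_id=session_id,
+    character_id=character_id,
+    action_type="attack",
+    damage=15,
+)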
+```
+
+### Monitoring Tools
+
+**Recommended Tools:**
+
+| Tool | Purpose | Priority |
+|------|---------|----------|
+| **Sentry** | Error tracking and alerting | High |
+| **Prometheus** | Metrics collection | Medium |
+| **Grafana** | Metrics visualization | Medium |
+| **Uptime Robot** | Uptime monitoring | High |
+| **CloudWatch** | AWS logs/metrics (if using AWS) | Medium |
+
+### Key Metrics to Monitor
+
+| Metric | Alert Threshold | Action |
+|--------|----------------|--------|
+| API response time | > 3s average | Scale workers |
+| Error rate | > 5% | Investigate logs |
+| AI API errors | > 10% | Check API status |
+| Queue length | > 500 | Add workers |
+| Redis memory | > 80% | Upgrade instance |
+| CPU usage | > 80% | Scale horizontally |
+| AI cost per day | > budget × 1.2 | Investigate usage |
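+
+**Example Threshold Check:**
+
+A sketch of how the thresholds above could be evaluated against a metrics snapshot. The metric key names, the use of fractions for percentages, and the `triggered_alerts` helper are assumptions for illustration; how metrics are actually collected is outside this sketch.
+
+```python
+def triggered_alerts(metrics, daily_ai_budget):
+    """Return (metric, action) pairs for every threshold that is exceeded."""
+    # (metric key, alert threshold, action) taken from the table above.
+    rules = [
+        ("api_response_time_s", 3.0, "Scale workers"),
+        ("error_rate", 0.05, "Investigate logs"),
+        ("ai_api_error_rate", 0.10, "Check API status"),
+        ("queue_length", 500, "Add workers"),
+        ("redis_memory_fraction", 0.80, "Upgrade instance"),
+        ("cpu_fraction", 0.80, "Scale horizontally"),
+        ("ai_cost_per_day", daily_ai_budget * 1.2, "Investigate usage"),
+    ]
+    # Metrics missing from the snapshot are skipped rather than treated as alerts.
+    return [
+        (name, action)
+        for name, limit, action in rules
+        if name in metrics and metrics[name] > limit
+    ]
+```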
+
+### AI Cost Tracking
+
+**Log Structure:**
+
+| Field | Type | Purpose |
+|-------|------|---------|
+| `user_id` | str | Track per-user usage |
+| `model` | str | Which model used |
+| `tier` | str | FREE/STANDARD/PREMIUM |
+| `tokens_used` | int | Token count |
+| `cost_estimate` | float | Estimated cost |
+| `timestamp` | datetime | When called |
+| `context_type` | str | What prompted the call |
+
+**Daily Report:**
+- Total AI calls per tier
+- Total tokens used
+- Estimated cost
+- Top users by usage
+- Anomaly detection (unusual spikes)
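+
+**Example Record and Report:**
+
+A sketch of the log structure as a dataclass, with a function that computes the daily totals listed above. `AICallRecord` and `daily_report` are hypothetical names, ranking top users by token count is an assumption, and anomaly detection is left out.
+
+```python
+from collections import defaultdict
+from dataclasses import dataclass
+from datetime import datetime
+
+
+@dataclass
+class AICallRecord:
+    # Fields mirror the log structure table above.
+    user_id: str
+    model: str
+    tier: str
+    tokens_used: int
+    cost_estimate: float
+    timestamp: datetime
+    context_type: str
+
+
+def daily_report(records, top_n=3):
+    """Aggregate one day's records into per-tier calls, totals, and top users."""
+    calls_per_tier = defaultdict(int)
+    tokens_per_user = defaultdict(int)
+    total_tokens = 0
+    total_cost = 0.0
+    for record in records:
+        calls_per_tier[record.tier] += 1
+        tokens_per_user[record.user_id] += record.tokens_used
+        total_tokens += record.tokens_used
+        total_cost += record.cost_estimate
+    # Rank users by tokens consumed, highest first.
+    top_users = sorted(
+        tokens_per_user.items(), key=lambda item: item[1], reverse=True
+    )[:top_n]
+    return {
+        "calls_per_tier": dict(calls_per_tier),
+        "total_tokens": total_tokens,
+        "estimated_cost": round(total_cost, 4),
+        "top_users": top_users,
+    }
+```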
+
+---
+
+## Security
+
+### Authentication & Authorization
+
+**Implementation:**
+
+| Layer | Method | Details |
+|-------|--------|---------|
+| **User Auth** | Appwrite Auth | Email/password, OAuth providers |
+| **API Auth** | JWT tokens | Bearer token in Authorization header |
+| **Session Validation** | Every API call | Verify token, check expiry |
+| **Resource Access** | User ID check | Users can only access their own data |
+
+### Input Validation
+
+**Validation Strategy:**
+
+| Input Type | Validation | Tools |
+|------------|------------|-------|
+| JSON payloads | Schema validation | Marshmallow or Pydantic |
+| Character names | Sanitize, length limits | Bleach library |
+| Chat messages | Sanitize, profanity filter | Custom validators |
+| AI prompts | Template-based only | Jinja2 (no direct user input) |
+
+**Example Validation:**
+
+| Field | Rules |
+|-------|-------|
+| Character name | 3-20 chars, alphanumeric + spaces only |
+| Gold amount | Positive integer, max 999,999,999 |
+| Action text | Max 500 chars, sanitized HTML |
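+
+**Example Validators:**
+
+A sketch of the rules above using only the standard library. It assumes "alphanumeric" means ASCII letters and digits, and it only checks the length of action text; HTML sanitizing is not shown. The function names are hypothetical.
+
+```python
+import re
+
+# 3-20 characters: ASCII letters, digits, and spaces only.
+NAME_PATTERN = re.compile(r"[A-Za-z0-9 ]{3,20}")
+MAX_GOLD = 999_999_999
+MAX_ACTION_LENGTH = 500
+
+
+def is_valid_character_name(name):
+    """True for 3-20 allowed characters that are not all spaces."""
+    if not isinstance(name, str) or not name.strip():
+        return False
+    return NAME_PATTERN.fullmatch(name) is not None
+
+
+def is_valid_gold_amount(amount):
+    """True for a positive integer no larger than MAX_GOLD."""
+    # bool is a subclass of int in Python, so reject it explicitly.
+    if isinstance(amount, bool) or not isinstance(amount, int):
+        return False
+    return 0 < amount <= MAX_GOLD
+
+
+def is_valid_action_text(text):
+    """True for a string of at most MAX_ACTION_LENGTH characters."""
+    return isinstance(text, str) and len(text) <= MAX_ACTION_LENGTH
+```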
+
+### Rate Limiting
+
+**Implementation:** Flask-Limiter with Redis backend
+
+**Limits by Tier:**
+
+| Tier | API Calls/Min | AI Calls/Day | Marketplace Actions/Day |
+|------|---------------|--------------|------------------------|
+| FREE | 30 | 50 | N/A |
+| BASIC | 60 | 200 | N/A |
+| PREMIUM | 120 | 1000 | 50 |
+| ELITE | 300 | Unlimited | 100 |
+
+**Rate Limit Bypass:**
+- Admin accounts
+- Health check endpoints
+- Static assets
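+
+**Example Tier Lookup:**
+
+A sketch of the tier limits as data, with an in-memory fixed-window counter to show how a limit is applied. This is not Flask-Limiter and does not use Redis; `TIER_LIMITS` and `FixedWindowLimiter` are hypothetical names. It assumes "N/A" means the feature is unavailable (limit `0`) and represents "Unlimited" as `None`.
+
+```python
+import time
+
+# Limits from the table above. None = unlimited, 0 = feature not available.
+TIER_LIMITS = {
+    "FREE": {"api_per_min": 30, "ai_per_day": 50, "market_per_day": 0},
+    "BASIC": {"api_per_min": 60, "ai_per_day": 200, "market_per_day": 0},
+    "PREMIUM": {"api_per_min": 120, "ai_per_day": 1000, "market_per_day": 50},
+    "ELITE": {"api_per_min": 300, "ai_per_day": None, "market_per_day": 100},
+}
+
+
+class FixedWindowLimiter:
+    """Counts requests per key within fixed time windows (illustration only)."""
+
+    def __init__(self, clock=time.time):
+        self._clock = clock
+        self._counts = {}  # (key, window index) -> request count
+
+    def allow(self, key, limit, window_seconds):
+        """Record one request and return True if it is within the limit."""
+        if limit is None:
+            return True  # unlimited
+        window = int(self._clock() // window_seconds)
+        bucket = (key, window)
+        count = self._counts.get(bucket, 0)
+        if count >= limit:
+            return False
+        self._counts[bucket] = count + 1
+        return True
+```
+
+Old windows are never evicted here; a Redis-backed limiter would normally let keys expire instead.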
+
+### API Security
+
+**Configuration:**
+
+| Setting | Value | Reason |
+|---------|-------|--------|
+| **CORS** | Production domain only | Prevent unauthorized access |
+| **HTTPS** | Required | Encrypt data in transit |
+| **API Keys** | Environment variables | Never in code |
+| **Appwrite Permissions** | Least privilege | Collection-level security |
+| **SQL Injection** | N/A | No raw SQL; all data access goes through the Appwrite API |
+| **XSS** | Sanitize all inputs, escape output | Prevent script injection |
+| **CSRF** | CSRF tokens | For form submissions |
+
+### Data Protection
+
+**Access Control Matrix:**
+
+| Resource | Owner | Party Member | Public | System |
+|----------|-------|--------------|--------|--------|
+| Characters | RW | R | - | RW |
+| Sessions | R | RW (turn) | - | RW |
+| Marketplace Listings | RW (own) | - | R | RW |
+| Transactions | R (own) | - | - | RW |
+
+**RW = Read/Write, R = Read only, - = No access**
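+
+**Example Permission Check:**
+
+A sketch of the matrix above as a lookup table. The `(turn)` and `(own)` qualifiers are not modeled and would need extra checks at the call site; `ACCESS_MATRIX`, `is_allowed`, and the lowercase key names are hypothetical.
+
+```python
+# Permissions from the matrix above: "R" = read, "W" = write, "" = no access.
+ACCESS_MATRIX = {
+    "characters": {
+        "owner": "RW", "party_member": "R", "public": "", "system": "RW",
+    },
+    "sessions": {
+        "owner": "R", "party_member": "RW", "public": "", "system": "RW",
+    },
+    "marketplace_listings": {
+        "owner": "RW", "party_member": "", "public": "R", "system": "RW",
+    },
+    "transactions": {
+        "owner": "R", "party_member": "", "public": "", "system": "RW",
+    },
+}
+
+
+def is_allowed(resource, role, action):
+    """Return True if role may perform action ("R" or "W") on resource."""
+    if action not in ("R", "W"):
+        return False
+    # Unknown resources or roles fall through to "no access".
+    return action in ACCESS_MATRIX.get(resource, {}).get(role, "")
+```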
+
+### Secrets Management
+
+**Never Commit:**
+- API keys
+- Database credentials
+- Secret keys
+- Tokens
+
+**Best Practices:**
+- Use `.env` for local development
+- Use environment variables in production
+- Use secrets manager (AWS Secrets Manager, HashiCorp Vault) in production
+- Rotate keys regularly
+- Different keys for dev/staging/prod
+
+---
+
+## Backup & Recovery
+
+### Appwrite Data Backup
+
+**Strategy:**
+
+| Data Type | Backup Frequency | Retention | Method |
+|-----------|------------------|-----------|--------|
+| Characters | Daily | 30 days | Appwrite export |
+| Sessions (active) | Hourly | 7 days | Appwrite export |
+| Marketplace | Daily | 30 days | Appwrite export |
+| Transactions | Daily | 90 days | Appwrite export |
+
+**Backup Script:**
+- Export collections to JSON
+- Compress and encrypt
+- Upload to S3 or object storage
+- Verify backup integrity
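+
+**Example Pack and Verify:**
+
+A sketch of the compress and verify steps above using only the standard library. It assumes the documents have already been exported from Appwrite as a list of dicts; encryption and the upload to object storage are not shown. `pack_backup` and `verify_backup` are hypothetical names.
+
+```python
+import gzip
+import hashlib
+import json
+import zlib
+
+
+def pack_backup(documents):
+    """Serialize to JSON, gzip, and return (blob, SHA-256 hex digest)."""
+    raw = json.dumps(documents, sort_keys=True).encode("utf-8")
+    blob = gzip.compress(raw)
+    return blob, hashlib.sha256(blob).hexdigest()
+
+
+def verify_backup(blob, expected_digest):
+    """Check the checksum, then confirm the archive holds valid JSON."""
+    if hashlib.sha256(blob).hexdigest() != expected_digest:
+        return False
+    try:
+        json.loads(gzip.decompress(blob).decode("utf-8"))
+    except (OSError, EOFError, ValueError, zlib.error):
+        return False
+    return True
+```
+
+Storing the digest alongside the uploaded archive lets a later restore confirm the file was not truncated or altered.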
+
+### Disaster Recovery Plan
+
+| Scenario | RTO | RPO | Steps |
+|----------|-----|-----|-------|
+| **Database corruption** | 4 hours | 24 hours | Restore from latest backup |
+| **API server down** | 15 minutes | 0 | Restart/failover to standby |
+| **Redis failure** | 5 minutes | Session data loss | Restart, users re-login |
+| **Complete infrastructure loss** | 24 hours | 24 hours | Restore from backups to new infrastructure |
+
+**RTO = Recovery Time Objective, RPO = Recovery Point Objective**
+
+---
+
+## CI/CD Pipeline
+
+### Recommended Workflow
+
+| Stage | Actions | Tools |
+|-------|---------|-------|
+| **1. Commit** | Developer pushes to `dev` branch | Git |
+| **2. Build** | Run tests, lint code | GitHub Actions, pytest, flake8 |
+| **3. Test** | Unit tests, integration tests | pytest |
+| **4. Build Image** | Create Docker image | Docker |
+| **5. Deploy to Staging** | Deploy to staging environment | Docker, SSH |
+| **6. Manual Test** | QA testing on staging | Manual |
+| **7. Merge to Beta** | Promote to beta branch | Git |
+| **8. Deploy to Beta** | Deploy to beta environment | Docker, SSH |
+| **9. Merge to Master** | Production promotion | Git |
+| **10. Deploy to Prod** | Deploy to production | Docker, SSH |
+| **11. Tag Release** | Create version tag | Git |
+
+### GitHub Actions Example
+
+```yaml
+name: CI/CD
+
+on:
+  push:
+    branches: [ dev, beta, master ]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+      - name: Install dependencies
+        run: pip install -r requirements.txt
+      - name: Run tests
+        run: pytest
+      - name: Lint
+        run: flake8 app/
+
+  build:
+    needs: test
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Build Docker image
+        run: docker build -t ai-dungeon-master:${{ github.ref_name }} .
+      # Pushing requires a prior registry login step and a registry-qualified image name
+      - name: Push to registry
+        run: docker push ai-dungeon-master:${{ github.ref_name }}
+```
+
+---
+
+## Performance Optimization
+
+### Caching Strategy
+
+| Cache Type | What to Cache | TTL |
+|------------|---------------|-----|
+| **Redis Cache** | Session data | 30 minutes |
+| | Character data (read-heavy) | 5 minutes |
+| | Marketplace listings | 1 minute |
+| | NPC shop items | 1 hour |
+| **Browser Cache** | Static assets | 1 year |
+| | API responses (GET) | 30 seconds |
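+
+**Example Cache-Aside Read:**
+
+A sketch of applying the Redis TTLs above with a cache-aside read. `client` is assumed to be a redis-py style object exposing `get` and `setex`; the key naming scheme, `CACHE_TTL_SECONDS`, and `get_or_load` are hypothetical.
+
+```python
+import json
+
+# TTLs in seconds, from the Redis rows of the table above.
+CACHE_TTL_SECONDS = {
+    "session": 30 * 60,
+    "character": 5 * 60,
+    "marketplace_listings": 60,
+    "npc_shop_items": 60 * 60,
+}
+
+
+def get_or_load(client, kind, key, loader):
+    """Return the cached value, or call loader(), cache the result, and return it."""
+    cache_key = f"{kind}:{key}"
+    cached = client.get(cache_key)
+    if cached is not None:
+        return json.loads(cached)
+    value = loader()
+    # setex stores the value and its expiry in a single call.
+    client.setex(cache_key, CACHE_TTL_SECONDS[kind], json.dumps(value))
+    return value
+```
+
+Writes to the underlying data would also need to delete or refresh the cached key; that invalidation step is not shown.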
+
+### Database Optimization
+
+**Appwrite Indexing:**
+- Index `userId` on characters collection
+- Index `status` on game_sessions collection
+- Index `listing_type` + `status` on marketplace_listings
+- Index `created_at` for time-based queries
+
+### AI Call Optimization
+
+**Strategies:**
+
+| Strategy | Impact | Implementation |
+|----------|--------|----------------|
+| **Batch requests** | Reduce API calls | Combine multiple actions |
+| **Cache common responses** | Reduce cost | Cache item descriptions |
+| **Prompt optimization** | Reduce tokens | Shorter, more efficient prompts |
+| **Model selection** | Reduce cost | Use cheaper models when appropriate |
+
+---
+
+## Troubleshooting
+
+### Common Issues
+
+| Issue | Symptoms | Solution |
+|-------|----------|----------|
+| **RQ workers not processing** | Jobs stuck in queue | Check Redis connection, restart workers |
+| **AI calls failing** | 401/403 errors | Verify API keys, check rate limits |
+| **Appwrite connection errors** | Database errors | Check Appwrite status, verify credentials |
+| **Session not updating** | Stale data in UI | Check Appwrite Realtime connection |
+| **High latency** | Slow API responses | Check RQ queue length, scale workers |
+
+### Debug Mode
+
+**Enable Debug Logging:**
+
+```bash
+export LOG_LEVEL=DEBUG
+flask run --debug
+```
+
+**Debug Endpoints (development only):**
+- `GET /debug/health` - Health check
+- `GET /debug/redis` - Redis connection status
+- `GET /debug/queues` - RQ queue status
+
+---
+
+## Resources
+
+| Resource | URL |
+|----------|-----|
+| **Appwrite Docs** | https://appwrite.io/docs |
+| **RQ Docs** | https://python-rq.org/ |
+| **Flask Docs** | https://flask.palletsprojects.com/ |
+| **Structlog Docs** | https://www.structlog.org/ |
+| **HTMX Docs** | https://htmx.org/docs/ |
+| **Anthropic API** | https://docs.anthropic.com/ |
+| **Replicate API** | https://replicate.com/docs |