Code_of_Conquest/docs/DEPLOYMENT.md
2025-11-24 23:10:55 -06:00

# Deployment & Operations
## Local Development Setup
### Prerequisites
| Tool | Version | Purpose |
|------|---------|---------|
| Python | 3.11+ | Backend runtime |
| Docker | Latest | Local services |
| Redis | 7.0+ | Job queue & caching |
| Git | Latest | Version control |
### Setup Steps
```bash
# 1. Clone repository
git clone <repo-url>
cd code_of_conquest
# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Configure environment
cp .env.example .env
# Edit .env with your API keys and settings
# 5. Start local services
docker-compose up -d
# 6. Start RQ workers (not needed if the rq-worker service from docker-compose.yml is already running)
rq worker ai_tasks combat_tasks marketplace_tasks &
# 7. Run Flask development server
flask run --debug
```
### Environment Variables
| Variable | Description | Required |
|----------|-------------|----------|
| `FLASK_ENV` | `development`/`production` (Flask 2.3+ no longer reads this itself; use `flask run --debug` for debug mode) | Yes |
| `SECRET_KEY` | Flask secret key | Yes |
| `REPLICATE_API_KEY` | Replicate API key | Yes |
| `ANTHROPIC_API_KEY` | Anthropic API key | Yes |
| `APPWRITE_ENDPOINT` | Appwrite server URL | Yes |
| `APPWRITE_PROJECT_ID` | Appwrite project ID | Yes |
| `APPWRITE_API_KEY` | Appwrite API key | Yes |
| `REDIS_URL` | Redis connection URL | Yes |
| `LOG_LEVEL` | Logging level (DEBUG/INFO/WARNING/ERROR) | No |
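
It helps to check these variables when the app starts, so a missing one fails immediately instead of on the first request that needs it. The sketch below is a minimal, illustrative version. `REQUIRED_VARS` simply mirrors the table above, `load_settings` is a hypothetical helper rather than an existing function in the codebase, and the `INFO` default for `LOG_LEVEL` is an assumption.

```python
import os
from typing import Mapping

# Variables marked "Required: Yes" in the table above
REQUIRED_VARS = (
    "FLASK_ENV",
    "SECRET_KEY",
    "REPLICATE_API_KEY",
    "ANTHROPIC_API_KEY",
    "APPWRITE_ENDPOINT",
    "APPWRITE_PROJECT_ID",
    "APPWRITE_API_KEY",
    "REDIS_URL",
)
VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return a settings dict, or raise one error listing every missing variable."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))

    settings = {name: env[name] for name in REQUIRED_VARS}
    # LOG_LEVEL is optional; INFO is an assumed default
    log_level = env.get("LOG_LEVEL", "INFO").upper()
    if log_level not in VALID_LOG_LEVELS:
        raise RuntimeError(f"Invalid LOG_LEVEL: {log_level}")
    settings["LOG_LEVEL"] = log_level
    return settings
```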
---
## Docker Compose (Local Development)
**docker-compose.yml:**
```yaml
version: '3.8'

services:
  redis:
    image: redis:alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

  rq-worker:
    build: .
    command: rq worker ai_tasks combat_tasks marketplace_tasks --url redis://redis:6379
    depends_on:
      - redis
    env_file:
      - .env
    environment:
      - REDIS_URL=redis://redis:6379

volumes:
  redis_data:
```
---
## Testing Strategy
### Manual Testing (Preferred)
**API Testing Document:** `docs/API_TESTING.md`

Contains:
- Endpoint examples
- Sample curl/httpie commands
- Expected responses
- Authentication setup

**Example API Test:**
```bash
# Login
curl -X POST http://localhost:5000/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"email": "test@example.com", "password": "password123"}'
# Create character (with auth token)
curl -X POST http://localhost:5000/api/v1/characters \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"name": "Aragorn", "class_id": "vanguard"}'
```
### Unit Tests (Optional)
**Framework:** pytest

**Test Categories:**
| Category | Location | Focus |
|----------|----------|-------|
| Combat | `tests/test_combat.py` | Damage calculations, effect processing |
| Skills | `tests/test_skills.py` | Skill unlock logic, prerequisites |
| Marketplace | `tests/test_marketplace.py` | Bidding logic, auction processing |
| Character | `tests/test_character.py` | Character creation, stats |

**Run Tests:**
```bash
# All tests
pytest
# Specific test file
pytest tests/test_combat.py
# With coverage (requires the pytest-cov plugin)
pytest --cov=app tests/
```
### Load Testing
**Tool:** Locust or Apache Bench

**Test Scenarios:**
| Scenario | Target | Success Criteria |
|----------|--------|------------------|
| Concurrent AI requests | 50 concurrent users | < 5s response time |
| Marketplace browsing | 100 concurrent users | < 1s response time |
| Session realtime updates | 10 players per session | < 100ms update latency |
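
The table does not specify whether a target refers to the average or to a percentile. The sketch below assumes the 95th percentile, computed with the nearest-rank method over latency samples exported from Locust or ab. `percentile`, `meets_target` and `TARGETS_MS` are illustrative names and are not part of either tool.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of latency samples (pct in 0-100)."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_target(samples_ms: Sequence[float], limit_ms: float, pct: float = 95) -> bool:
    """True if the pct-th percentile latency is strictly below the limit."""
    return percentile(samples_ms, pct) < limit_ms


# Success criteria from the table above, in milliseconds
TARGETS_MS = {
    "concurrent_ai_requests": 5000,
    "marketplace_browsing": 1000,
    "session_realtime_updates": 100,
}
```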
---
## Production Deployment
### Deployment Checklist
**Pre-Deployment:**
- [ ] All environment variables configured
- [ ] Appwrite collections created with proper permissions
- [ ] Redis configured and accessible
- [ ] RQ workers running
- [ ] SSL certificates installed
- [ ] Rate limiting configured
- [ ] Error logging/monitoring set up (Sentry recommended)
- [ ] Backup strategy for Appwrite data

**Production Configuration:**
- [ ] `DEBUG = False` in Flask
- [ ] Secure session keys (random, long)
- [ ] CORS restricted to production domain
- [ ] Rate limits appropriate for production
- [ ] AI cost alerts configured
- [ ] CDN for static assets (optional)
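
Python's standard `secrets` module can generate a random, long session key. The check function below is a hypothetical pre-deploy guard covering the first two configuration items. `DEBUG` and `SECRET_KEY` are standard Flask config keys, but the 32-character minimum is an assumed policy.

```python
import secrets

# 32 random bytes -> 64 hex characters; suitable as Flask's SECRET_KEY
secret_key = secrets.token_hex(32)
print(f"SECRET_KEY={secret_key}")


def production_config_issues(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    issues = []
    if config.get("DEBUG"):
        issues.append("DEBUG must be False")
    # Assumed policy: at least 32 characters
    if len(config.get("SECRET_KEY") or "") < 32:
        issues.append("SECRET_KEY should be at least 32 characters")
    return issues
```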
### Dockerfile
```dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Create non-root user
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser
# Expose port
EXPOSE 5000
# Run application
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "wsgi:app"]
```
### Build & Push Script
**scripts/build_and_push.sh:**
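```bash
#!/bin/bash
set -euo pipefail

# Set REGISTRY (e.g. registry.example.com/team) before running. Without it the
# unqualified name resolves to docker.io/library/..., where pushes are denied.
IMAGE="${REGISTRY:+${REGISTRY}/}ai-dungeon-master"

# Get current git branch; replace "/" so names like feature/x are valid tags
BRANCH=$(git rev-parse --abbrev-ref HEAD)
TAG="${BRANCH//\//-}"

# Ask for tag options
read -r -p "Tag as :latest? (y/n) " TAG_LATEST
read -r -p "Push to registry? (y/n) " PUSH_IMAGE

# Build image
docker build -t "$IMAGE:$TAG" .

if [ "$TAG_LATEST" = "y" ]; then
  docker tag "$IMAGE:$TAG" "$IMAGE:latest"
fi

if [ "$PUSH_IMAGE" = "y" ]; then
  docker push "$IMAGE:$TAG"
  if [ "$TAG_LATEST" = "y" ]; then
    docker push "$IMAGE:latest"
  fi
fi
```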
### Production Environment
**Recommended Stack:**
- **Web Server:** Nginx (reverse proxy)
- **WSGI Server:** Gunicorn (4+ workers)
- **Process Manager:** Supervisor or systemd
- **Redis:** Standalone or Redis Cluster
- **RQ Workers:** Separate instances for each queue

**Scaling Strategy:**
| Component | Scaling Method | Trigger |
|-----------|----------------|---------|
| Flask API | Horizontal (add workers) | CPU > 70% |
| RQ Workers | Horizontal (add workers) | Queue length > 100 |
| Redis | Vertical (upgrade instance) | Memory > 80% |
| Appwrite | Managed by Appwrite | N/A |
---
## Monitoring & Logging
### Application Logging
**Logging Configuration:**
| Level | Use Case | Examples |
|-------|----------|----------|
| DEBUG | Development only | Variable values, function calls |
| INFO | Normal operations | User actions, API calls |
| WARNING | Potential issues | Rate limit approaching, slow queries |
| ERROR | Errors (recoverable) | Failed AI calls, validation errors |
| CRITICAL | Critical failures | Database connection lost, service down |

**Structured Logging with Structlog:**
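```python
import structlog

logger = structlog.get_logger(__name__)

# Example values; in the app these come from the current request/session
session_id = "sess_123"
character_id = "char_456"

# Keyword arguments become structured key/value fields on the log event
logger.info(
    "Combat action executed",
    session_id=session_id,
    character_id=character_id,
    action_type="attack",
    damage=15,
)
```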
### Monitoring Tools
**Recommended Tools:**
| Tool | Purpose | Priority |
|------|---------|----------|
| **Sentry** | Error tracking and alerting | High |
| **Prometheus** | Metrics collection | Medium |
| **Grafana** | Metrics visualization | Medium |
| **Uptime Robot** | Uptime monitoring | High |
| **CloudWatch** | AWS logs/metrics (if using AWS) | Medium |
### Key Metrics to Monitor
| Metric | Alert Threshold | Action |
|--------|----------------|--------|
| API response time | > 3s average | Scale workers |
| Error rate | > 5% | Investigate logs |
| AI API errors | > 10% | Check API status |
| Queue length | > 500 | Add workers |
| Redis memory | > 80% | Upgrade instance |
| CPU usage | > 80% | Scale horizontally |
| AI cost per day | > budget × 1.2 | Investigate usage |
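
The thresholds above can be encoded as a single check. In a real deployment they would more likely be Prometheus/Grafana alert rules. This sketch is only illustrative: the `Metrics` fields are hypothetical names, and the error rates are assumed to be fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    avg_response_time_s: float
    error_rate: float          # fraction of requests, 0.0-1.0
    ai_error_rate: float       # fraction of AI API calls, 0.0-1.0
    queue_length: int
    redis_memory_pct: float    # 0-100
    cpu_pct: float             # 0-100
    ai_cost_today: float
    daily_ai_budget: float


def alerts(m: Metrics) -> list[str]:
    """Return the table's recommended action for each breached threshold."""
    actions = []
    if m.avg_response_time_s > 3:
        actions.append("Scale workers")
    if m.error_rate > 0.05:
        actions.append("Investigate logs")
    if m.ai_error_rate > 0.10:
        actions.append("Check API status")
    if m.queue_length > 500:
        actions.append("Add workers")
    if m.redis_memory_pct > 80:
        actions.append("Upgrade instance")
    if m.cpu_pct > 80:
        actions.append("Scale horizontally")
    if m.ai_cost_today > m.daily_ai_budget * 1.2:
        actions.append("Investigate usage")
    return actions
```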
### AI Cost Tracking
**Log Structure:**
| Field | Type | Purpose |
|-------|------|---------|
| `user_id` | str | Track per-user usage |
| `model` | str | Which model used |
| `tier` | str | FREE/STANDARD/PREMIUM |
| `tokens_used` | int | Token count |
| `cost_estimate` | float | Estimated cost |
| `timestamp` | datetime | When called |
| `context_type` | str | What prompted the call |

**Daily Report:**
- Total AI calls per tier
- Total tokens used
- Estimated cost
- Top users by usage
- Anomaly detection (unusual spikes)
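
The sketch below models the log structure as a dataclass and aggregates one day of entries into the first four report items. It is a minimal illustration, not the project's reporting code. `AICallLog` and `daily_report` are hypothetical names, "top users" is assumed to be ranked by tokens used, and anomaly detection is omitted.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AICallLog:
    """One AI call, mirroring the log structure table above."""
    user_id: str
    model: str
    tier: str
    tokens_used: int
    cost_estimate: float
    timestamp: datetime
    context_type: str


def daily_report(entries: list[AICallLog], top_n: int = 5) -> dict:
    """Aggregate one day's AI call logs into per-tier, token, cost and top-user totals."""
    calls_per_tier = Counter(e.tier for e in entries)
    tokens_per_user: dict[str, int] = defaultdict(int)
    for e in entries:
        tokens_per_user[e.user_id] += e.tokens_used
    top_users = sorted(tokens_per_user.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return {
        "calls_per_tier": dict(calls_per_tier),
        "total_tokens": sum(e.tokens_used for e in entries),
        "estimated_cost": round(sum(e.cost_estimate for e in entries), 4),
        "top_users": top_users,
    }
```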
---
## Security
### Authentication & Authorization
**Implementation:**
| Layer | Method | Details |
|-------|--------|---------|
| **User Auth** | Appwrite Auth | Email/password, OAuth providers |
| **API Auth** | JWT tokens | Bearer token in Authorization header |
| **Session Validation** | Every API call | Verify token, check expiry |
| **Resource Access** | User ID check | Users can only access their own data |
### Input Validation
**Validation Strategy:**
| Input Type | Validation | Tools |
|------------|------------|-------|
| JSON payloads | Schema validation | Marshmallow or Pydantic |
| Character names | Sanitize, length limits | Bleach library |
| Chat messages | Sanitize, profanity filter | Custom validators |
| AI prompts | Template-based only | Jinja2 (no direct user input) |

**Example Validation:**
| Field | Rules |
|-------|-------|
| Character name | 3-20 chars, alphanumeric + spaces only |
| Gold amount | Positive integer, max 999,999,999 |
| Action text | Max 500 chars, sanitized HTML |
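
The example rules can be written as plain stdlib validators. This is a sketch under several assumptions: "alphanumeric" is read as ASCII letters and digits, names are trimmed before checking, and action text is fully escaped with `html.escape`. Escaping is stricter than the Bleach-style sanitizing mentioned above, which can allow a safe subset of tags. The function names are hypothetical.

```python
import html
import re

NAME_RE = re.compile(r"[A-Za-z0-9 ]{3,20}")
MAX_GOLD = 999_999_999
MAX_ACTION_CHARS = 500


def validate_character_name(name: str) -> str:
    """3-20 chars of ASCII letters, digits and spaces (after trimming)."""
    name = name.strip()
    if not NAME_RE.fullmatch(name):
        raise ValueError("Name must be 3-20 letters, digits or spaces")
    return name


def validate_gold(amount: object) -> int:
    """Positive integer up to 999,999,999; bool is rejected explicitly."""
    if isinstance(amount, bool) or not isinstance(amount, int):
        raise ValueError("Gold must be an integer")
    if not 1 <= amount <= MAX_GOLD:
        raise ValueError(f"Gold must be between 1 and {MAX_GOLD:,}")
    return amount


def validate_action_text(text: str) -> str:
    """Max 500 chars; HTML-escaped so it cannot inject markup."""
    if len(text) > MAX_ACTION_CHARS:
        raise ValueError(f"Action text exceeds {MAX_ACTION_CHARS} characters")
    return html.escape(text)
```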
### Rate Limiting
**Implementation:** Flask-Limiter with Redis backend

**Limits by Tier:**
| Tier | API Calls/Min | AI Calls/Day | Marketplace Actions/Day |
|------|---------------|--------------|------------------------|
| FREE | 30 | 50 | N/A |
| BASIC | 60 | 200 | N/A |
| PREMIUM | 120 | 1000 | 50 |
| ELITE | 300 | Unlimited | 100 |

**Rate Limit Bypass:**
- Admin accounts
- Health check endpoints
- Static assets
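
To make the tier table concrete, here is a fixed-window counter keyed by user, limit type and time window. In production Flask-Limiter with Redis handles this. The in-memory dict is only a stand-in and never evicts old windows, whereas Redis keys would expire. Two further assumptions: "Unlimited" maps to `None`, and "N/A" means the tier has no marketplace access (limit 0). All names here are hypothetical.

```python
import time
from typing import Callable, Optional

# Limits from the table above; None = unlimited, 0 = not available (assumed meaning of N/A)
TIER_LIMITS: dict[str, dict[str, Optional[int]]] = {
    "FREE":    {"api_per_min": 30,  "ai_per_day": 50,   "market_per_day": 0},
    "BASIC":   {"api_per_min": 60,  "ai_per_day": 200,  "market_per_day": 0},
    "PREMIUM": {"api_per_min": 120, "ai_per_day": 1000, "market_per_day": 50},
    "ELITE":   {"api_per_min": 300, "ai_per_day": None, "market_per_day": 100},
}
WINDOW_SECONDS = {"api_per_min": 60, "ai_per_day": 86_400, "market_per_day": 86_400}


class FixedWindowLimiter:
    """In-memory fixed-window counter; a stand-in for the Redis-backed limiter."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self.clock = clock
        self.counts: dict[tuple, int] = {}

    def allow(self, user_id: str, tier: str, kind: str) -> bool:
        limit = TIER_LIMITS[tier][kind]
        if limit is None:
            return True
        # All requests in the same window share one counter
        window = int(self.clock() // WINDOW_SECONDS[kind])
        key = (user_id, kind, window)
        if self.counts.get(key, 0) >= limit:
            return False
        self.counts[key] = self.counts.get(key, 0) + 1
        return True
```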
### API Security
**Configuration:**
| Setting | Value | Reason |
|---------|-------|--------|
| **CORS** | Production domain only | Prevent unauthorized access |
| **HTTPS** | Required | Encrypt data in transit |
| **API Keys** | Environment variables | Never in code |
| **Appwrite Permissions** | Least privilege | Collection-level security |
| **SQL Injection** | N/A | Using Appwrite (NoSQL) |
| **XSS** | Sanitize all inputs | Prevent script injection |
| **CSRF** | CSRF tokens | For form submissions |
### Data Protection
**Access Control Matrix:**
| Resource | Owner | Party Member | Public | System |
|----------|-------|--------------|--------|--------|
| Characters | RW | R | - | RW |
| Sessions | R | RW (turn) | - | RW |
| Marketplace Listings | RW (own) | - | R | RW |
| Transactions | R (own) | - | - | RW |

**RW = Read/Write, R = Read only, - = No access**
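
The matrix can be expressed as a lookup table. This sketch assumes the caller has already worked out the requester's role for a given document (owner, party member, public or system) and whether it is that party member's turn. `MATRIX` and `can_access` are illustrative names, not the project's API.

```python
# Permissions from the access control matrix above: "RW", "R", or "" (no access)
MATRIX = {
    "characters":           {"owner": "RW", "party_member": "R",  "public": "",  "system": "RW"},
    "sessions":             {"owner": "R",  "party_member": "RW", "public": "",  "system": "RW"},
    "marketplace_listings": {"owner": "RW", "party_member": "",   "public": "R", "system": "RW"},
    "transactions":         {"owner": "R",  "party_member": "",   "public": "",  "system": "RW"},
}


def can_access(resource: str, role: str, action: str, is_members_turn: bool = False) -> bool:
    """action is "read" or "write"."""
    perms = MATRIX[resource][role]
    if action == "read":
        return "R" in perms
    if action == "write":
        allowed = "W" in perms
        # Party members may only write to a session during their own turn
        if resource == "sessions" and role == "party_member":
            allowed = allowed and is_members_turn
        return allowed
    raise ValueError(f"unknown action: {action}")
```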
### Secrets Management
**Never Commit:**
- API keys
- Database credentials
- Secret keys
- Tokens

**Best Practices:**
- Use `.env` for local development
- Use environment variables in production
- Use secrets manager (AWS Secrets Manager, HashiCorp Vault) in production
- Rotate keys regularly
- Different keys for dev/staging/prod
---
## Backup & Recovery
### Appwrite Data Backup
**Strategy:**
| Data Type | Backup Frequency | Retention | Method |
|-----------|------------------|-----------|--------|
| Characters | Daily | 30 days | Appwrite export |
| Sessions (active) | Hourly | 7 days | Appwrite export |
| Marketplace | Daily | 30 days | Appwrite export |
| Transactions | Daily | 90 days | Appwrite export |

**Backup Script:**
- Export collections to JSON
- Compress and encrypt
- Upload to S3 or object storage
- Verify backup integrity
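
Here is a stdlib sketch of the export, compress and verify steps, plus the retention check from the strategy table. It takes documents already fetched from Appwrite as plain dicts and does not show the fetch itself. Encryption and the upload to S3 or other object storage are also left out, since they depend on the chosen tooling. `write_backup`, `verify_backup` and `expired` are hypothetical helpers.

```python
import gzip
import hashlib
import json
import zlib
from datetime import datetime, timedelta
from pathlib import Path


def write_backup(documents: list[dict], dest: Path) -> str:
    """Export documents as gzipped JSON and return the SHA-256 of the written file."""
    payload = json.dumps(documents, sort_keys=True, default=str).encode("utf-8")
    dest.write_bytes(gzip.compress(payload))
    return hashlib.sha256(dest.read_bytes()).hexdigest()


def verify_backup(path: Path, expected_sha256: str) -> bool:
    """Check the checksum and that the archive decompresses to valid JSON."""
    data = path.read_bytes()
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        return False
    try:
        json.loads(gzip.decompress(data))
    except (OSError, EOFError, ValueError, zlib.error):
        return False
    return True


def expired(backup_time: datetime, retention_days: int, now: datetime) -> bool:
    """True once a backup is older than its retention window (e.g. 30 days for characters)."""
    return now - backup_time > timedelta(days=retention_days)
```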
### Disaster Recovery Plan
| Scenario | RTO | RPO | Steps |
|----------|-----|-----|-------|
| **Database corruption** | 4 hours | 24 hours | Restore from latest backup |
| **API server down** | 15 minutes | 0 | Restart/failover to standby |
| **Redis failure** | 5 minutes | Session data loss | Restart, users re-login |
| **Complete infrastructure loss** | 24 hours | 24 hours | Restore from backups to new infrastructure |

**RTO = Recovery Time Objective, RPO = Recovery Point Objective**
---
## CI/CD Pipeline
### Recommended Workflow
| Stage | Actions | Tools |
|-------|---------|-------|
| **1. Commit** | Developer pushes to `dev` branch | Git |
| **2. Build** | Run tests, lint code | GitHub Actions, pytest, flake8 |
| **3. Test** | Unit tests, integration tests | pytest |
| **4. Build Image** | Create Docker image | Docker |
| **5. Deploy to Staging** | Deploy to staging environment | Docker, SSH |
| **6. Manual Test** | QA testing on staging | Manual |
| **7. Merge to Beta** | Promote to beta branch | Git |
| **8. Deploy to Beta** | Deploy to beta environment | Docker, SSH |
| **9. Merge to Master** | Production promotion | Git |
| **10. Deploy to Prod** | Deploy to production | Docker, SSH |
| **11. Tag Release** | Create version tag | Git |
### GitHub Actions Example
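```yaml
name: CI/CD

on:
  push:
    branches: [ dev, beta, master ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt pytest flake8
      - name: Run tests
        run: pytest
      - name: Lint
        run: flake8 app/

  build:
    needs: test
    runs-on: ubuntu-latest
    # Placeholders: set REGISTRY (e.g. ghcr.io) and IMAGE (full image name,
    # e.g. ghcr.io/<owner>/ai-dungeon-master) as repository variables, and
    # REGISTRY_USERNAME / REGISTRY_PASSWORD as repository secrets.
    env:
      IMAGE: ${{ vars.IMAGE }}
    steps:
      - uses: actions/checkout@v4
      - name: Log in to registry
        uses: docker/login-action@v3
        with:
          registry: ${{ vars.REGISTRY }}
          username: ${{ secrets.REGISTRY_USERNAME }}
          password: ${{ secrets.REGISTRY_PASSWORD }}
      - name: Build Docker image
        run: docker build -t "$IMAGE:${{ github.ref_name }}" .
      - name: Push to registry
        run: docker push "$IMAGE:${{ github.ref_name }}"
```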
---
## Performance Optimization
### Caching Strategy
| Cache Type | What to Cache | TTL |
|------------|---------------|-----|
| **Redis Cache** | Session data | 30 minutes |
| | Character data (read-heavy) | 5 minutes |
| | Marketplace listings | 1 minute |
| | NPC shop items | 1 hour |
| **Browser Cache** | Static assets | 1 year |
| | API responses (GET) | 30 seconds |
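
The Redis rows above follow the cache-aside pattern: read from the cache, and on a miss or expiry load from Appwrite and store the result with a TTL. In Redis that store step corresponds to `SET key value EX <ttl>`. The sketch below uses an in-memory dict with an injectable clock so the expiry logic can be tested without Redis. `TTLCache`, the cache-kind names and the `loader` callback are all hypothetical.

```python
import time
from typing import Any, Callable, Hashable

# TTLs from the caching table above, in seconds
TTL_SECONDS = {
    "session": 30 * 60,
    "character": 5 * 60,
    "marketplace_listings": 60,
    "npc_shop_items": 60 * 60,
}


class TTLCache:
    """Cache-aside helper; an in-memory stand-in for Redis keys with an expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self._store: dict[Hashable, tuple[float, Any]] = {}

    def get_or_load(self, kind: str, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get((kind, key))
        if entry is not None and entry[0] > now:
            return entry[1]                 # fresh hit
        value = loader()                    # miss or expired: load from the source of truth
        self._store[(kind, key)] = (now + TTL_SECONDS[kind], value)
        return value

    def invalidate(self, kind: str, key: Hashable) -> None:
        """Drop an entry after a write so readers don't see stale data."""
        self._store.pop((kind, key), None)
```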
### Database Optimization
**Appwrite Indexing:**
- Index `userId` on characters collection
- Index `status` on game_sessions collection
- Index `listing_type` + `status` on marketplace_listings
- Index `created_at` for time-based queries
### AI Call Optimization
**Strategies:**
| Strategy | Impact | Implementation |
|----------|--------|----------------|
| **Batch requests** | Reduce API calls | Combine multiple actions |
| **Cache common responses** | Reduce cost | Cache item descriptions |
| **Prompt optimization** | Reduce tokens | Shorter, more efficient prompts |
| **Model selection** | Reduce cost | Use cheaper models when appropriate |
---
## Troubleshooting
### Common Issues
| Issue | Symptoms | Solution |
|-------|----------|----------|
| **RQ workers not processing** | Jobs stuck in queue | Check Redis connection, restart workers |
| **AI calls failing** | 401/403 (auth) or 429 (rate limit) errors | Verify API keys; for 429s, check provider rate limits and quotas |
| **Appwrite connection errors** | Database errors | Check Appwrite status, verify credentials |
| **Session not updating** | Stale data in UI | Check Appwrite Realtime connection |
| **High latency** | Slow API responses | Check RQ queue length, scale workers |
### Debug Mode
**Enable Debug Logging:**
```bash
export LOG_LEVEL=DEBUG
flask run --debug
```
**Debug Endpoints (development only):**
- `GET /debug/health` - Health check
- `GET /debug/redis` - Redis connection status
- `GET /debug/queues` - RQ queue status
---
## Resources
| Resource | URL |
|----------|-----|
| **Appwrite Docs** | https://appwrite.io/docs |
| **RQ Docs** | https://python-rq.org/ |
| **Flask Docs** | https://flask.palletsprojects.com/ |
| **Structlog Docs** | https://www.structlog.org/ |
| **HTMX Docs** | https://htmx.org/docs/ |
| **Anthropic API** | https://docs.anthropic.com/ |
| **Replicate API** | https://replicate.com/docs |