# Deployment & Operations

## Local Development Setup

### Prerequisites

| Tool | Version | Purpose |
|------|---------|---------|
| Python | 3.11+ | Backend runtime |
| Docker | Latest | Local services |
| Redis | 7.0+ | Job queue & caching |
| Git | Latest | Version control |

### Setup Steps

```bash
# 1. Clone repository
git clone <repo-url>
cd code_of_conquest

# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment
cp .env.example .env
# Edit .env with your API keys and settings

# 5. Start local services
docker-compose up -d

# 6. Start RQ workers
rq worker ai_tasks combat_tasks marketplace_tasks &

# 7. Run Flask development server
flask run --debug
```

### Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `FLASK_ENV` | development/production | Yes |
| `SECRET_KEY` | Flask secret key | Yes |
| `REPLICATE_API_KEY` | Replicate API key | Yes |
| `ANTHROPIC_API_KEY` | Anthropic API key | Yes |
| `APPWRITE_ENDPOINT` | Appwrite server URL | Yes |
| `APPWRITE_PROJECT_ID` | Appwrite project ID | Yes |
| `APPWRITE_API_KEY` | Appwrite API key | Yes |
| `REDIS_URL` | Redis connection URL | Yes |
| `LOG_LEVEL` | Logging level (DEBUG/INFO/WARNING/ERROR) | No |
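
One way to catch a misconfigured `.env` early is to check these variables at startup. The following is a minimal sketch, not the project's actual loader: the `load_settings` helper is hypothetical, and defaulting `LOG_LEVEL` to `INFO` is an assumption.

```python
import os

# Variable names come from the table above.
REQUIRED_VARS = [
    "FLASK_ENV", "SECRET_KEY", "REPLICATE_API_KEY", "ANTHROPIC_API_KEY",
    "APPWRITE_ENDPOINT", "APPWRITE_PROJECT_ID", "APPWRITE_API_KEY", "REDIS_URL",
]
VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def load_settings(env=None):
    """Return a settings dict, or raise RuntimeError naming every missing variable."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    # Assumption: LOG_LEVEL falls back to INFO when unset.
    log_level = env.get("LOG_LEVEL", "INFO").upper()
    if log_level not in VALID_LOG_LEVELS:
        raise RuntimeError(f"Invalid LOG_LEVEL: {log_level}")
    settings = {name: env[name] for name in REQUIRED_VARS}
    settings["LOG_LEVEL"] = log_level
    return settings
```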

---

## Docker Compose (Local Development)

**docker-compose.yml:**

```yaml
version: '3.8'
services:
  redis:
    image: redis:alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

  rq-worker:
    build: .
    command: rq worker ai_tasks combat_tasks marketplace_tasks --url redis://redis:6379
    depends_on:
      - redis
    env_file:
      - .env
    environment:
      - REDIS_URL=redis://redis:6379

volumes:
  redis_data:
```

---

## Testing Strategy

### Manual Testing (Preferred)

**API Testing Document:** `docs/API_TESTING.md`

Contains:
- Endpoint examples
- Sample curl/httpie commands
- Expected responses
- Authentication setup

**Example API Test:**

```bash
# Login
curl -X POST http://localhost:5000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com", "password": "password123"}'

# Create character (with auth token)
curl -X POST http://localhost:5000/api/v1/characters \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"name": "Aragorn", "class_id": "vanguard"}'
```

### Unit Tests (Optional)

**Framework:** pytest

**Test Categories:**

| Category | Location | Focus |
|----------|----------|-------|
| Combat | `tests/test_combat.py` | Damage calculations, effect processing |
| Skills | `tests/test_skills.py` | Skill unlock logic, prerequisites |
| Marketplace | `tests/test_marketplace.py` | Bidding logic, auction processing |
| Character | `tests/test_character.py` | Character creation, stats |

**Run Tests:**
```bash
# All tests
pytest

# Specific test file
pytest tests/test_combat.py

# With coverage
pytest --cov=app tests/
```

### Load Testing

**Tool:** Locust or Apache Bench

**Test Scenarios:**

| Scenario | Target | Success Criteria |
|----------|--------|------------------|
| Concurrent AI requests | 50 concurrent users | < 5s response time |
| Marketplace browsing | 100 concurrent users | < 1s response time |
| Session realtime updates | 10 players per session | < 100ms update latency |
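
Once a load-test run has produced latency samples, the success criteria above can be checked mechanically. The sketch below is illustrative only: the scenario keys are hypothetical names, and judging by the 95th percentile is an assumption, since the table does not say which statistic the thresholds apply to.

```python
# Thresholds in seconds, taken from the scenarios table above.
# The dictionary keys are hypothetical labels, not names used by the project.
THRESHOLDS = {
    "ai_requests": 5.0,
    "marketplace_browsing": 1.0,
    "session_updates": 0.1,
}


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer from 1 to 100."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def meets_criteria(scenario, latencies, pct=95):
    """True when the chosen percentile is under the scenario's threshold."""
    return percentile(latencies, pct) < THRESHOLDS[scenario]
```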

---

## Production Deployment

### Deployment Checklist

**Pre-Deployment:**
- [ ] All environment variables configured
- [ ] Appwrite collections created with proper permissions
- [ ] Redis configured and accessible
- [ ] RQ workers running
- [ ] SSL certificates installed
- [ ] Rate limiting configured
- [ ] Error logging/monitoring set up (Sentry recommended)
- [ ] Backup strategy for Appwrite data

**Production Configuration:**
- [ ] `DEBUG = False` in Flask
- [ ] Secure session keys (random, long)
- [ ] CORS restricted to production domain
- [ ] Rate limits appropriate for production
- [ ] AI cost alerts configured
- [ ] CDN for static assets (optional)

### Dockerfile

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Create non-root user
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser

# Expose port
EXPOSE 5000

# Run application
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "wsgi:app"]
```

### Build & Push Script

**scripts/build_and_push.sh:**

```bash
#!/bin/bash
# Abort on errors, unset variables, and failed pipeline stages
set -euo pipefail

# Get current git branch; Docker tags cannot contain "/", so replace it with "-"
BRANCH=$(git rev-parse --abbrev-ref HEAD | tr '/' '-')

# Ask for tag options
read -r -p "Tag as :latest? (y/n) " TAG_LATEST
read -r -p "Push to registry? (y/n) " PUSH_IMAGE

# Build image
docker build -t "ai-dungeon-master:$BRANCH" .

if [ "$TAG_LATEST" = "y" ]; then
  docker tag "ai-dungeon-master:$BRANCH" ai-dungeon-master:latest
fi

if [ "$PUSH_IMAGE" = "y" ]; then
  docker push "ai-dungeon-master:$BRANCH"
  if [ "$TAG_LATEST" = "y" ]; then
    docker push ai-dungeon-master:latest
  fi
fi
```

### Production Environment

**Recommended Stack:**
- **Web Server:** Nginx (reverse proxy)
- **WSGI Server:** Gunicorn (4+ workers)
- **Process Manager:** Supervisor or systemd
- **Redis:** Standalone or Redis Cluster
- **RQ Workers:** Separate instances for each queue

**Scaling Strategy:**

| Component | Scaling Method | Trigger |
|-----------|----------------|---------|
| Flask API | Horizontal (add workers) | CPU > 70% |
| RQ Workers | Horizontal (add workers) | Queue length > 100 |
| Redis | Vertical (upgrade instance) | Memory > 80% |
| Appwrite | Managed by Appwrite | N/A |
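
The triggers in the scaling table can be read as a simple decision function. This is a sketch under assumptions: `scaling_actions` and its return strings are hypothetical, and how the three input metrics are collected is left out.

```python
def scaling_actions(cpu_percent, queue_length, redis_memory_percent):
    """Map current metrics to the scaling steps from the table above.

    Thresholds are strict ("greater than"), matching the table's "> 70%" style.
    Appwrite is managed externally, so it never appears in the result.
    """
    actions = []
    if cpu_percent > 70:
        actions.append("flask_api: add workers")
    if queue_length > 100:
        actions.append("rq_workers: add workers")
    if redis_memory_percent > 80:
        actions.append("redis: upgrade instance")
    return actions
```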

---

## Monitoring & Logging

### Application Logging

**Logging Configuration:**

| Level | Use Case | Examples |
|-------|----------|----------|
| DEBUG | Development only | Variable values, function calls |
| INFO | Normal operations | User actions, API calls |
| WARNING | Potential issues | Rate limit approaching, slow queries |
| ERROR | Errors (recoverable) | Failed AI calls, validation errors |
| CRITICAL | Critical failures | Database connection lost, service down |

**Structured Logging with Structlog:**

```python
import structlog

logger = structlog.get_logger(__name__)

# Placeholder IDs for illustration; in the app these come from the request context
session_id = "session-123"
character_id = "char-456"

logger.info(
    "Combat action executed",
    session_id=session_id,
    character_id=character_id,
    action_type="attack",
    damage=15,
)
```

### Monitoring Tools

**Recommended Tools:**

| Tool | Purpose | Priority |
|------|---------|----------|
| **Sentry** | Error tracking and alerting | High |
| **Prometheus** | Metrics collection | Medium |
| **Grafana** | Metrics visualization | Medium |
| **Uptime Robot** | Uptime monitoring | High |
| **CloudWatch** | AWS logs/metrics (if using AWS) | Medium |

### Key Metrics to Monitor

| Metric | Alert Threshold | Action |
|--------|----------------|--------|
| API response time | > 3s average | Scale workers |
| Error rate | > 5% | Investigate logs |
| AI API errors | > 10% | Check API status |
| Queue length | > 500 | Add workers |
| Redis memory | > 80% | Upgrade instance |
| CPU usage | > 80% | Scale horizontally |
| AI cost per day | > budget × 1.2 | Investigate usage |
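
The thresholds above lend themselves to a data-driven rule list, which keeps the alert logic in one place. The sketch below is a hedged illustration: the metric key names, the `AlertRule` type, and both helper functions are hypothetical, and a real setup would more likely express these rules in Prometheus or Sentry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    metric: str       # hypothetical metric key
    threshold: float  # alert fires when the value is strictly greater
    action: str       # suggested response from the table above


def build_rules(daily_ai_budget):
    """Build the rule list; the AI cost threshold is budget x 1.2 as in the table."""
    return [
        AlertRule("api_response_time_s", 3.0, "Scale workers"),
        AlertRule("error_rate_pct", 5.0, "Investigate logs"),
        AlertRule("ai_api_error_pct", 10.0, "Check API status"),
        AlertRule("queue_length", 500, "Add workers"),
        AlertRule("redis_memory_pct", 80.0, "Upgrade instance"),
        AlertRule("cpu_usage_pct", 80.0, "Scale horizontally"),
        AlertRule("ai_cost_per_day", daily_ai_budget * 1.2, "Investigate usage"),
    ]


def triggered_alerts(metrics, rules):
    """Return (metric, action) pairs for every rule whose threshold is exceeded."""
    return [
        (rule.metric, rule.action)
        for rule in rules
        if metrics.get(rule.metric, 0) > rule.threshold
    ]
```

For example, with a daily budget of 100, a reported `ai_cost_per_day` of 121 would trigger the "Investigate usage" action, while a CPU reading of exactly 80 would not trigger anything.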

### AI Cost Tracking

**Log Structure:**

| Field | Type | Purpose |
|-------|------|---------|
| `user_id` | str | Track per-user usage |
| `model` | str | Which model used |
| `tier` | str | FREE/STANDARD/PREMIUM |
| `tokens_used` | int | Token count |
| `cost_estimate` | float | Estimated cost |
| `timestamp` | datetime | When called |
| `context_type` | str | What prompted the call |

**Daily Report:**
- Total AI calls per tier
- Total tokens used
- Estimated cost
- Top users by usage
- Anomaly detection (unusual spikes)
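
The log structure and most of the daily report can be sketched with a dataclass and a small aggregation function. This is an illustration under assumptions: `AIUsageRecord` and `daily_report` are hypothetical names, ranking "top users" by tokens is one possible reading, and anomaly detection is left out because the document does not define it.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AIUsageRecord:
    """One AI call, using the field names from the log structure table."""
    user_id: str
    model: str
    tier: str
    tokens_used: int
    cost_estimate: float
    timestamp: datetime
    context_type: str


def daily_report(records, top_n=3):
    """Aggregate one day's records into the report fields listed above."""
    tokens_per_user = defaultdict(int)
    for record in records:
        tokens_per_user[record.user_id] += record.tokens_used
    # Assumption: "top users by usage" is ranked by total tokens.
    top_users = sorted(tokens_per_user.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return {
        "calls_per_tier": dict(Counter(r.tier for r in records)),
        "total_tokens": sum(r.tokens_used for r in records),
        "estimated_cost": round(sum(r.cost_estimate for r in records), 4),
        "top_users": top_users,
    }
```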

---

## Security

### Authentication & Authorization

**Implementation:**

| Layer | Method | Details |
|-------|--------|---------|
| **User Auth** | Appwrite Auth | Email/password, OAuth providers |
| **API Auth** | JWT tokens | Bearer token in Authorization header |
| **Session Validation** | Every API call | Verify token, check expiry |
| **Resource Access** | User ID check | Users can only access their own data |

### Input Validation

**Validation Strategy:**

| Input Type | Validation | Tools |
|------------|------------|-------|
| JSON payloads | Schema validation | Marshmallow or Pydantic |
| Character names | Sanitize, length limits | Bleach library |
| Chat messages | Sanitize, profanity filter | Custom validators |
| AI prompts | Template-based only | Jinja2 (no direct user input) |

**Example Validation:**

| Field | Rules |
|-------|-------|
| Character name | 3-20 chars, alphanumeric + spaces only |
| Gold amount | Positive integer, max 999,999,999 |
| Action text | Max 500 chars, sanitized HTML |
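
The example rules translate fairly directly into small validators. The sketch below uses only the standard library, so `html.escape` stands in for the Bleach-based sanitizing named in the strategy table; the function names are hypothetical and restricting names to ASCII letters and digits is an assumption.

```python
import html
import re

# Assumption: "alphanumeric" means ASCII letters and digits.
NAME_PATTERN = re.compile(r"[A-Za-z0-9 ]{3,20}")
MAX_GOLD = 999_999_999
MAX_ACTION_LENGTH = 500


def validate_character_name(name):
    """3-20 characters, letters, digits, and spaces only."""
    return isinstance(name, str) and NAME_PATTERN.fullmatch(name) is not None


def validate_gold_amount(amount):
    """Positive integer up to MAX_GOLD; bool is rejected although it subclasses int."""
    return (
        isinstance(amount, int)
        and not isinstance(amount, bool)
        and 0 < amount <= MAX_GOLD
    )


def clean_action_text(text):
    """Enforce the length limit, then escape HTML special characters."""
    if len(text) > MAX_ACTION_LENGTH:
        raise ValueError(f"Action text exceeds {MAX_ACTION_LENGTH} characters")
    return html.escape(text)
```

Escaping rather than stripping markup is a design choice made here for simplicity; a sanitizer such as Bleach would instead allow a whitelist of tags.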

### Rate Limiting

**Implementation:** Flask-Limiter with Redis backend

**Limits by Tier:**

| Tier | API Calls/Min | AI Calls/Day | Marketplace Actions/Day |
|------|---------------|--------------|------------------------|
| FREE | 30 | 50 | N/A |
| BASIC | 60 | 200 | N/A |
| PREMIUM | 120 | 1000 | 50 |
| ELITE | 300 | Unlimited | 100 |

**Rate Limit Bypass:**
- Admin accounts
- Health check endpoints
- Static assets
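
To make the per-minute limits concrete, here is a minimal in-memory fixed-window limiter. It is a stand-in for illustration only and is not Flask-Limiter's API: the `FixedWindowLimiter` class is hypothetical, it covers just the API Calls/Min column and the admin bypass, and a production setup would keep counters in Redis so that all workers share them.

```python
import time

# API Calls/Min column from the tier table above.
API_CALLS_PER_MINUTE = {"FREE": 30, "BASIC": 60, "PREMIUM": 120, "ELITE": 300}
WINDOW_SECONDS = 60


class FixedWindowLimiter:
    """Counts calls per user inside fixed 60-second windows."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so tests can control time
        self._windows = {}    # user_id -> (window_start, call_count)

    def allow(self, user_id, tier, is_admin=False):
        """Return True if the call is within the tier's limit (admins bypass)."""
        if is_admin:
            return True
        limit = API_CALLS_PER_MINUTE[tier]
        now = self._clock()
        start, count = self._windows.get(user_id, (now, 0))
        if now - start >= WINDOW_SECONDS:
            start, count = now, 0  # a new window begins
        if count >= limit:
            self._windows[user_id] = (start, count)
            return False
        self._windows[user_id] = (start, count + 1)
        return True
```

A FREE-tier user gets 30 allowed calls in a window, the 31st is rejected, and the counter resets once 60 seconds have passed since the window started.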

### API Security

**Configuration:**

| Setting | Value | Reason |
|---------|-------|--------|
| **CORS** | Production domain only | Prevent unauthorized access |
| **HTTPS** | Required | Encrypt data in transit |
| **API Keys** | Environment variables | Never in code |
| **Appwrite Permissions** | Least privilege | Collection-level security |
| **SQL Injection** | N/A | Using Appwrite (NoSQL) |
| **XSS** | Sanitize all inputs | Prevent script injection |
| **CSRF** | CSRF tokens | For form submissions |

### Data Protection

**Access Control Matrix:**

| Resource | Owner | Party Member | Public | System |
|----------|-------|--------------|--------|--------|
| Characters | RW | R | - | RW |
| Sessions | R | RW (turn) | - | RW |
| Marketplace Listings | RW (own) | - | R | RW |
| Transactions | R (own) | - | - | RW |

**RW = Read/Write, R = Read only, - = No access**
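
The matrix can be encoded as a lookup table so that every handler asks the same question in the same way. This is a sketch with hypothetical names (`ACCESS_MATRIX`, `can_access`, and the lowercase resource and role keys); the "(turn)" and "(own)" qualifiers are not modeled and would need extra checks in real code.

```python
# Grants copied from the access control matrix above, without the qualifiers.
ACCESS_MATRIX = {
    "characters": {"owner": "RW", "party_member": "R", "public": "-", "system": "RW"},
    "sessions": {"owner": "R", "party_member": "RW", "public": "-", "system": "RW"},
    "marketplace_listings": {"owner": "RW", "party_member": "-", "public": "R", "system": "RW"},
    "transactions": {"owner": "R", "party_member": "-", "public": "-", "system": "RW"},
}


def can_access(resource, role, operation):
    """Return True if the role may perform "read" or "write" on the resource."""
    if operation not in ("read", "write"):
        raise ValueError(f"Unknown operation: {operation}")
    grant = ACCESS_MATRIX[resource][role]
    needed = "R" if operation == "read" else "W"
    return needed in grant
```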

### Secrets Management

**Never Commit:**
- API keys
- Database credentials
- Secret keys
- Tokens

**Best Practices:**
- Use `.env` for local development
- Use environment variables in production
- Use secrets manager (AWS Secrets Manager, HashiCorp Vault) in production
- Rotate keys regularly
- Different keys for dev/staging/prod

---

## Backup & Recovery

### Appwrite Data Backup

**Strategy:**

| Data Type | Backup Frequency | Retention | Method |
|-----------|------------------|-----------|--------|
| Characters | Daily | 30 days | Appwrite export |
| Sessions (active) | Hourly | 7 days | Appwrite export |
| Marketplace | Daily | 30 days | Appwrite export |
| Transactions | Daily | 90 days | Appwrite export |

**Backup Script:**
- Export collections to JSON
- Compress and encrypt
- Upload to S3 or object storage
- Verify backup integrity
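
The serialize, compress, and verify steps above can be sketched with the standard library alone. This is a partial illustration under assumptions: the function names are hypothetical, `documents` is assumed to be a list of already-exported collection documents, and both encryption and the upload to object storage are omitted because they depend on tooling the document does not specify.

```python
import gzip
import hashlib
import json


def build_backup(documents):
    """Serialize documents to gzip-compressed JSON; return (blob, sha256 hex digest)."""
    payload = json.dumps(documents, sort_keys=True).encode("utf-8")
    blob = gzip.compress(payload)
    return blob, hashlib.sha256(blob).hexdigest()


def verify_backup(blob, expected_sha256):
    """Integrity check: the stored checksum must match the bytes we have."""
    return hashlib.sha256(blob).hexdigest() == expected_sha256


def restore_backup(blob):
    """Decompress and parse a backup blob back into documents."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```

Storing the digest alongside the uploaded object lets a later job re-download the backup and confirm it has not been truncated or corrupted before it is needed for a restore.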

### Disaster Recovery Plan

| Scenario | RTO | RPO | Steps |
|----------|-----|-----|-------|
| **Database corruption** | 4 hours | 24 hours | Restore from latest backup |
| **API server down** | 15 minutes | 0 | Restart/failover to standby |
| **Redis failure** | 5 minutes | Session data loss | Restart, users re-login |
| **Complete infrastructure loss** | 24 hours | 24 hours | Restore from backups to new infrastructure |

**RTO = Recovery Time Objective, RPO = Recovery Point Objective**

---

## CI/CD Pipeline

### Recommended Workflow

| Stage | Actions | Tools |
|-------|---------|-------|
| **1. Commit** | Developer pushes to `dev` branch | Git |
| **2. Build** | Run tests, lint code | GitHub Actions, pytest, flake8 |
| **3. Test** | Unit tests, integration tests | pytest |
| **4. Build Image** | Create Docker image | Docker |
| **5. Deploy to Staging** | Deploy to staging environment | Docker, SSH |
| **6. Manual Test** | QA testing on staging | Manual |
| **7. Merge to Beta** | Promote to beta branch | Git |
| **8. Deploy to Beta** | Deploy to beta environment | Docker, SSH |
| **9. Merge to Master** | Production promotion | Git |
| **10. Deploy to Prod** | Deploy to production | Docker, SSH |
| **11. Tag Release** | Create version tag | Git |

### GitHub Actions Example

```yaml
name: CI/CD

on:
  push:
    branches: [ dev, beta, master ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          # Quoted so YAML does not parse the version as a float
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest
      - name: Lint
        run: flake8 app/

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker build -t ai-dungeon-master:${{ github.ref_name }} .
      # Pushing requires a registry login step and a registry-qualified image name
      - name: Push to registry
        run: docker push ai-dungeon-master:${{ github.ref_name }}
```

---

## Performance Optimization

### Caching Strategy

| Cache Type | What to Cache | TTL |
|------------|---------------|-----|
| **Redis Cache** | Session data | 30 minutes |
| | Character data (read-heavy) | 5 minutes |
| | Marketplace listings | 1 minute |
| | NPC shop items | 1 hour |
| **Browser Cache** | Static assets | 1 year |
| | API responses (GET) | 30 seconds |
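
The server-side TTLs in the table can be illustrated with a small in-memory cache. This is a sketch, not the project's cache layer: `TTLCache` and the `kind` labels are hypothetical, and with Redis the same effect would normally come from setting an expiry on each key rather than tracking timestamps in Python.

```python
import time

# Redis Cache rows from the table above, in seconds. Keys are hypothetical labels.
TTL_SECONDS = {
    "session": 30 * 60,
    "character": 5 * 60,
    "marketplace_listings": 60,
    "npc_shop_items": 60 * 60,
}


class TTLCache:
    """In-memory cache whose entries expire after a per-kind TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so tests can control time
        self._entries = {}    # (kind, key) -> (expires_at, value)

    def set(self, kind, key, value):
        self._entries[(kind, key)] = (self._clock() + TTL_SECONDS[kind], value)

    def get(self, kind, key, default=None):
        entry = self._entries.get((kind, key))
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[(kind, key)]  # drop the stale entry on read
            return default
        return value
```

The short TTL on marketplace listings trades a little staleness for far fewer reads, while NPC shop items change rarely and can be cached for an hour.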

### Database Optimization

**Appwrite Indexing:**
- Index `userId` on characters collection
- Index `status` on game_sessions collection
- Index `listing_type` + `status` on marketplace_listings
- Index `created_at` for time-based queries

### AI Call Optimization

**Strategies:**

| Strategy | Impact | Implementation |
|----------|--------|----------------|
| **Batch requests** | Reduce API calls | Combine multiple actions |
| **Cache common responses** | Reduce cost | Cache item descriptions |
| **Prompt optimization** | Reduce tokens | Shorter, more efficient prompts |
| **Model selection** | Reduce cost | Use cheaper models when appropriate |

---

## Troubleshooting

### Common Issues

| Issue | Symptoms | Solution |
|-------|----------|----------|
| **RQ workers not processing** | Jobs stuck in queue | Check Redis connection, restart workers |
| **AI calls failing** | 401/403 errors | Verify API keys, check rate limits |
| **Appwrite connection errors** | Database errors | Check Appwrite status, verify credentials |
| **Session not updating** | Stale data in UI | Check Appwrite Realtime connection |
| **High latency** | Slow API responses | Check RQ queue length, scale workers |

### Debug Mode

**Enable Debug Logging:**

```bash
export LOG_LEVEL=DEBUG
flask run --debug
```

**Debug Endpoints (development only):**
- `GET /debug/health` - Health check
- `GET /debug/redis` - Redis connection status
- `GET /debug/queues` - RQ queue status

---

## Resources

| Resource | URL |
|----------|-----|
| **Appwrite Docs** | https://appwrite.io/docs |
| **RQ Docs** | https://python-rq.org/ |
| **Flask Docs** | https://flask.palletsprojects.com/ |
| **Structlog Docs** | https://www.structlog.org/ |
| **HTMX Docs** | https://htmx.org/docs/ |
| **Anthropic API** | https://docs.anthropic.com/ |
| **Replicate API** | https://replicate.com/docs |