# Deployment & Operations

## Local Development Setup

### Prerequisites

| Tool | Version | Purpose |
|------|---------|---------|
| Python | 3.11+ | Backend runtime |
| Docker | Latest | Local services |
| Redis | 7.0+ | Job queue & caching |
| Git | Latest | Version control |

### Setup Steps

```bash
# 1. Clone repository
git clone <repo-url>
cd code_of_conquest

# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment
cp .env.example .env
# Edit .env with your API keys and settings

# 5. Start local services
docker-compose up -d

# 6. Start RQ workers
rq worker ai_tasks combat_tasks marketplace_tasks &

# 7. Run Flask development server
flask run --debug
```

### Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `FLASK_ENV` | development/production | Yes |
| `SECRET_KEY` | Flask secret key | Yes |
| `REPLICATE_API_KEY` | Replicate API key | Yes |
| `ANTHROPIC_API_KEY` | Anthropic API key | Yes |
| `APPWRITE_ENDPOINT` | Appwrite server URL | Yes |
| `APPWRITE_PROJECT_ID` | Appwrite project ID | Yes |
| `APPWRITE_API_KEY` | Appwrite API key | Yes |
| `REDIS_URL` | Redis connection URL | Yes |
| `LOG_LEVEL` | Logging level (DEBUG/INFO/WARNING/ERROR) | No |
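
A startup check can catch a missing variable before the first request fails. The following is a minimal sketch, not part of the project: the function names `missing_env_vars` and `check_environment` are hypothetical, and the list of required names is copied from the table above.

```python
import os

# Names marked "Yes" in the table above; LOG_LEVEL is optional and not listed.
REQUIRED_VARS = (
    "FLASK_ENV",
    "SECRET_KEY",
    "REPLICATE_API_KEY",
    "ANTHROPIC_API_KEY",
    "APPWRITE_ENDPOINT",
    "APPWRITE_PROJECT_ID",
    "APPWRITE_API_KEY",
    "REDIS_URL",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


def check_environment(environ=None):
    """Raise RuntimeError naming every missing required variable."""
    missing = missing_env_vars(environ)
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
```

Calling `check_environment()` once at application startup would report all missing names in a single error rather than one at a time.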

---

## Docker Compose (Local Development)

**docker-compose.yml:**

```yaml
version: '3.8'
services:
  redis:
    image: redis:alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

  rq-worker:
    build: .
    command: rq worker ai_tasks combat_tasks marketplace_tasks --url redis://redis:6379
    depends_on:
      - redis
    env_file:
      - .env
    environment:
      - REDIS_URL=redis://redis:6379

volumes:
  redis_data:
```

---

## Testing Strategy

### Manual Testing (Preferred)

**API Testing Document:** `docs/API_TESTING.md`

Contains:
- Endpoint examples
- Sample curl/httpie commands
- Expected responses
- Authentication setup

**Example API Test:**

```bash
# Login
curl -X POST http://localhost:5000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com", "password": "password123"}'

# Create character (with auth token)
curl -X POST http://localhost:5000/api/v1/characters \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"name": "Aragorn", "class_id": "vanguard"}'
```

### Unit Tests (Optional)

**Framework:** pytest

**Test Categories:**

| Category | Location | Focus |
|----------|----------|-------|
| Combat | `tests/test_combat.py` | Damage calculations, effect processing |
| Skills | `tests/test_skills.py` | Skill unlock logic, prerequisites |
| Marketplace | `tests/test_marketplace.py` | Bidding logic, auction processing |
| Character | `tests/test_character.py` | Character creation, stats |

**Run Tests:**

```bash
# All tests
pytest

# Specific test file
pytest tests/test_combat.py

# With coverage (requires the pytest-cov plugin)
pytest --cov=app tests/
```

### Load Testing

**Tool:** Locust or Apache Bench

**Test Scenarios:**

| Scenario | Target | Success Criteria |
|----------|--------|------------------|
| Concurrent AI requests | 50 concurrent users | < 5s response time |
| Marketplace browsing | 100 concurrent users | < 1s response time |
| Session realtime updates | 10 players per session | < 100ms update latency |

---

## Production Deployment

### Deployment Checklist

**Pre-Deployment:**
- [ ] All environment variables configured
- [ ] Appwrite collections created with proper permissions
- [ ] Redis configured and accessible
- [ ] RQ workers running
- [ ] SSL certificates installed
- [ ] Rate limiting configured
- [ ] Error logging/monitoring set up (Sentry recommended)
- [ ] Backup strategy for Appwrite data

**Production Configuration:**
- [ ] `DEBUG = False` in Flask
- [ ] Secure session keys (random, long)
- [ ] CORS restricted to production domain
- [ ] Rate limits appropriate for production
- [ ] AI cost alerts configured
- [ ] CDN for static assets (optional)

### Dockerfile

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Create non-root user
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser

# Expose port
EXPOSE 5000

# Run application
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "wsgi:app"]
```

### Build & Push Script

**scripts/build_and_push.sh:**

```bash
#!/bin/bash
# Stop on the first failed command, unset variable, or failed pipeline stage
set -euo pipefail

# Get current git branch ("/" is not valid in a Docker tag, so replace it)
BRANCH=$(git rev-parse --abbrev-ref HEAD | tr '/' '-')

# Ask for tag options
read -r -p "Tag as :latest? (y/n) " TAG_LATEST
read -r -p "Push to registry? (y/n) " PUSH_IMAGE

# Build image
docker build -t "ai-dungeon-master:$BRANCH" .

if [ "$TAG_LATEST" = "y" ]; then
    docker tag "ai-dungeon-master:$BRANCH" ai-dungeon-master:latest
fi

if [ "$PUSH_IMAGE" = "y" ]; then
    docker push "ai-dungeon-master:$BRANCH"
    if [ "$TAG_LATEST" = "y" ]; then
        docker push ai-dungeon-master:latest
    fi
fi
```

### Production Environment

**Recommended Stack:**
- **Web Server:** Nginx (reverse proxy)
- **WSGI Server:** Gunicorn (4+ workers)
- **Process Manager:** Supervisor or systemd
- **Redis:** Standalone or Redis Cluster
- **RQ Workers:** Separate instances for each queue

**Scaling Strategy:**

| Component | Scaling Method | Trigger |
|-----------|----------------|---------|
| Flask API | Horizontal (add workers) | CPU > 70% |
| RQ Workers | Horizontal (add workers) | Queue length > 100 |
| Redis | Vertical (upgrade instance) | Memory > 80% |
| Appwrite | Managed by Appwrite | N/A |

---

## Monitoring & Logging

### Application Logging

**Logging Configuration:**

| Level | Use Case | Examples |
|-------|----------|----------|
| DEBUG | Development only | Variable values, function calls |
| INFO | Normal operations | User actions, API calls |
| WARNING | Potential issues | Rate limit approaching, slow queries |
| ERROR | Errors (recoverable) | Failed AI calls, validation errors |
| CRITICAL | Critical failures | Database connection lost, service down |

**Structured Logging with Structlog:**

```python
import structlog

logger = structlog.get_logger(__name__)

# Placeholder identifiers; in the application these come from the current request.
session_id = "session-123"
character_id = "character-456"

# Pass context as key-value pairs so each log entry stays machine-readable.
logger.info(
    "Combat action executed",
    session_id=session_id,
    character_id=character_id,
    action_type="attack",
    damage=15,
)
```

### Monitoring Tools

**Recommended Tools:**

| Tool | Purpose | Priority |
|------|---------|----------|
| **Sentry** | Error tracking and alerting | High |
| **Prometheus** | Metrics collection | Medium |
| **Grafana** | Metrics visualization | Medium |
| **Uptime Robot** | Uptime monitoring | High |
| **CloudWatch** | AWS logs/metrics (if using AWS) | Medium |

### Key Metrics to Monitor

| Metric | Alert Threshold | Action |
|--------|----------------|--------|
| API response time | > 3s average | Scale workers |
| Error rate | > 5% | Investigate logs |
| AI API errors | > 10% | Check API status |
| Queue length | > 500 | Add workers |
| Redis memory | > 80% | Upgrade instance |
| CPU usage | > 80% | Scale horizontally |
| AI cost per day | > budget × 1.2 | Investigate usage |
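
The thresholds above can be expressed as data and evaluated in one place. This is a rough sketch under assumptions: the metric key names (`api_response_seconds`, `error_rate`, and so on) and the function `triggered_alerts` are hypothetical, and percentages are represented as fractions between 0 and 1.

```python
# Thresholds and actions copied from the table above; the keys are hypothetical.
ALERT_RULES = {
    "api_response_seconds": (3.0, "Scale workers"),
    "error_rate": (0.05, "Investigate logs"),
    "ai_api_error_rate": (0.10, "Check API status"),
    "queue_length": (500, "Add workers"),
    "redis_memory_fraction": (0.80, "Upgrade instance"),
    "cpu_fraction": (0.80, "Scale horizontally"),
}


def triggered_alerts(metrics, daily_ai_cost=None, daily_ai_budget=None):
    """Return (metric, action) pairs for every value above its threshold."""
    alerts = []
    for name, (threshold, action) in ALERT_RULES.items():
        value = metrics.get(name)
        if value is not None and value > threshold:
            alerts.append((name, action))
    # The cost rule is relative to a budget, so it is handled separately.
    if daily_ai_cost is not None and daily_ai_budget is not None:
        if daily_ai_cost > daily_ai_budget * 1.2:
            alerts.append(("ai_cost_per_day", "Investigate usage"))
    return alerts
```

Keeping the rules in a dictionary means a threshold change is a one-line edit, and the same table can feed whichever alerting tool is chosen.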

### AI Cost Tracking

**Log Structure:**

| Field | Type | Purpose |
|-------|------|---------|
| `user_id` | str | Track per-user usage |
| `model` | str | Which model used |
| `tier` | str | FREE/STANDARD/PREMIUM |
| `tokens_used` | int | Token count |
| `cost_estimate` | float | Estimated cost |
| `timestamp` | datetime | When called |
| `context_type` | str | What prompted the call |

**Daily Report:**
- Total AI calls per tier
- Total tokens used
- Estimated cost
- Top users by usage
- Anomaly detection (unusual spikes)
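
One possible shape for the log record and the daily report is sketched below. The field names come from the table above; the class name `AICallRecord`, the function `daily_report`, and the choice to rank users by tokens are assumptions, and anomaly detection is left out.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AICallRecord:
    """One AI call, with the fields listed in the log structure table."""
    user_id: str
    model: str
    tier: str
    tokens_used: int
    cost_estimate: float
    timestamp: datetime
    context_type: str


def daily_report(records, day, top_n=3):
    """Summarize the records whose timestamp falls on the given date."""
    todays = [r for r in records if r.timestamp.date() == day]
    tokens_per_user = defaultdict(int)
    for record in todays:
        tokens_per_user[record.user_id] += record.tokens_used
    # Rank users by total tokens, highest first.
    top_users = sorted(
        tokens_per_user.items(), key=lambda item: item[1], reverse=True
    )[:top_n]
    return {
        "calls_per_tier": dict(Counter(r.tier for r in todays)),
        "total_tokens": sum(r.tokens_used for r in todays),
        "estimated_cost": round(sum(r.cost_estimate for r in todays), 4),
        "top_users": top_users,
    }
```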

---

## Security

### Authentication & Authorization

**Implementation:**

| Layer | Method | Details |
|-------|--------|---------|
| **User Auth** | Appwrite Auth | Email/password, OAuth providers |
| **API Auth** | JWT tokens | Bearer token in Authorization header |
| **Session Validation** | Every API call | Verify token, check expiry |
| **Resource Access** | User ID check | Users can only access their own data |

### Input Validation

**Validation Strategy:**

| Input Type | Validation | Tools |
|------------|------------|-------|
| JSON payloads | Schema validation | Marshmallow or Pydantic |
| Character names | Sanitize, length limits | Bleach library |
| Chat messages | Sanitize, profanity filter | Custom validators |
| AI prompts | Template-based only | Jinja2 (no direct user input) |

**Example Validation:**

| Field | Rules |
|-------|-------|
| Character name | 3-20 chars, alphanumeric + spaces only |
| Gold amount | Positive integer, max 999,999,999 |
| Action text | Max 500 chars, sanitized HTML |
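
The example rules can be sketched with the standard library alone. This is an illustration, not the project's validators: the function names are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and `html.escape` stands in for the Bleach-based sanitizing named in the strategy table.

```python
import html
import re

NAME_PATTERN = re.compile(r"[A-Za-z0-9 ]{3,20}")  # assumes ASCII-only names
MAX_GOLD = 999_999_999
MAX_ACTION_LENGTH = 500


def is_valid_character_name(name):
    """3-20 characters, letters, digits, and spaces only."""
    return isinstance(name, str) and NAME_PATTERN.fullmatch(name) is not None


def is_valid_gold_amount(amount):
    """Positive integer no larger than MAX_GOLD."""
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(amount, bool) or not isinstance(amount, int):
        return False
    return 0 < amount <= MAX_GOLD


def clean_action_text(text):
    """Reject over-long text, then escape HTML special characters."""
    if len(text) > MAX_ACTION_LENGTH:
        raise ValueError(f"Action text exceeds {MAX_ACTION_LENGTH} characters")
    return html.escape(text)
```

A schema library such as Marshmallow or Pydantic would express the same rules declaratively; the plain functions here only make the constraints concrete.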

### Rate Limiting

**Implementation:** Flask-Limiter with Redis backend

**Limits by Tier:**

| Tier | API Calls/Min | AI Calls/Day | Marketplace Actions/Day |
|------|---------------|--------------|------------------------|
| FREE | 30 | 50 | N/A |
| BASIC | 60 | 200 | N/A |
| PREMIUM | 120 | 1000 | 50 |
| ELITE | 300 | Unlimited | 100 |

**Rate Limit Bypass:**
- Admin accounts
- Health check endpoints
- Static assets
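
The tier limits and bypass rules can be kept as a lookup table that the limiter configuration reads from. The sketch below is hypothetical throughout: the function names and the exempt path prefixes are invented for illustration, "N/A" is assumed to mean the tier has no marketplace access (modelled as 0), and "Unlimited" is modelled as infinity.

```python
import math

# Values copied from the table above.
# Assumption: "N/A" means no marketplace access, so it is stored as 0.
TIER_LIMITS = {
    "FREE": {"api_per_minute": 30, "ai_per_day": 50, "marketplace_per_day": 0},
    "BASIC": {"api_per_minute": 60, "ai_per_day": 200, "marketplace_per_day": 0},
    "PREMIUM": {"api_per_minute": 120, "ai_per_day": 1000, "marketplace_per_day": 50},
    "ELITE": {"api_per_minute": 300, "ai_per_day": math.inf, "marketplace_per_day": 100},
}

# Hypothetical paths for health checks and static assets.
EXEMPT_PATH_PREFIXES = ("/health", "/static/")


def api_limit_string(tier):
    """Format the per-minute API limit as a string such as '30 per minute'."""
    return f"{TIER_LIMITS[tier]['api_per_minute']} per minute"


def is_allowed(tier, kind, used_so_far):
    """True if one more action of this kind stays within the tier's limit."""
    return used_so_far < TIER_LIMITS[tier][kind]


def is_exempt(path, is_admin=False):
    """Admins, health checks, and static assets bypass rate limiting."""
    return is_admin or path.startswith(EXEMPT_PATH_PREFIXES)
```

The `"N per minute"` form matches the limit-string notation Flask-Limiter accepts, so `api_limit_string` could feed a dynamic limit callable if that is how the limiter ends up being wired.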

### API Security

**Configuration:**

| Setting | Value | Reason |
|---------|-------|--------|
| **CORS** | Production domain only | Prevent unauthorized access |
| **HTTPS** | Required | Encrypt data in transit |
| **API Keys** | Environment variables | Never in code |
| **Appwrite Permissions** | Least privilege | Collection-level security |
| **SQL Injection** | N/A | Using Appwrite (NoSQL) |
| **XSS** | Sanitize all inputs | Prevent script injection |
| **CSRF** | CSRF tokens | For form submissions |

### Data Protection

**Access Control Matrix:**

| Resource | Owner | Party Member | Public | System |
|----------|-------|--------------|--------|--------|
| Characters | RW | R | - | RW |
| Sessions | R | RW (turn) | - | RW |
| Marketplace Listings | RW (own) | - | R | RW |
| Transactions | R (own) | - | - | RW |

**RW = Read/Write, R = Read only, - = No access**
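
The matrix translates directly into a lookup. The sketch below is illustrative only: the resource and role keys and the function `can_access` are hypothetical names, and the "(turn)" and "(own)" qualifiers are not modelled, so a real check would still need to confirm turn order and ownership.

```python
# Permissions copied from the matrix above; "" means no access.
ACCESS_MATRIX = {
    "characters": {"owner": "RW", "party_member": "R", "public": "", "system": "RW"},
    "sessions": {"owner": "R", "party_member": "RW", "public": "", "system": "RW"},
    "marketplace_listings": {"owner": "RW", "party_member": "", "public": "R", "system": "RW"},
    "transactions": {"owner": "R", "party_member": "", "public": "", "system": "RW"},
}


def can_access(resource, role, mode):
    """Return True if the role may read ("R") or write ("W") the resource."""
    if mode not in ("R", "W"):
        raise ValueError("mode must be 'R' or 'W'")
    return mode in ACCESS_MATRIX[resource][role]
```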

### Secrets Management

**Never Commit:**
- API keys
- Database credentials
- Secret keys
- Tokens

**Best Practices:**
- Use `.env` for local development
- Use environment variables in production
- Use a secrets manager (AWS Secrets Manager, HashiCorp Vault) in production
- Rotate keys regularly
- Use different keys for dev, staging, and prod

---

## Backup & Recovery

### Appwrite Data Backup

**Strategy:**

| Data Type | Backup Frequency | Retention | Method |
|-----------|------------------|-----------|--------|
| Characters | Daily | 30 days | Appwrite export |
| Sessions (active) | Hourly | 7 days | Appwrite export |
| Marketplace | Daily | 30 days | Appwrite export |
| Transactions | Daily | 90 days | Appwrite export |

**Backup Script:**
- Export collections to JSON
- Compress and encrypt
- Upload to S3 or object storage
- Verify backup integrity
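
The compress and verify steps might look roughly like the sketch below. It assumes the exported documents are already available as JSON-serializable Python objects; the function names are hypothetical, and encryption and upload are deliberately omitted because they depend on tooling the document does not specify.

```python
import gzip
import hashlib
import json
import zlib


def pack_backup(documents):
    """Serialize documents to JSON, gzip them, and return (blob, sha256_hex)."""
    raw = json.dumps(documents, sort_keys=True).encode("utf-8")
    blob = gzip.compress(raw)
    return blob, hashlib.sha256(blob).hexdigest()


def verify_backup(blob, expected_sha256):
    """Check the checksum, then confirm the archive unpacks to valid JSON."""
    if hashlib.sha256(blob).hexdigest() != expected_sha256:
        return False
    try:
        json.loads(gzip.decompress(blob).decode("utf-8"))
    except (OSError, EOFError, zlib.error, ValueError):
        return False
    return True
```

Storing the checksum alongside the uploaded archive lets a later restore run `verify_backup` before any data is written back.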

### Disaster Recovery Plan

| Scenario | RTO | RPO | Steps |
|----------|-----|-----|-------|
| **Database corruption** | 4 hours | 24 hours | Restore from latest backup |
| **API server down** | 15 minutes | 0 | Restart/failover to standby |
| **Redis failure** | 5 minutes | Session data loss | Restart, users re-login |
| **Complete infrastructure loss** | 24 hours | 24 hours | Restore from backups to new infrastructure |

**RTO = Recovery Time Objective, RPO = Recovery Point Objective**

---

## CI/CD Pipeline

### Recommended Workflow

| Stage | Actions | Tools |
|-------|---------|-------|
| **1. Commit** | Developer pushes to `dev` branch | Git |
| **2. Build** | Run tests, lint code | GitHub Actions, pytest, flake8 |
| **3. Test** | Unit tests, integration tests | pytest |
| **4. Build Image** | Create Docker image | Docker |
| **5. Deploy to Staging** | Deploy to staging environment | Docker, SSH |
| **6. Manual Test** | QA testing on staging | Manual |
| **7. Merge to Beta** | Promote to beta branch | Git |
| **8. Deploy to Beta** | Deploy to beta environment | Docker, SSH |
| **9. Merge to Master** | Production promotion | Git |
| **10. Deploy to Prod** | Deploy to production | Docker, SSH |
| **11. Tag Release** | Create version tag | Git |

### GitHub Actions Example

```yaml
name: CI/CD

on:
  push:
    branches: [ dev, beta, master ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          # Quoted so YAML does not read the version as a float
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest
      - name: Lint
        run: flake8 app/

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker build -t ai-dungeon-master:${{ github.ref_name }} .
      # Assumes a registry login step has run beforehand (not shown)
      - name: Push to registry
        run: docker push ai-dungeon-master:${{ github.ref_name }}
```

---

## Performance Optimization

### Caching Strategy

| Cache Type | What to Cache | TTL |
|------------|---------------|-----|
| **Redis Cache** | Session data | 30 minutes |
| | Character data (read-heavy) | 5 minutes |
| | Marketplace listings | 1 minute |
| | NPC shop items | 1 hour |
| **Browser Cache** | Static assets | 1 year |
| | API responses (GET) | 30 seconds |
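
The Redis rows above map naturally onto a cache-aside read with a per-kind TTL. The following is a sketch under assumptions: `get_or_load` and the key format are hypothetical, values are stored as JSON, and `client` is any object with redis-py style `get(name)` and `setex(name, time, value)` methods.

```python
import json

# TTLs in seconds, copied from the Redis rows of the table above.
CACHE_TTL_SECONDS = {
    "session": 30 * 60,
    "character": 5 * 60,
    "marketplace_listings": 60,
    "npc_shop_items": 60 * 60,
}


def get_or_load(client, kind, key, loader):
    """Return the cached value, or load it, store it with its TTL, and return it."""
    cache_key = f"{kind}:{key}"
    cached = client.get(cache_key)
    if cached is not None:
        return json.loads(cached)
    value = loader()  # cache miss: fetch from the source of truth
    client.setex(cache_key, CACHE_TTL_SECONDS[kind], json.dumps(value))
    return value
```

Passing the client in as a parameter keeps the function easy to exercise with a stub in tests, without a running Redis instance.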

### Database Optimization

**Appwrite Indexing:**
- Index `userId` on characters collection
- Index `status` on game_sessions collection
- Index `listing_type` + `status` on marketplace_listings
- Index `created_at` for time-based queries

### AI Call Optimization

**Strategies:**

| Strategy | Impact | Implementation |
|----------|--------|----------------|
| **Batch requests** | Reduce API calls | Combine multiple actions |
| **Cache common responses** | Reduce cost | Cache item descriptions |
| **Prompt optimization** | Reduce tokens | Shorter, more efficient prompts |
| **Model selection** | Reduce cost | Use cheaper models when appropriate |

---

## Troubleshooting

### Common Issues

| Issue | Symptoms | Solution |
|-------|----------|----------|
| **RQ workers not processing** | Jobs stuck in queue | Check Redis connection, restart workers |
| **AI calls failing** | 401/403 errors | Verify API keys, check rate limits |
| **Appwrite connection errors** | Database errors | Check Appwrite status, verify credentials |
| **Session not updating** | Stale data in UI | Check Appwrite Realtime connection |
| **High latency** | Slow API responses | Check RQ queue length, scale workers |

### Debug Mode

**Enable Debug Logging:**

```bash
export LOG_LEVEL=DEBUG
flask run --debug
```

**Debug Endpoints (development only):**
- `GET /debug/health` - Health check
- `GET /debug/redis` - Redis connection status
- `GET /debug/queues` - RQ queue status

---

## Resources

| Resource | URL |
|----------|-----|
| **Appwrite Docs** | https://appwrite.io/docs |
| **RQ Docs** | https://python-rq.org/ |
| **Flask Docs** | https://flask.palletsprojects.com/ |
| **Structlog Docs** | https://www.structlog.org/ |
| **HTMX Docs** | https://htmx.org/docs/ |
| **Anthropic API** | https://docs.anthropic.com/ |
| **Replicate API** | https://replicate.com/docs |