Deployment & Operations
Local Development Setup
Prerequisites
| Tool | Version | Purpose |
|---|---|---|
| Python | 3.11+ | Backend runtime |
| Docker | Latest | Local services |
| Redis | 7.0+ | Job queue & caching |
| Git | Latest | Version control |
Setup Steps
# 1. Clone repository
git clone <repo-url>
cd code_of_conquest
# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Configure environment
cp .env.example .env
# Edit .env with your API keys and settings
# 5. Start local services
docker-compose up -d
# 6. Start RQ workers
rq worker ai_tasks combat_tasks marketplace_tasks &
# 7. Run Flask development server
flask run --debug
Environment Variables
| Variable | Description | Required |
|---|---|---|
| `FLASK_ENV` | development/production | Yes |
| `SECRET_KEY` | Flask secret key | Yes |
| `REPLICATE_API_KEY` | Replicate API key | Yes |
| `ANTHROPIC_API_KEY` | Anthropic API key | Yes |
| `APPWRITE_ENDPOINT` | Appwrite server URL | Yes |
| `APPWRITE_PROJECT_ID` | Appwrite project ID | Yes |
| `APPWRITE_API_KEY` | Appwrite API key | Yes |
| `REDIS_URL` | Redis connection URL | Yes |
| `LOG_LEVEL` | Logging level (DEBUG/INFO/WARNING/ERROR) | No |
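A startup check can catch a missing variable before the first request fails. The following is a minimal sketch, not the project's actual configuration code: the function names `missing_env_vars` and `load_log_level` are hypothetical, and the `INFO` fallback for `LOG_LEVEL` is an assumption, since the table only marks that variable as optional.

```python
import os
from typing import List, Mapping

# Required names copied from the table above. LOG_LEVEL is optional,
# so it is deliberately not listed here.
REQUIRED_VARS = (
    "FLASK_ENV",
    "SECRET_KEY",
    "REPLICATE_API_KEY",
    "ANTHROPIC_API_KEY",
    "APPWRITE_ENDPOINT",
    "APPWRITE_PROJECT_ID",
    "APPWRITE_API_KEY",
    "REDIS_URL",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def load_log_level(env: Mapping[str, str] = os.environ) -> str:
    """Return LOG_LEVEL in upper case; the INFO default is an assumption."""
    return env.get("LOG_LEVEL", "INFO").upper()
```

Calling `missing_env_vars()` early in application startup and aborting when the list is non-empty turns a confusing runtime error into a clear configuration message.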
Docker Compose (Local Development)
docker-compose.yml:
version: '3.8'
services:
  redis:
    image: redis:alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
  rq-worker:
    build: .
    command: rq worker ai_tasks combat_tasks marketplace_tasks --url redis://redis:6379
    depends_on:
      - redis
    env_file:
      - .env
    environment:
      - REDIS_URL=redis://redis:6379
volumes:
  redis_data:
Testing Strategy
Manual Testing (Preferred)
API Testing Document: docs/API_TESTING.md
Contains:
- Endpoint examples
- Sample curl/httpie commands
- Expected responses
- Authentication setup
Example API Test:
# Login
curl -X POST http://localhost:5000/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"email": "test@example.com", "password": "password123"}'
# Create character (with auth token)
curl -X POST http://localhost:5000/api/v1/characters \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"name": "Aragorn", "class_id": "vanguard"}'
Unit Tests (Optional)
Framework: pytest
Test Categories:
| Category | Location | Focus |
|---|---|---|
| Combat | `tests/test_combat.py` | Damage calculations, effect processing |
| Skills | `tests/test_skills.py` | Skill unlock logic, prerequisites |
| Marketplace | `tests/test_marketplace.py` | Bidding logic, auction processing |
| Character | `tests/test_character.py` | Character creation, stats |
Run Tests:
# All tests
pytest
# Specific test file
pytest tests/test_combat.py
# With coverage
pytest --cov=app tests/
Load Testing
Tool: Locust or Apache Bench
Test Scenarios:
| Scenario | Target | Success Criteria |
|---|---|---|
| Concurrent AI requests | 50 concurrent users | < 5s response time |
| Marketplace browsing | 100 concurrent users | < 1s response time |
| Session realtime updates | 10 players per session | < 100ms update latency |
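Before reaching for Locust or Apache Bench, a rough check of one scenario can be done with the standard library. The sketch below is a stand-in for those tools, not a replacement: `measure_latencies` and `meets_target` are hypothetical names, `request_fn` is whatever callable performs one request, and reading "response time" as the slowest observed call is an assumption, since the table does not say whether it means average, percentile, or worst case.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def measure_latencies(request_fn: Callable[[], object], concurrent_users: int) -> List[float]:
    """Call request_fn once per simulated user in parallel; return seconds per call."""

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        request_fn()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        return list(pool.map(timed_call, range(concurrent_users)))


def meets_target(latencies: List[float], limit_seconds: float) -> bool:
    """True when at least one call was measured and the slowest one beat the limit."""
    return bool(latencies) and max(latencies) < limit_seconds
```

For the marketplace scenario, this would mean passing a function that issues one browse request, 100 users, and a limit of 1.0 seconds.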
Production Deployment
Deployment Checklist
Pre-Deployment:
- All environment variables configured
- Appwrite collections created with proper permissions
- Redis configured and accessible
- RQ workers running
- SSL certificates installed
- Rate limiting configured
- Error logging/monitoring set up (Sentry recommended)
- Backup strategy for Appwrite data
Production Configuration:
- `DEBUG = False` in Flask
- Secure session keys (random, long)
- CORS restricted to production domain
- Rate limits appropriate for production
- AI cost alerts configured
- CDN for static assets (optional)
Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Create non-root user
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser
# Expose port
EXPOSE 5000
# Run application
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "wsgi:app"]
Build & Push Script
scripts/build_and_push.sh:
#!/bin/bash
# Stop on the first failing command
set -euo pipefail
# Get current git branch; Docker tags cannot contain "/", so replace it with "-"
BRANCH=$(git rev-parse --abbrev-ref HEAD | tr '/' '-')
# Ask for tag options
read -r -p "Tag as :latest? (y/n) " TAG_LATEST
read -r -p "Push to registry? (y/n) " PUSH_IMAGE
# Build image
docker build -t "ai-dungeon-master:$BRANCH" .
if [ "$TAG_LATEST" = "y" ]; then
    docker tag "ai-dungeon-master:$BRANCH" ai-dungeon-master:latest
fi
if [ "$PUSH_IMAGE" = "y" ]; then
    docker push "ai-dungeon-master:$BRANCH"
    if [ "$TAG_LATEST" = "y" ]; then
        docker push ai-dungeon-master:latest
    fi
fi
Production Environment
Recommended Stack:
- Web Server: Nginx (reverse proxy)
- WSGI Server: Gunicorn (4+ workers)
- Process Manager: Supervisor or systemd
- Redis: Standalone or Redis Cluster
- RQ Workers: Separate instances for each queue
Scaling Strategy:
| Component | Scaling Method | Trigger |
|---|---|---|
| Flask API | Horizontal (add workers) | CPU > 70% |
| RQ Workers | Horizontal (add workers) | Queue length > 100 |
| Redis | Vertical (upgrade instance) | Memory > 80% |
| Appwrite | Managed by Appwrite | N/A |
Monitoring & Logging
Application Logging
Logging Configuration:
| Level | Use Case | Examples |
|---|---|---|
| DEBUG | Development only | Variable values, function calls |
| INFO | Normal operations | User actions, API calls |
| WARNING | Potential issues | Rate limit approaching, slow queries |
| ERROR | Errors (recoverable) | Failed AI calls, validation errors |
| CRITICAL | Critical failures | Database connection lost, service down |
Structured Logging with Structlog:
import structlog
logger = structlog.get_logger(__name__)
# session_id and character_id are supplied by the calling code
logger.info(
    "Combat action executed",
    session_id=session_id,
    character_id=character_id,
    action_type="attack",
    damage=15,
)
Monitoring Tools
Recommended Tools:
| Tool | Purpose | Priority |
|---|---|---|
| Sentry | Error tracking and alerting | High |
| Prometheus | Metrics collection | Medium |
| Grafana | Metrics visualization | Medium |
| Uptime Robot | Uptime monitoring | High |
| CloudWatch | AWS logs/metrics (if using AWS) | Medium |
Key Metrics to Monitor
| Metric | Alert Threshold | Action |
|---|---|---|
| API response time | > 3s average | Scale workers |
| Error rate | > 5% | Investigate logs |
| AI API errors | > 10% | Check API status |
| Queue length | > 500 | Add workers |
| Redis memory | > 80% | Upgrade instance |
| CPU usage | > 80% | Scale horizontally |
| AI cost per day | > budget × 1.2 | Investigate usage |
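The thresholds above can be kept in one place and evaluated against a metrics snapshot. This is a minimal sketch under stated assumptions: the metric key names, the units (seconds, and fractions between 0 and 1 for percentages), and the function `triggered_actions` are all hypothetical; only the numeric limits and the actions come from the table.

```python
from typing import List, Mapping, Optional

# (metric key, alert threshold, action). Keys and units are assumptions.
THRESHOLDS = (
    ("api_response_time_avg_s", 3.0, "Scale workers"),
    ("error_rate", 0.05, "Investigate logs"),
    ("ai_api_error_rate", 0.10, "Check API status"),
    ("queue_length", 500, "Add workers"),
    ("redis_memory_fraction", 0.80, "Upgrade instance"),
    ("cpu_fraction", 0.80, "Scale horizontally"),
)


def triggered_actions(
    metrics: Mapping[str, float],
    daily_ai_budget: Optional[float] = None,
) -> List[str]:
    """Return "metric: action" for every metric strictly above its threshold."""
    actions = [
        f"{name}: {action}"
        for name, limit, action in THRESHOLDS
        if metrics.get(name, 0) > limit
    ]
    # AI cost alert fires above 1.2 times the daily budget, when a budget is given.
    if daily_ai_budget is not None:
        if metrics.get("ai_cost_per_day", 0) > daily_ai_budget * 1.2:
            actions.append("ai_cost_per_day: Investigate usage")
    return actions
```

Metrics missing from the snapshot are treated as zero here, which keeps the check quiet rather than noisy; a real alerting setup might prefer to flag missing data instead.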
AI Cost Tracking
Log Structure:
| Field | Type | Purpose |
|---|---|---|
| `user_id` | str | Track per-user usage |
| `model` | str | Which model used |
| `tier` | str | FREE/STANDARD/PREMIUM |
| `tokens_used` | int | Token count |
| `cost_estimate` | float | Estimated cost |
| `timestamp` | datetime | When called |
| `context_type` | str | What prompted the call |
Daily Report:
- Total AI calls per tier
- Total tokens used
- Estimated cost
- Top users by usage
- Anomaly detection (unusual spikes)
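The log structure and the first four report items map naturally onto a dataclass and a small aggregation. The sketch below is illustrative only: `AICallLog` and `daily_report` are hypothetical names, ranking top users by tokens rather than by cost is an assumption, and anomaly detection is left out because the document does not define what counts as an unusual spike.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class AICallLog:
    """One AI call, with the fields from the log structure table."""
    user_id: str
    model: str
    tier: str
    tokens_used: int
    cost_estimate: float
    timestamp: datetime
    context_type: str


def daily_report(logs: List[AICallLog], top_n: int = 3) -> dict:
    """Aggregate one day's logs into per-tier calls, tokens, cost, and top users."""
    tokens_per_user: Counter = Counter()
    for log in logs:
        tokens_per_user[log.user_id] += log.tokens_used
    return {
        "calls_per_tier": dict(Counter(log.tier for log in logs)),
        "total_tokens": sum(log.tokens_used for log in logs),
        "estimated_cost": sum(log.cost_estimate for log in logs),
        # Ranked by tokens used, highest first (an assumption).
        "top_users": tokens_per_user.most_common(top_n),
    }
```

The caller is expected to pass only the logs for the day being reported; filtering by `timestamp` is not done inside `daily_report`.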
Security
Authentication & Authorization
Implementation:
| Layer | Method | Details |
|---|---|---|
| User Auth | Appwrite Auth | Email/password, OAuth providers |
| API Auth | JWT tokens | Bearer token in Authorization header |
| Session Validation | Every API call | Verify token, check expiry |
| Resource Access | User ID check | Users can only access their own data |
Input Validation
Validation Strategy:
| Input Type | Validation | Tools |
|---|---|---|
| JSON payloads | Schema validation | Marshmallow or Pydantic |
| Character names | Sanitize, length limits | Bleach library |
| Chat messages | Sanitize, profanity filter | Custom validators |
| AI prompts | Template-based only | Jinja2 (no direct user input) |
Example Validation:
| Field | Rules |
|---|---|
| Character name | 3-20 chars, alphanumeric + spaces only |
| Gold amount | Positive integer, max 999,999,999 |
| Action text | Max 500 chars, sanitized HTML |
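The three example rules translate into small validators. This is a sketch rather than the project's validation layer: the function names are hypothetical, "alphanumeric" is read as ASCII letters and digits (an assumption), and `html.escape` from the standard library stands in for the Bleach sanitizing that the strategy table names.

```python
import html
import re

# 3 to 20 characters, ASCII letters, digits, and spaces only (ASCII is an assumption).
NAME_PATTERN = re.compile(r"[A-Za-z0-9 ]{3,20}")
MAX_GOLD = 999_999_999
MAX_ACTION_CHARS = 500


def is_valid_character_name(name: str) -> bool:
    """True when the whole name matches the allowed characters and length."""
    return NAME_PATTERN.fullmatch(name) is not None


def is_valid_gold_amount(amount: object) -> bool:
    """True for a positive int up to MAX_GOLD; bool is rejected because it subclasses int."""
    if isinstance(amount, bool) or not isinstance(amount, int):
        return False
    return 0 < amount <= MAX_GOLD


def clean_action_text(text: str) -> str:
    """Reject text over 500 characters, then escape HTML special characters."""
    if len(text) > MAX_ACTION_CHARS:
        raise ValueError(f"Action text exceeds {MAX_ACTION_CHARS} characters")
    return html.escape(text)
```

Note that the name pattern as written accepts a name made only of spaces; trimming or rejecting such input would be a separate rule the table does not specify.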
Rate Limiting
Implementation: Flask-Limiter with Redis backend
Limits by Tier:
| Tier | API Calls/Min | AI Calls/Day | Marketplace Actions/Day |
|---|---|---|---|
| FREE | 30 | 50 | N/A |
| BASIC | 60 | 200 | N/A |
| PREMIUM | 120 | 1000 | 50 |
| ELITE | 300 | Unlimited | 100 |
Rate Limit Bypass:
- Admin accounts
- Health check endpoints
- Static assets
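To make the per-tier API limit and the admin bypass concrete, here is a minimal in-memory fixed-window counter. It is a teaching sketch, not the Flask-Limiter with Redis setup named above: `FixedWindowLimiter` is a hypothetical class, state lives in a plain dict that is never pruned, and only the API calls per minute column is modeled.

```python
import time
from typing import Callable, Dict, Tuple

# API calls per minute by tier, copied from the table above.
API_CALLS_PER_MINUTE = {"FREE": 30, "BASIC": 60, "PREMIUM": 120, "ELITE": 300}


class FixedWindowLimiter:
    """Count calls per (user, one-minute window) in memory."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._counts: Dict[Tuple[str, int], int] = {}

    def allow(self, user_id: str, tier: str, is_admin: bool = False) -> bool:
        """Return True and record the call if the user is under the tier limit."""
        if is_admin:
            # Admin accounts bypass rate limiting.
            return True
        window = int(self._clock() // 60)
        key = (user_id, window)
        used = self._counts.get(key, 0)
        if used >= API_CALLS_PER_MINUTE[tier]:
            return False
        self._counts[key] = used + 1
        return True
```

Injecting the clock keeps the limiter testable without sleeping. A fixed window allows short bursts at window boundaries, which is one reason a production setup would rely on a dedicated limiter with shared storage.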
API Security
Configuration:
| Setting | Value | Reason |
|---|---|---|
| CORS | Production domain only | Prevent unauthorized access |
| HTTPS | Required | Encrypt data in transit |
| API Keys | Environment variables | Never in code |
| Appwrite Permissions | Least privilege | Collection-level security |
| SQL Injection | N/A | Using Appwrite (NoSQL) |
| XSS | Sanitize all inputs | Prevent script injection |
| CSRF | CSRF tokens | For form submissions |
Data Protection
Access Control Matrix:
| Resource | Owner | Party Member | Public | System |
|---|---|---|---|---|
| Characters | RW | R | - | RW |
| Sessions | R | RW (turn) | - | RW |
| Marketplace Listings | RW (own) | - | R | RW |
| Transactions | R (own) | - | - | RW |
RW = Read/Write, R = Read only, - = No access
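The matrix can be encoded as data and queried with a single function. The sketch below is one possible reading, with hypothetical names throughout: roles and resources are lower-cased keys, "RW (turn)" is modeled as write access only when `is_players_turn` is true, and the "(own)" qualifiers are assumed to be satisfied by the caller already holding the owner role.

```python
# Permissions per resource and role, transcribed from the matrix above.
# "r" = read, "w" = write, "" = no access.
ACCESS = {
    "characters": {"owner": "rw", "party_member": "r", "public": "", "system": "rw"},
    "sessions": {"owner": "r", "party_member": "rw", "public": "", "system": "rw"},
    "marketplace_listings": {"owner": "rw", "party_member": "", "public": "r", "system": "rw"},
    "transactions": {"owner": "r", "party_member": "", "public": "", "system": "rw"},
}


def can_access(resource: str, role: str, operation: str, is_players_turn: bool = False) -> bool:
    """Return True when role may perform operation ("r" or "w") on resource."""
    if operation not in ("r", "w"):
        raise ValueError("operation must be 'r' or 'w'")
    allowed = operation in ACCESS[resource][role]
    # "RW (turn)": party members may write to a session only on their own turn.
    if allowed and resource == "sessions" and role == "party_member" and operation == "w":
        return is_players_turn
    return allowed
```

Keeping the matrix as a dict means a permission change is a data edit rather than a code change, and unknown resources or roles fail loudly with a `KeyError`.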
Secrets Management
Never Commit:
- API keys
- Database credentials
- Secret keys
- Tokens
Best Practices:
- Use `.env` for local development
- Use environment variables in production
- Use secrets manager (AWS Secrets Manager, HashiCorp Vault) in production
- Rotate keys regularly
- Different keys for dev/staging/prod
Backup & Recovery
Appwrite Data Backup
Strategy:
| Data Type | Backup Frequency | Retention | Method |
|---|---|---|---|
| Characters | Daily | 30 days | Appwrite export |
| Sessions (active) | Hourly | 7 days | Appwrite export |
| Marketplace | Daily | 30 days | Appwrite export |
| Transactions | Daily | 90 days | Appwrite export |
Backup Script:
- Export collections to JSON
- Compress and encrypt
- Upload to S3 or object storage
- Verify backup integrity
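The compress and verify steps can be sketched with the standard library alone. This example assumes the collection has already been exported to a list of dicts; it does not call Appwrite, it skips encryption and upload (both need tools the document does not name), and `write_backup` and `verify_backup` are hypothetical helper names.

```python
import gzip
import hashlib
import json
from pathlib import Path
from typing import List


def write_backup(documents: List[dict], path: Path) -> str:
    """Write documents as gzip-compressed JSON; return the file's SHA-256 hex digest."""
    payload = json.dumps(documents, sort_keys=True).encode("utf-8")
    path.write_bytes(gzip.compress(payload))
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_backup(path: Path, expected_sha256: str) -> List[dict]:
    """Check the checksum, then decompress and parse; raise ValueError on mismatch."""
    raw = path.read_bytes()
    if hashlib.sha256(raw).hexdigest() != expected_sha256:
        raise ValueError(f"Checksum mismatch for {path}")
    return json.loads(gzip.decompress(raw).decode("utf-8"))
```

Storing the digest separately from the archive lets the integrity check detect corruption that happens after upload as well as during it.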
Disaster Recovery Plan
| Scenario | RTO | RPO | Steps |
|---|---|---|---|
| Database corruption | 4 hours | 24 hours | Restore from latest backup |
| API server down | 15 minutes | 0 | Restart/failover to standby |
| Redis failure | 5 minutes | Session data loss | Restart, users re-login |
| Complete infrastructure loss | 24 hours | 24 hours | Restore from backups to new infrastructure |
RTO = Recovery Time Objective, RPO = Recovery Point Objective
CI/CD Pipeline
Recommended Workflow
| Stage | Actions | Tools |
|---|---|---|
| 1. Commit | Developer pushes to `dev` branch | Git |
| 2. Build | Run tests, lint code | GitHub Actions, pytest, flake8 |
| 3. Test | Unit tests, integration tests | pytest |
| 4. Build Image | Create Docker image | Docker |
| 5. Deploy to Staging | Deploy to staging environment | Docker, SSH |
| 6. Manual Test | QA testing on staging | Manual |
| 7. Merge to Beta | Promote to beta branch | Git |
| 8. Deploy to Beta | Deploy to beta environment | Docker, SSH |
| 9. Merge to Master | Production promotion | Git |
| 10. Deploy to Prod | Deploy to production | Docker, SSH |
| 11. Tag Release | Create version tag | Git |
GitHub Actions Example
name: CI/CD
on:
  push:
    branches: [ dev, beta, master ]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          # Quoted so YAML keeps the version as a string
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest
      - name: Lint
        run: flake8 app/
  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker build -t ai-dungeon-master:${{ github.ref_name }} .
      - name: Push to registry
        run: docker push ai-dungeon-master:${{ github.ref_name }}
Performance Optimization
Caching Strategy
| Cache Type | What to Cache | TTL |
|---|---|---|
| Redis Cache | Session data | 30 minutes |
| Redis Cache | Character data (read-heavy) | 5 minutes |
| Redis Cache | Marketplace listings | 1 minute |
| Redis Cache | NPC shop items | 1 hour |
| Browser Cache | Static assets | 1 year |
| Browser Cache | API responses (GET) | 30 seconds |
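The server-side TTLs can be expressed as a lookup table in front of a cache. The following in-memory class is a sketch of the expiry behavior only, standing in for Redis: `TTLCache` and the `kind` keys are hypothetical names, and a real deployment would instead set the expiry on the Redis key itself.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# TTLs in seconds for the server-side cached items in the table above.
TTL_SECONDS = {
    "session": 30 * 60,
    "character": 5 * 60,
    "marketplace_listings": 60,
    "npc_shop_items": 60 * 60,
}


class TTLCache:
    """In-memory cache whose entries expire after the TTL for their kind."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def set(self, kind: str, key: str, value: Any) -> None:
        """Store value with an expiry based on the kind's TTL."""
        self._items[(kind, key)] = (self._clock() + TTL_SECONDS[kind], value)

    def get(self, kind: str, key: str) -> Optional[Any]:
        """Return the cached value, or None if it is missing or expired."""
        entry = self._items.get((kind, key))
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[(kind, key)]
            return None
        return value
```

Short TTLs on marketplace listings trade a little staleness for far fewer database reads, while the one-hour TTL on NPC shop items reflects data that rarely changes.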
Database Optimization
Appwrite Indexing:
- Index `userId` on characters collection
- Index `status` on game_sessions collection
- Index `listing_type` + `status` on marketplace_listings
- Index `created_at` for time-based queries
AI Call Optimization
Strategies:
| Strategy | Impact | Implementation |
|---|---|---|
| Batch requests | Reduce API calls | Combine multiple actions |
| Cache common responses | Reduce cost | Cache item descriptions |
| Prompt optimization | Reduce tokens | Shorter, more efficient prompts |
| Model selection | Reduce cost | Use cheaper models when appropriate |
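As an illustration of the "cache common responses" row, the sketch below memoizes generated text by a hash of the normalized prompt, so repeated requests for the same item description trigger only one AI call. `DescriptionCache` is a hypothetical class, `generate` is a placeholder for whatever function calls the AI provider, and lower-casing plus whitespace collapsing is an assumed normalization that may be too aggressive for some prompts.

```python
import hashlib
from typing import Callable, Dict


class DescriptionCache:
    """Memoize AI-generated text by a hash of the normalized prompt."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate
        self._responses: Dict[str, str] = {}
        self.ai_calls = 0  # number of times generate was actually invoked

    def describe(self, prompt: str) -> str:
        """Return the cached response for prompt, generating it on first use."""
        # Collapse case and whitespace so trivially different prompts share a key.
        normalized = " ".join(prompt.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in self._responses:
            self.ai_calls += 1
            self._responses[key] = self._generate(prompt)
        return self._responses[key]
```

The `ai_calls` counter makes the saving measurable: comparing it with the number of `describe` calls gives the cache hit rate.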
Troubleshooting
Common Issues
| Issue | Symptoms | Solution |
|---|---|---|
| RQ workers not processing | Jobs stuck in queue | Check Redis connection, restart workers |
| AI calls failing | 401/403 errors | Verify API keys, check rate limits |
| Appwrite connection errors | Database errors | Check Appwrite status, verify credentials |
| Session not updating | Stale data in UI | Check Appwrite Realtime connection |
| High latency | Slow API responses | Check RQ queue length, scale workers |
Debug Mode
Enable Debug Logging:
export LOG_LEVEL=DEBUG
flask run --debug
Debug Endpoints (development only):
- `GET /debug/health` - Health check
- `GET /debug/redis` - Redis connection status
- `GET /debug/queues` - RQ queue status
Resources
| Resource | URL |
|---|---|
| Appwrite Docs | https://appwrite.io/docs |
| RQ Docs | https://python-rq.org/ |
| Flask Docs | https://flask.palletsprojects.com/ |
| Structlog Docs | https://www.structlog.org/ |
| HTMX Docs | https://htmx.org/docs/ |
| Anthropic API | https://docs.anthropic.com/ |
| Replicate API | https://replicate.com/docs |