Updated network
Docs/DEPLOYMENT_CHECKLIST.md
# Deployment Checklist

Use this checklist when deploying a new service to ensure you don't miss critical steps.

## Pre-Deployment

### Application Requirements

- [ ] **Health endpoint implemented** - `/health` returns 200 OK
  - Returns JSON with status
  - Responds quickly (< 500 ms)
  - Doesn't block on external services

- [ ] **Port configuration** - App reads `PORT` from environment
  ```python
  import os

  # `app` is the application object (assumed here to be a Flask-style app)
  PORT = int(os.getenv('PORT', 5000))
  app.run(host='0.0.0.0', port=PORT)
  ```

- [ ] **Graceful shutdown** - App handles SIGTERM signal
  - Closes connections cleanly
  - Finishes current requests
  - Exits within 30 seconds

- [ ] **Logging configured** - Uses stdout/stderr
  - Structured logging (JSON preferred)
  - Includes timestamps
  - No log files (Nomad captures stdout)
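
The four requirements above can be sketched together in one small service. This is a minimal illustration using only the Python standard library, not the project's actual application code: the names `JsonFormatter`, `configure_logging`, `HealthHandler`, `make_server` and `serve_until_sigterm` are hypothetical, and the 30-second exit budget is not enforced here; it relies on request handlers being short.

```python
import json
import logging
import os
import signal
import sys
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "message": record.getMessage(),
        })


def configure_logging(stream=sys.stdout):
    """Send structured logs to stdout so the scheduler can capture them."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logging.basicConfig(level=logging.INFO, handlers=[handler], force=True)


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        # Answer from local state only; no calls to external services.
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Route the access log through the JSON logger instead of raw stderr.
        logging.getLogger("http").info(format % args)


def make_server(port=None):
    """Bind to 0.0.0.0 on PORT from the environment (default 5000)."""
    if port is None:
        port = int(os.getenv("PORT", "5000"))
    return HTTPServer(("0.0.0.0", port), HealthHandler)


def serve_until_sigterm(server):
    """Serve until SIGTERM, let the current request finish, then close."""
    def on_sigterm(signum, frame):
        # shutdown() blocks until serve_forever() returns, so it runs in a
        # helper thread rather than inside the signal handler itself.
        threading.Thread(target=server.shutdown, daemon=True).start()

    signal.signal(signal.SIGTERM, on_sigterm)
    server.serve_forever()   # returns once shutdown() has been requested
    server.server_close()    # release the listening socket
    logging.getLogger("app").info("shutdown complete")
```

A service entry point would call `configure_logging()` and then `serve_until_sigterm(make_server())`. The single-threaded `HTTPServer` is chosen here because `serve_forever()` only checks for the shutdown request between requests, so a request in progress is completed before the process exits.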

### Docker Image

- [ ] **Dockerfile complete** - Based on `Dockerfile.complete`
  - Multi-stage build (smaller image)
  - Non-root user (uid 1000)
  - Health check defined
  - Minimal base image

- [ ] **Image tested locally**
  ```bash
  docker build -t myapp:test .
  docker run -p 5000:5000 -e PORT=5000 myapp:test
  # From a second terminal, while the container is running:
  curl http://localhost:5000/health
  ```

- [ ] **Image pushed to registry**
  ```bash
  docker tag myapp:test registry.i80.dk/gitea/myapp:latest
  docker push registry.i80.dk/gitea/myapp:latest
  ```

### Nomad Job Configuration

- [ ] **Job file created** - Copy from `nomad-job-complete.hcl.tmpl`
  - Replace `[[PROJECT_NAME]]` with actual name
  - Replace `[[PORT]]` with app port (usually 5000)
  - Update resource limits (CPU/memory)

- [ ] **Health check configured** - Uses named port, not hardcoded
  ```hcl
  check {
    port = "http" # NOT "5000"!
  }
  ```

- [ ] **Traefik tags correct** - Domain matches expected URL
  ```hcl
  "traefik.http.routers.myapp.rule=Host(`myapp.i80.dk`)"
  ```

- [ ] **Volumes declared** (if needed)
  - Volume source matches Autobox config
  - Mount path correct
  - Permissions considered

- [ ] **Secrets configured** - Using chosen workaround method
  - Environment variables OR
  - File-based secrets OR
  - Consul KV

- [ ] **Job validates** - No syntax errors
  ```bash
  nomad job validate nomad-job.hcl
  ```

### Autobox Configuration

- [ ] **Volumes created** (if needed)
  ```bash
  # Run on Autobox
  sudo ./setup-nomad-volumes.sh myapp
  ```

- [ ] **Volumes show in agent-info**
  ```bash
  nomad agent-info | grep myapp-data
  # If this prints nothing, host volumes are also listed by:
  # nomad node status -self -verbose
  ```

- [ ] **Secrets file created** (if using file-based secrets)
  ```bash
  sudo vim /opt/nomad-secrets/myapp/secrets.env
  ```

- [ ] **Permissions correct**
  ```bash
  ls -la /opt/nomad-volumes/myapp-data # Owner should be 1000:1000
  ```

### Gitea CI/CD (if using)

- [ ] **Workflow file created** - Copy from `main.yml.tmpl`
  - Replace `[[PROJECT_NAME]]` everywhere
  - Registry credentials configured

- [ ] **Secrets configured** - In Gitea repository settings
  - `secrets.username` - Registry username
  - `secrets.password` - Registry password

- [ ] **Self-hosted runner** - Has necessary access
  - Docker installed
  - Nomad CLI installed
  - SSH access to Nomad server

## Deployment

### Initial Deployment

- [ ] **Job submitted**
  ```bash
  nomad job run nomad-job.hcl
  ```

- [ ] **Allocation running**
  ```bash
  nomad job status myapp
  # Should show: Running = 1
  ```

- [ ] **No errors in logs**
  ```bash
  nomad alloc logs -f <alloc-id> myapp-task
  ```

### Consul Registration

- [ ] **Service registered**
  ```bash
  consul catalog services | grep myapp
  ```

- [ ] **Service healthy**
  ```bash
  # Assumes the local Consul agent's HTTP API on the default port 8500
  curl -s http://127.0.0.1:8500/v1/health/checks/myapp
  # Look for: "Name": "http_health" with "Status": "passing"
  ```

- [ ] **Tags correct**
  ```bash
  consul catalog services -tags | grep myapp
  # Verify traefik tags present
  ```
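
The registration checks above can also be scripted. The sketch below reads the service's checks from Consul's HTTP API (`/v1/health/checks/<service>`) and evaluates them. It assumes the agent is reachable at `http://127.0.0.1:8500`; the function names are hypothetical, and the payload in the usage note is illustrative, not captured output.

```python
import json
import urllib.request


def fetch_checks(service, consul_addr="http://127.0.0.1:8500"):
    """Return the list of health checks Consul holds for a service."""
    url = f"{consul_addr}/v1/health/checks/{service}"
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def all_passing(checks):
    """True only if at least one check exists and every check is passing."""
    return bool(checks) and all(c.get("Status") == "passing" for c in checks)


def traefik_tags(checks):
    """Collect the service tags that start with 'traefik.'."""
    tags = set()
    for check in checks:
        for tag in check.get("ServiceTags") or []:
            if tag.startswith("traefik."):
                tags.add(tag)
    return sorted(tags)
```

For example, `all_passing(fetch_checks("myapp"))` should be `True` once the service is registered and healthy, and `traefik_tags(...)` should include the router rule from the job file. An empty check list counts as a failure, since a service with no registered checks has not been verified.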

### DNS & Access

- [ ] **DNS record created** - Check consul-template output
  ```bash
  grep myapp /certs/consul/trinity_powerdns_records.txt
  ```

- [ ] **Nginx config generated**
  ```bash
  grep myapp /certs/consul-nginx/conf.d/services.conf
  ```

- [ ] **Nginx reloaded** - Check watcher logs
  ```bash
  tail -f /var/log/nginx_restater.log
  ```

- [ ] **Service accessible** - Test public URL
  ```bash
  curl https://myapp.i80.dk
  curl https://myapp.i80.dk/health
  ```

## Post-Deployment

### Verification

- [ ] **Health check passing** - For at least 5 minutes
  ```bash
  # Assumes the local Consul agent's HTTP API on the default port 8500
  watch -n 5 'curl -s http://127.0.0.1:8500/v1/health/checks/myapp'
  ```

- [ ] **No restarts** - Allocation stable
  ```bash
  nomad alloc status <alloc-id>
  # Check "Recent Events" - no restarts
  ```

- [ ] **Logs clean** - No errors or warnings
  ```bash
  nomad alloc logs -f <alloc-id> myapp-task
  ```

- [ ] **Performance acceptable**
  - Response time < 1s
  - Memory usage stable
  - CPU usage reasonable

### Monitoring

- [ ] **Metrics accessible** - If implemented
  ```bash
  curl https://myapp.i80.dk/metrics
  ```

- [ ] **Logs searchable** - Can find application logs
  ```bash
  nomad alloc logs -f <alloc-id> myapp-task | grep ERROR
  ```

- [ ] **Alerts configured** - If using monitoring system
  - Health check failures
  - High error rate
  - High memory usage

### Documentation

- [ ] **Service documented** - In team wiki/docs
  - What it does
  - Where it's deployed
  - How to access it
  - Known issues

- [ ] **Runbook created** - For operational issues
  - How to restart
  - How to check logs
  - Common troubleshooting steps
  - Escalation path

- [ ] **Secrets documented** - Where they're stored
  - Which Consul KV keys
  - Which files on Autobox
  - Who has access

## Rollback Plan

- [ ] **Previous version tagged** - In case of issues
  ```bash
  docker tag myapp:latest myapp:stable
  ```

- [ ] **Rollback tested** - Know how to revert
  ```bash
  # Update job file to use :stable tag, then resubmit:
  nomad job run nomad-job.hcl
  ```

- [ ] **Data backup** - Before first deployment
  ```bash
  # If using volumes
  sudo tar -czf /backup/myapp-data.tar.gz /opt/nomad-volumes/myapp-data
  ```

## Common Issues Checklist

If deployment fails, check:

- [ ] Is `/health` endpoint implemented and returning 200?
- [ ] Is app binding to `0.0.0.0` (not `127.0.0.1`)?
- [ ] Is app reading `PORT` from environment variable?
- [ ] Are health check port references correct (no hardcoded ports)?
- [ ] Do volume paths match between Autobox and Nomad job?
- [ ] Are volume permissions correct (uid 1000)?
- [ ] Are secrets accessible (environment or files)?
- [ ] Is Docker image pulling successfully?
- [ ] Is allocation getting scheduled (not pending)?
- [ ] Are there port conflicts?
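
For the port conflict question, a quick way to test a specific port on the host is to try binding to it. This is a small standard-library sketch; `port_is_free` is a hypothetical helper, and it only reports the state at the moment it runs.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP listener can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
    return True
```

`port_is_free(5000)` returning `False` on the target host suggests another process already holds the port the job expects to use.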

## Quick Debugging Commands

```bash
# Service status
consul catalog services -tags | grep myapp
nomad job status myapp

# Allocation details
# (list allocations directly; the "Status = running" header line of
# `nomad job status` would otherwise match the grep first)
ALLOC_ID=$(nomad job allocs myapp | grep running | head -1 | awk '{print $1}')
nomad alloc status "$ALLOC_ID"

# Logs
nomad alloc logs -f "$ALLOC_ID" myapp-task
nomad alloc logs -stderr -f "$ALLOC_ID" myapp-task

# Exec into container
nomad alloc exec -i -t "$ALLOC_ID" /bin/sh

# Health check test
PORT=$(nomad alloc status "$ALLOC_ID" | grep "Port.*http" | awk '{print $3}' | cut -d':' -f2)
curl "http://192.168.15.124:$PORT/health"

# Restart
nomad job restart myapp

# Force reschedule
nomad job stop -purge myapp
nomad job run nomad-job.hcl
```

---

**Print this checklist and use it for every deployment until the process becomes second nature!**