Nomad Deployment Guide for i80.dk Infrastructure
Last Updated: 2025-11-28
This guide covers deploying Python applications to your Nomad cluster with proper health checks, volumes, and Vault workarounds.
📋 Table of Contents
- Quick Start
- Health Checks - The #1 Pain Point
- Host Volumes - The #2 Pain Point
- Vault Workarounds
- Complete Nomad Job Example
- Dockerfile Best Practices
- Gitea CI/CD Workflow
- Troubleshooting
Quick Start
1. Add Health Endpoint to Your App
CRITICAL: Your app MUST respond to /health with HTTP 200 OK.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/health')
def health():
    return jsonify({'status': 'healthy'}), 200
2. Use Complete Nomad Job Template
Copy .gitea/workflows/nomad-job-complete.hcl.tmpl to your project and customize:
cp .gitea/workflows/nomad-job-complete.hcl.tmpl .gitea/workflows/nomad-job.hcl
Replace [[PROJECT_NAME]] and [[PORT]] with your values.
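The placeholder replacement can also be scripted. The following is a minimal sketch, assuming the template only uses upper-case `[[NAME]]` placeholders such as `[[PROJECT_NAME]]` and `[[PORT]]`; the helper names `render_template` and `render_file` are hypothetical and not part of the repository.

```python
import re
from pathlib import Path

# Matches placeholders of the form [[PROJECT_NAME]] or [[PORT]]
PLACEHOLDER = re.compile(r"\[\[[A-Z_]+\]\]")


def render_template(text: str, values: dict[str, str]) -> str:
    """Replace every [[KEY]] placeholder with its value."""
    for key, value in values.items():
        text = text.replace(f"[[{key}]]", value)
    leftover = PLACEHOLDER.search(text)
    if leftover:
        # Fail loudly instead of deploying a half-filled job file
        raise ValueError(f"unreplaced placeholder: {leftover.group(0)}")
    return text


def render_file(src: Path, dst: Path, values: dict[str, str]) -> None:
    """Read the template at src, fill in the values, and write dst."""
    dst.write_text(render_template(src.read_text(), values))
```

For example, `render_file(Path(".gitea/workflows/nomad-job-complete.hcl.tmpl"), Path(".gitea/workflows/nomad-job.hcl"), {"PROJECT_NAME": "myapp", "PORT": "5000"})` would produce the job file in one step.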
3. Build and Deploy
# Build Docker image
docker build -t registry.i80.dk/gitea/myapp:latest .
# Push to registry
docker push registry.i80.dk/gitea/myapp:latest
# Deploy to Nomad
nomad job run .gitea/workflows/nomad-job.hcl
Health Checks - The #1 Pain Point
Why Health Checks Fail
Common mistakes:
- ❌ No /health endpoint - App doesn't implement health endpoint
- ❌ Wrong port - Health check uses wrong port variable
- ❌ App not ready - Health check runs before app starts
- ❌ Blocking endpoint - /health takes too long to respond
- ❌ Wrong HTTP method - App expects POST, Consul sends GET
Proper Health Check Implementation
In your Flask app:
import time

from flask import Flask, jsonify

app = Flask(__name__)
app_start_time = time.time()


@app.route('/health')
def health():
    """
    Health check endpoint for Consul/Nomad.

    Returns:
        200 OK: Service is healthy
        503: Service is not ready or shutting down
    """
    # Give app time to initialize (optional)
    if time.time() - app_start_time < 5:
        return jsonify({'status': 'starting'}), 503

    # Add your health checks
    try:
        # Check database connection
        # db.execute("SELECT 1")

        # Check external dependencies
        # api_client.ping()

        return jsonify({
            'status': 'healthy',
            'uptime': time.time() - app_start_time
        }), 200
    except Exception as e:
        return jsonify({
            'status': 'unhealthy',
            'error': str(e)
        }), 503
In your Nomad job:
service {
  name = "myapp"
  port = "http"

  check {
    name     = "http_health"
    type     = "http"
    path     = "/health"
    interval = "10s"
    timeout  = "2s"
    port     = "http"  # Use named port, NOT hardcoded!

    # Restart the task after 3 consecutive failed checks, ignoring
    # failures during the first 10s after the task starts
    check_restart {
      limit           = 3
      grace           = "10s"
      ignore_warnings = false
    }
  }
}
Testing Health Checks Locally
# Start your app
python app.py
# Test health endpoint
curl http://localhost:5000/health
# Should return:
# {"status": "healthy", "uptime": 123.45}
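The same test can be run from Python, which is handy in CI or on images without curl. This is a minimal sketch using only the standard library; `probe_health` is a hypothetical helper, and the 2-second default simply mirrors the `timeout = "2s"` used in the Nomad check above.

```python
import urllib.error
import urllib.request


def probe_health(url: str, timeout: float = 2.0) -> tuple[bool, str]:
    """Send a GET request like an HTTP check and report (ok, detail)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 503
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, DNS failure, or timeout
        return False, f"unreachable: {exc}"
```

Calling `probe_health("http://localhost:5000/health")` against the locally started app should return a true first element once the app reports healthy.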
Common Health Check Issues
Issue: Service marked unhealthy immediately
Solution: Add a check_restart grace period so that checks failing during startup do not count toward a restart:
check_restart {
  limit = 3
  grace = "10s"  # Ignore failed checks for 10s after the task starts
}
Issue: Health check timeout
Symptoms:
Health check timed out (timeout: 2s)
Solutions:
- Make the /health endpoint faster
- Increase the timeout: timeout = "5s"
- Remove slow operations from the health check
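One way to keep slow operations out of the request path is to cache their result for a short time. The sketch below assumes a dependency check that returns a boolean; `CachedCheck` and its `ttl` parameter are hypothetical names, not part of Flask or Nomad.

```python
import threading
import time
from typing import Callable, Optional


class CachedCheck:
    """Run a slow check at most once per `ttl` seconds and cache the result."""

    def __init__(self, check: Callable[[], bool], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._check = check
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._checked_at: Optional[float] = None
        self._result = False

    def healthy(self) -> bool:
        with self._lock:
            now = self._clock()
            expired = (self._checked_at is None
                       or now - self._checked_at >= self._ttl)
            if expired:
                try:
                    self._result = bool(self._check())
                except Exception:
                    # A failing dependency counts as unhealthy
                    self._result = False
                self._checked_at = now
            return self._result
```

Note that the request that finds the cache expired still pays for the slow check, so the Consul timeout has to cover that one call. Moving the refresh into a background thread would avoid this at the cost of more code.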
Issue: Wrong port
Symptoms:
Connection refused on port 5000
Solution: Use dynamic port in Nomad job:
# ❌ WRONG - hardcoded port
check {
  port = "5000"
}

# ✅ CORRECT - use named port
check {
  port = "http"
}

# And in your app environment:
env {
  PORT = "${NOMAD_PORT_http}"
}
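On the application side, the `PORT` variable then has to be read at startup. A minimal sketch, assuming the app falls back to port 5000 for local runs; `resolve_port` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def resolve_port(environ: Optional[Mapping[str, str]] = None,
                 default: int = 5000) -> int:
    """Return the port from the PORT variable, or the default if unset."""
    env = os.environ if environ is None else environ
    raw = env.get("PORT", "").strip()
    if not raw:
        return default
    port = int(raw)  # Raises ValueError for non-numeric values
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port
```

In a Flask app this would typically be used as `app.run(host="0.0.0.0", port=resolve_port())`. Binding to `0.0.0.0` rather than the default loopback address is usually required for the health check to reach the app inside a container.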
Host Volumes - The #2 Pain Point
Why Host Volumes Fail
Common mistakes:
- ❌ Volume not declared on Nomad client - Must configure on Autobox first!
- ❌ Wrong source name - Source must match client config
- ❌ Permission issues - Volume owned by root, app runs as user
- ❌ Mount path conflicts - Path already exists in container
Setting Up Host Volumes
Step 1: Configure on Nomad Client (Autobox)
File: /etc/nomad.d/client.hcl on Autobox
client {
  enabled = true

  host_volume "myapp-data" {
    path      = "/opt/nomad-volumes/myapp-data"
    read_only = false
  }
}
Create directory:
# On Autobox
sudo mkdir -p /opt/nomad-volumes/myapp-data
sudo chown 1000:1000 /opt/nomad-volumes/myapp-data # Match container user
sudo chmod 755 /opt/nomad-volumes/myapp-data
Restart Nomad client:
sudo systemctl restart nomad
Step 2: Use Volume in Nomad Job
group "myapp-group" {
  volume "data" {
    type      = "host"
    source    = "myapp-data"  # Must match name in client.hcl
    read_only = false
  }

  task "myapp-task" {
    volume_mount {
      volume      = "data"
      destination = "/app/data"
      read_only   = false
    }

    config {
      image = "registry.i80.dk/gitea/myapp:latest"
    }
  }
}
Step 3: Use in Your App
import os
# Data directory from mounted volume
DATA_DIR = os.getenv('DATA_DIR', '/app/data')
# SQLite database in persistent volume
db_path = os.path.join(DATA_DIR, 'app.db')
Volume Permissions
Best Practice: Run container as non-root user
In Dockerfile:
# Create non-root user
RUN useradd -m -u 1000 appuser
# Switch to user
USER appuser
On Autobox:
# Set ownership to match container user (uid 1000)
sudo chown -R 1000:1000 /opt/nomad-volumes/myapp-data
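Permission mismatches are easier to diagnose when the app checks the volume at startup instead of failing on the first write. A minimal sketch, assuming a Linux container and the `/app/data` mount used above; `ensure_writable` is a hypothetical helper.

```python
import os
import tempfile


def ensure_writable(data_dir: str) -> None:
    """Raise RuntimeError if data_dir is missing or not writable."""
    if not os.path.isdir(data_dir):
        raise RuntimeError(
            f"{data_dir} is not a directory; is the volume mounted?"
        )
    try:
        # Creating a throwaway file proves the current user can write here
        with tempfile.TemporaryFile(dir=data_dir):
            pass
    except OSError as exc:
        raise RuntimeError(
            f"{data_dir} is not writable by uid {os.getuid()}: {exc}"
        ) from exc
```

Calling `ensure_writable(os.getenv("DATA_DIR", "/app/data"))` before opening the database makes the error show up in the allocation logs with the uid that needs ownership of the host directory.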
Checking Volume Mounts
# On Nomad - check allocation
nomad alloc status <alloc-id>
# Look for volume mounts section:
# Mounted Volumes:
# data -> /opt/nomad-volumes/myapp-data
# SSH to Autobox and verify
ls -la /opt/nomad-volumes/myapp-data
Volume Backup
Simple backup script:
#!/bin/bash
# backup-volumes.sh
VOLUME_PATH="/opt/nomad-volumes/myapp-data"
BACKUP_PATH="/backup/$(date +%Y%m%d)"
mkdir -p "$BACKUP_PATH"
tar -czf "$BACKUP_PATH/myapp-data.tar.gz" "$VOLUME_PATH"
Vault Workarounds
Problem
Vault is currently not working, so proper secret management is not available.
Temporary Solutions
Option 1: Environment Variables in Nomad Job (NOT RECOMMENDED)
env {
  APP_ENV      = "production"
  PORT         = "${NOMAD_PORT_http}"
  DATABASE_URL = "sqlite:///app/data/app.db"
  API_KEY      = "your-secret-key-here"  # BAD: Secret in plain text!
}
Pros:
- Simple
- Works immediately
Cons:
- ❌ Secrets visible in Nomad UI
- ❌ Secrets in version control (if committed)
- ❌ Hard to rotate secrets
Option 2: File-Based Secrets (BETTER)
Store secrets in file on Autobox:
# On Autobox
sudo mkdir -p /opt/nomad-secrets/myapp
sudo vim /opt/nomad-secrets/myapp/secrets.env
# Content:
# API_KEY=your-secret-key
# DB_PASSWORD=your-db-password
sudo chown 1000:1000 /opt/nomad-secrets/myapp/secrets.env
sudo chmod 600 /opt/nomad-secrets/myapp/secrets.env
Mount as host volume:
group "myapp-group" {
  volume "secrets" {
    type      = "host"
    source    = "myapp-secrets"  # Must be declared as a host_volume in client.hcl
    read_only = true             # Read-only for security
  }

  task "myapp-task" {
    volume_mount {
      volume      = "secrets"
      destination = "/app/secrets"
      read_only   = true
    }

    # Load and export secrets at startup. "." is the POSIX form of
    # "source", and "set -a" exports every variable the file defines.
    config {
      command = "sh"
      args    = ["-c", "set -a && . /app/secrets/secrets.env && set +a && exec flask run --port $PORT"]
    }
  }
}
Pros:
- ✅ Secrets not in Nomad job file
- ✅ Can be backed up separately
- ✅ Easier to rotate
Cons:
- ⚠️ Still manual management
- ⚠️ Need to manage file permissions
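As an alternative to sourcing the file in a shell wrapper, the app can read it directly. This is a minimal sketch, assuming the simple `KEY=value` format shown above with optional surrounding quotes; `parse_env_file` and `load_secrets` are hypothetical helpers, and shell features such as variable expansion are not supported.

```python
import os


def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "="
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_secrets(path: str = "/app/secrets/secrets.env") -> None:
    """Copy secrets from the mounted file into os.environ."""
    with open(path, encoding="utf-8") as fh:
        for key, value in parse_env_file(fh.read()).items():
            # Values already set by Nomad take precedence
            os.environ.setdefault(key, value)
```

Calling `load_secrets()` before the Flask app reads its configuration keeps the task `config` block free of shell wrappers.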
Option 3: Consul KV Store (RECOMMENDED TEMPORARY)
# Store secret in Consul
consul kv put secret/myapp/api_key "your-secret-key"
In Nomad job template:
task "myapp-task" {
  template {
    data = <<EOH
{{ with key "secret/myapp/api_key" }}
API_KEY="{{ . }}"
{{ end }}
EOH
    destination = "secrets/config.env"
    env         = true
  }
}
Pros:
- ✅ Uses existing infrastructure (Consul)
- ✅ Can be managed via API
- ✅ Not visible in Nomad UI
Cons:
- ⚠️ Not as secure as Vault
- ⚠️ Manual secret rotation
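Managing the secret via the API means talking to Consul's KV HTTP endpoint, which returns a JSON list whose `Value` field is base64-encoded. The sketch below assumes a local Consul agent at `http://127.0.0.1:8500` without ACL tokens; `decode_kv_value` and `fetch_kv` are hypothetical helpers.

```python
import base64
import json
import urllib.request


def decode_kv_value(body: bytes) -> str:
    """Extract and decode the value from a /v1/kv/<key> JSON response."""
    entries = json.loads(body)
    raw = entries[0]["Value"]
    # Consul reports an empty value as null
    return "" if raw is None else base64.b64decode(raw).decode("utf-8")


def fetch_kv(key: str, consul_addr: str = "http://127.0.0.1:8500") -> str:
    """Read one key from Consul KV (address is an assumption)."""
    url = f"{consul_addr}/v1/kv/{key}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return decode_kv_value(resp.read())
```

For example, `fetch_kv("secret/myapp/api_key")` should return the value stored by the `consul kv put` command above. A missing key produces an HTTP 404, which `urlopen` raises as `urllib.error.HTTPError`.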
When Vault is Fixed
Proper Vault integration:
task "myapp-task" {
  vault {
    policies = ["myapp-policy"]
  }

  template {
    data = <<EOH
{{ with secret "secret/data/myapp" }}
API_KEY="{{ .Data.data.api_key }}"
DATABASE_URL="{{ .Data.data.database_url }}"
{{ end }}
EOH
    destination = "secrets/config.env"
    env         = true
  }
}
Complete Nomad Job Example
See .gitea/workflows/nomad-job-complete.hcl.tmpl for a fully documented example with:
- ✅ Proper health checks with grace period
- ✅ Host volume configuration
- ✅ Vault workarounds
- ✅ Auto-revert on failed deployments
- ✅ Graceful shutdown handling
- ✅ Resource limits
- ✅ Log rotation
Dockerfile Best Practices
Multi-Stage Build
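# Builder stage
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user -r requirements.txt

# Runtime stage (smaller)
FROM python:3.11-slim
RUN useradd -m -u 1000 appuser
WORKDIR /app
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
COPY --chown=appuser:appuser . .
ENV PATH=/home/appuser/.local/bin:$PATH
USER appuser
CMD ["flask", "run"]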
Benefits:
- Smaller final image
- Faster deployment
- Less attack surface
Non-Root User
# Create user
RUN useradd -m -u 1000 appuser
# Switch to user
USER appuser
Why:
- Security best practice
- Required for some volume mounts
- Prevents privilege escalation
Health Check
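# Requires curl in the image (python:3.11-slim does not include it)
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s \
    CMD curl -f http://localhost:${PORT}/health || exit 1
Benefits:
- Docker can detect unhealthy containers
- Extra layer of monitoring alongside the Consul check
Note: Nomad does not restart or reschedule tasks based on the Docker HEALTHCHECK status, so the service check in the Nomad job remains the authoritative one.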
Gitea CI/CD Workflow
Complete Workflow Example
See .gitea/workflows/main.yml.tmpl for a complete Gitea Actions workflow that:
- ✅ Builds Docker image
- ✅ Tags with commit hash + latest
- ✅ Pushes to private registry
- ✅ Validates Nomad job
- ✅ Stops old deployment
- ✅ Deploys new version
- ✅ Updates nginx configuration
- ✅ Updates forwarder configuration
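The tagging step amounts to deriving two image references from one commit. A minimal sketch, assuming the `registry.i80.dk/gitea/<name>` layout used in the Quick Start and an 8-character short hash; `image_tags` is a hypothetical helper, and the actual workflow template may tag differently.

```python
def image_tags(commit: str, name: str = "myapp",
               registry: str = "registry.i80.dk/gitea",
               short: int = 8) -> list[str]:
    """Return the commit-hash tag and the latest tag for one image."""
    repo = f"{registry}/{name}"
    # Pinning deployments to the commit tag makes rollbacks reproducible
    return [f"{repo}:{commit[:short]}", f"{repo}:latest"]
```

Deploying the commit-hash tag rather than `latest` lets Nomad tell two versions apart, which matters for auto-revert.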
Secrets in Gitea
Configure in Gitea repository settings:
- secrets.username - Registry username
- secrets.password - Registry password
Self-Hosted Runner
Your runner must have:
- Docker installed
- Nomad CLI installed
- SSH access to Nomad server
- Access to private registry
Troubleshooting
Service Marked Unhealthy
Check Consul:
# On Nomad
curl -s http://127.0.0.1:8500/v1/health/checks/myapp
# Look for the check's status in the JSON output:
# "Name": "http_health", "Status": "critical"
Check allocation logs:
nomad alloc logs -f <alloc-id> myapp-task
Common causes:
- /health endpoint not implemented
- App crashed
- Wrong port
- Slow startup
Container Keeps Restarting
Check allocation status:
nomad alloc status <alloc-id>
# Look at Recent Events:
# Started -> Restart Signaled -> Started ...
Common causes:
- Failed health checks
- App crash on startup
- Missing dependencies
- Port already in use
Volume Mount Issues
Check Nomad client config:
# On Autobox
nomad node status -self -verbose | grep -A 10 "Host Volumes"
Check permissions:
# On Autobox
ls -la /opt/nomad-volumes/myapp-data
# Should be owned by uid 1000 (or your container user)
Check allocation:
nomad alloc status <alloc-id>
# Look for Mounted Volumes section
Port Conflicts
Symptoms:
Failed to start task: bind: address already in use
Solution: Nomad assigns dynamic ports automatically:
network {
  port "http" {
    to = 5000  # Container internal port
    # Nomad picks external port (30000-32000)
  }
}

env {
  PORT = "${NOMAD_PORT_http}"  # Use Nomad's assigned port
}
Secrets Not Loading
Check Consul KV:
consul kv get secret/myapp/api_key
Check template rendering:
nomad alloc fs <alloc-id> secrets/
# Should see config.env or your secret files
View rendered template:
nomad alloc fs <alloc-id> secrets/config.env
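A startup check in the app makes missing secrets visible in the allocation logs right away. A minimal sketch, assuming the variable names `API_KEY` and `DATABASE_URL` from the templates above; `missing_env` is a hypothetical helper.

```python
import os
from typing import Iterable, Mapping, Optional


def missing_env(required: Iterable[str],
                environ: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]
```

At startup, something like `missing = missing_env(["API_KEY", "DATABASE_URL"])` followed by raising an error when the list is non-empty turns a silent misconfiguration into a clear log line.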
Quick Reference
Essential Commands
# Check service health
curl -s http://127.0.0.1:8500/v1/health/checks/myapp
# View allocation
nomad alloc status <alloc-id>
# View logs
nomad alloc logs -f <alloc-id> myapp-task
# Exec into container
nomad alloc exec -i -t <alloc-id> /bin/sh
# Restart job
nomad job restart myapp
# Stop job
nomad job stop myapp
# Force reschedule of failed allocations
nomad job eval -force-reschedule myapp
Health Check URL
# Find allocated port
nomad alloc status <alloc-id> | grep "Port.*http"
# Test health endpoint
curl http://192.168.15.124:30123/health
Volume Locations
- Client config: /etc/nomad.d/client.hcl (on Autobox)
- Volume data: /opt/nomad-volumes/<volume-name> (on Autobox)
- Secrets: /opt/nomad-secrets/<app-name> (on Autobox)
For more information, see:
- Main infrastructure docs: ~/Projects/i80_network.md
- Nomad docs: https://nomad.i80.dk:4646
- Consul UI: https://consul.i80.dk:8500