Production runtime: WSGI server choice and configuration
In production, you typically run Flask behind a dedicated WSGI server. The WSGI server is responsible for managing worker processes/threads, timeouts, and graceful restarts. Your Flask app should be exposed as a WSGI callable (for example, app), and the server imports it.
Gunicorn (common default)
Gunicorn is a widely used WSGI server for Linux. A typical pattern is to run multiple workers and set timeouts appropriate to your service.
# Install (pin in requirements/lockfile, see checklist below)
pip install gunicorn
# Example command
# -w: workers (start with 2-4 per CPU core for sync workers, then measure)
# -k: worker class (sync by default; use gthread for some concurrency)
# --timeout: hard timeout for requests
# --graceful-timeout: time to finish in-flight requests on restart
# --access-logfile / --error-logfile: use "-" to log to stdout/stderr in containers
gunicorn "myservice.wsgi:app" \
-w 4 \
-k gthread --threads 8 \
--timeout 30 \
--graceful-timeout 20 \
--keep-alive 5 \
--access-logfile - \
--error-logfile -

Notes:
- Worker model: sync is simplest; gthread can help for I/O-bound endpoints; for async stacks you'd typically use ASGI instead of WSGI.
- Timeouts: keep them aligned with your load balancer/proxy timeouts to avoid half-open requests.
- Logging: in containerized environments, log to stdout/stderr and let the platform collect logs.
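The same settings can also live in a version-controlled Gunicorn configuration file rather than a long command line; Gunicorn reads gunicorn.conf.py from the working directory by default, or you can pass one with -c. A minimal sketch mirroring the command above (the values are starting points, not recommendations):

# gunicorn.conf.py
import multiprocessing

bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2  # start here for sync/gthread workers, then measure
worker_class = "gthread"
threads = 8
timeout = 30
graceful_timeout = 20
keepalive = 5
accesslog = "-"  # stdout
errorlog = "-"   # stderr

Run it with gunicorn -c gunicorn.conf.py "myservice.wsgi:app".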
uWSGI (alternative)
uWSGI is powerful but has more configuration surface area. If you use it, keep configuration explicit and versioned, and ensure you understand worker lifecycle and signal handling.
Expose a WSGI entrypoint
Keep a small module that creates the app and exposes it for the server to import.
# myservice/wsgi.py
from myservice import create_app
app = create_app()

Environment variables and secret management
Production configuration should be injected at runtime, not committed to source control. Environment variables are the most portable mechanism across PaaS, containers, and VM deployments.
What belongs in env vars
- Connection strings (DB URL), cache endpoints, external API base URLs.
- Feature flags and operational toggles.
- Secrets: signing keys, API tokens, OAuth client secrets (preferably via a secret manager).
Secret management patterns
Environment variables are often the delivery mechanism, but the source of truth should be a secret manager when possible.
- Managed secret stores: AWS Secrets Manager/SSM, GCP Secret Manager, Azure Key Vault, Vault. Your runtime injects secrets as env vars or mounted files.
- Mounted secret files: in Kubernetes, secrets can be mounted as files; your app reads from a path like /run/secrets/....
- Rotation readiness: design for secret rotation by reloading on restart (most common) and keeping TTLs short where feasible.
Practical step-by-step: minimal secret loading
This pattern reads a secret from an env var first, then falls back to a file path if provided.
# myservice/secrets.py
import os
from pathlib import Path

class SecretError(RuntimeError):
    pass

def read_secret(name: str, *, file_var: str | None = None) -> str:
    val = os.getenv(name)
    if val:
        return val
    if file_var:
        p = os.getenv(file_var)
        if p:
            path = Path(p)
            if path.exists():
                return path.read_text(encoding="utf-8").strip()
    raise SecretError(f"Missing required secret: {name}")
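Callers then read secrets through this helper instead of scattering os.getenv calls. In the sketch below, SECRET_KEY_FILE is an illustrative convention for pointing at a mounted secret file, not something defined elsewhere in this service:

# e.g. inside create_app() or a config module
from myservice.secrets import read_secret

SECRET_KEY = read_secret("SECRET_KEY", file_var="SECRET_KEY_FILE")
DATABASE_URL = read_secret("DATABASE_URL")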
Basic containerization patterns
Containerization standardizes runtime dependencies and makes deployments repeatable. The goal is a small image, predictable startup, and clear separation of build-time vs runtime configuration.
Dockerfile pattern (multi-stage, non-root, pinned deps)
# syntax=docker/dockerfile:1
FROM python:3.12-slim AS base
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
# System deps (keep minimal)
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Install dependencies first for better caching
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . /app
# Create a non-root user
RUN useradd -r -u 10001 appuser && chown -R appuser:appuser /app
USER appuser
EXPOSE 8000
# Gunicorn as PID 1 (or use tini; see graceful shutdown section)
CMD ["gunicorn", "myservice.wsgi:app", "-w", "4", "-k", "gthread", "--threads", "8", "--bind", "0.0.0.0:8000", "--access-logfile", "-", "--error-logfile", "-", "--timeout", "30", "--graceful-timeout", "20"]Keep environment-specific values out of the image. Inject them at runtime using your orchestrator or deployment system.
Runtime configuration via env vars
# Example runtime env vars
FLASK_ENV=production
APP_ENV=production
DATABASE_URL=postgresql+psycopg://...
SECRET_KEY=...
SENTRY_DSN=...
LOG_LEVEL=INFO
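One way to consume these variables is a small settings object built once at startup, so the rest of the code never touches os.environ directly. A minimal sketch; the attribute names simply mirror the variables above:

# myservice/config.py (sketch)
import os

class Settings:
    def __init__(self) -> None:
        self.app_env = os.getenv("APP_ENV", "production")
        self.database_url = os.environ["DATABASE_URL"]  # required; KeyError if missing
        self.secret_key = os.environ["SECRET_KEY"]      # required
        self.sentry_dsn = os.getenv("SENTRY_DSN")       # optional
        self.log_level = os.getenv("LOG_LEVEL", "INFO")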
Health checks: liveness and readiness
Health checks allow your platform to route traffic only to healthy instances and to restart instances that are stuck. Use two endpoints with different semantics:
- Liveness: “process is alive” (fast, no dependencies). If this fails, the instance should be restarted.
- Readiness: “ready to serve traffic” (may include dependency checks like DB connectivity). If this fails, the instance should be removed from rotation but not necessarily restarted.
Practical endpoints
# myservice/health.py
from flask import Blueprint, jsonify

bp = Blueprint("health", __name__)

@bp.get("/healthz")
def healthz():
    return jsonify(status="ok"), 200

@bp.get("/readyz")
def readyz():
    # Keep this lightweight; avoid expensive queries.
    # Optionally: check DB connectivity with a simple SELECT 1.
    return jsonify(status="ready"), 200

Wire these endpoints into your routing and configure your load balancer/orchestrator to call them on a schedule.
Graceful shutdown and worker lifecycle
During deploys and autoscaling, instances receive termination signals. A graceful shutdown lets in-flight requests finish (within a deadline) and stops accepting new requests.
Key points
- WSGI server responsibility: Gunicorn handles signals and worker draining; configure --graceful-timeout.
- App responsibility: close resources cleanly (DB sessions, background threads) when the process exits (see the sketch after this list).
- PID 1 signal handling: in containers, ensure signals reach Gunicorn. If you wrap commands in shell scripts, use exec so the server becomes PID 1.
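For the app-side cleanup, one common pattern is an atexit handler, which Gunicorn workers typically reach after draining in-flight requests on SIGTERM; the engine argument below is a hypothetical SQLAlchemy-style pooled resource.

# myservice/lifecycle.py (sketch; `engine` is a hypothetical pooled resource)
import atexit
import logging

log = logging.getLogger(__name__)

def register_shutdown_handlers(engine) -> None:
    def _close_resources() -> None:
        log.info("Worker exiting; releasing pooled connections")
        engine.dispose()

    atexit.register(_close_resources)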
Container entrypoint tip
# If using a shell entrypoint, end with exec so signals propagate
exec gunicorn "myservice.wsgi:app" --bind 0.0.0.0:8000 ...

Config validation at startup (fail fast)
Production failures are easier to diagnose when the service refuses to start with invalid configuration. Validate required env vars, URL formats, and critical settings before accepting traffic.
Practical step-by-step: validate on app creation
# myservice/startup_checks.py
import os
from urllib.parse import urlparse

class ConfigError(RuntimeError):
    pass

def require_env(name: str) -> str:
    val = os.getenv(name)
    if not val:
        raise ConfigError(f"Missing env var: {name}")
    return val

def validate_database_url():
    db = require_env("DATABASE_URL")
    parsed = urlparse(db)
    if not parsed.scheme or not parsed.netloc:
        raise ConfigError("DATABASE_URL is not a valid URL")

def validate_production_flags():
    debug = os.getenv("FLASK_DEBUG", "0")
    if debug not in ("0", "false", "False", ""):
        raise ConfigError("FLASK_DEBUG must be disabled in production")

def run_startup_checks():
    validate_database_url()
    validate_production_flags()

# myservice/__init__.py (inside create_app)
from .startup_checks import run_startup_checks

def create_app():
    run_startup_checks()
    ...
    return app

Keep startup checks deterministic and fast. If you need dependency checks (DB reachable), prefer readiness checks so deploys don't fail due to transient dependency outages (unless your policy is to fail fast on missing dependencies).
Production logging and error reporting integration points
In production, logs should be structured, consistent, and correlated with requests. Error reporting should capture stack traces and context without leaking secrets.
Logging in production: practical considerations
- Write to stdout/stderr: let the platform ship logs.
- Include request correlation: propagate a request ID from the edge (or generate one) and include it in logs (a sketch follows this list).
- Separate access logs: Gunicorn access logs can be sufficient; ensure they include latency and status codes.
- PII/secret hygiene: never log raw tokens, passwords, or full authorization headers.
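A minimal request-correlation sketch: reuse an incoming X-Request-ID header (a common but not universal edge convention), generate one when absent, and echo it on the response so clients and logs can be matched up.

# myservice/request_id.py (sketch)
import uuid
from flask import Flask, g, request

def init_request_id(app: Flask) -> None:
    @app.before_request
    def _assign_request_id():
        g.request_id = request.headers.get("X-Request-ID") or uuid.uuid4().hex

    @app.after_request
    def _echo_request_id(resp):
        resp.headers["X-Request-ID"] = g.get("request_id", "")
        return resp

Include g.request_id in your log formatter (for example via a logging filter) so every log line for a request carries the same ID.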
Error reporting integration points
Error reporting tools (e.g., Sentry, Rollbar, Bugsnag) typically integrate at two levels:
- WSGI middleware: captures unhandled exceptions and request context.
- Flask integration: hooks into Flask’s exception handling and can attach user/request metadata.
Operational pattern:
- Enable error reporting only when a DSN/key is present.
- Set environment/release version tags to correlate errors with deployments.
- Scrub sensitive fields (headers, payload keys like password, token).
# Pseudocode wiring (library-specific)
DSN = os.getenv("SENTRY_DSN")
if DSN:
    init_error_reporting(dsn=DSN, environment=os.getenv("APP_ENV"), release=os.getenv("GIT_SHA"))
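If the tool is Sentry, for instance, the wiring typically looks like the sketch below; treat the option names as a snapshot and confirm them against the current sentry-sdk documentation.

# Sketch for sentry-sdk (verify options against current docs)
import os
import sentry_sdk
from sentry_sdk.integrations.flask import FlaskIntegration

dsn = os.getenv("SENTRY_DSN")
if dsn:
    sentry_sdk.init(
        dsn=dsn,
        environment=os.getenv("APP_ENV"),
        release=os.getenv("GIT_SHA"),
        integrations=[FlaskIntegration()],
        send_default_pii=False,  # keep user PII out of events
    )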
Running migrations safely during deployment
Schema migrations are part of deployment, but they can cause downtime if applied unsafely. The safest approach depends on your release strategy (rolling deploys, blue/green) and database constraints.
Principles for safe migrations
- Schema compatibility: deploy code that can run against both the old and new schema during a rolling update (expand/contract pattern).
- Separate migration step: run migrations as a distinct job before (or during) rollout, not inside every app instance startup.
- Idempotency: the migration command should be safe to re-run; avoid concurrent runners.
- Locking awareness: large table changes can lock; schedule or use online migration techniques where supported.
Practical step-by-step: deployment pipeline migration job
| Step | Action | Notes |
|---|---|---|
| 1 | Build image | Same artifact used for migration job and app rollout. |
| 2 | Run migrations as a one-off job | Ensure only one runner (CI job, Kubernetes Job with concurrency policy, etc.). |
| 3 | Deploy application | Rolling update/blue-green with readiness checks. |
| 4 | Post-deploy smoke tests | Hit key endpoints and verify critical flows. |
# Example command in CI/CD (tooling-specific)
flask db upgrade

If you must run migrations at startup (not recommended for multi-replica services), implement a distributed lock (DB advisory lock) and keep migrations fast; otherwise you risk multiple instances racing or blocking startup.
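A sketch of that advisory-lock approach for PostgreSQL, assuming SQLAlchemy and Flask-Migrate; the lock key is an arbitrary constant, and create_app is the factory from earlier in this section.

# migrate_with_lock.py (sketch: serialize startup migrations via a Postgres advisory lock)
import os
from sqlalchemy import create_engine, text

LOCK_KEY = 972461  # arbitrary application-wide constant

def run_migrations_once() -> None:
    engine = create_engine(os.environ["DATABASE_URL"])
    with engine.connect() as conn:
        # Blocks until this instance holds the lock.
        conn.execute(text("SELECT pg_advisory_lock(:key)"), {"key": LOCK_KEY})
        try:
            from flask_migrate import upgrade
            from myservice import create_app
            with create_app().app_context():
                upgrade()
        finally:
            conn.execute(text("SELECT pg_advisory_unlock(:key)"), {"key": LOCK_KEY})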
Production readiness checklist
Security headers basics
Set baseline security headers at the edge (reverse proxy) or in the app. For APIs, the most relevant are:
- X-Content-Type-Options: nosniff
- Referrer-Policy: no-referrer (or another policy appropriate to your needs)
- Content-Security-Policy (more relevant for HTML; for pure JSON APIs it can be minimal)
- Strict-Transport-Security (only when you serve exclusively over HTTPS)
# Example after-request hook
from flask import Flask

def add_security_headers(app: Flask):
    @app.after_request
    def _headers(resp):
        resp.headers.setdefault("X-Content-Type-Options", "nosniff")
        resp.headers.setdefault("Referrer-Policy", "no-referrer")
        return resp

Debug disabled and safe error surfaces
- Ensure FLASK_DEBUG is off in production.
- Ensure the interactive debugger is never exposed.
- Verify error responses don't leak stack traces or internal configuration (see the sketch after this list).
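A catch-all error handler helps guarantee the last point: the full traceback goes to logs and the error reporter, while clients receive a minimal JSON body. A sketch using Flask's documented pattern of letting normal HTTP errors pass through:

# myservice/errors.py (sketch)
import logging
from flask import Flask, jsonify
from werkzeug.exceptions import HTTPException

log = logging.getLogger(__name__)

def init_error_handlers(app: Flask) -> None:
    @app.errorhandler(Exception)
    def _unhandled(exc):
        if isinstance(exc, HTTPException):
            return exc  # 404, 405, ... keep their normal responses
        log.exception("Unhandled exception")
        return jsonify(error="internal_server_error"), 500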
Dependency pinning and reproducible builds
- Pin direct and transitive dependencies (lockfile) to avoid surprise upgrades.
- Rebuild images from a clean environment in CI to ensure reproducibility.
- Track CVEs and patch regularly; automate dependency scanning where possible.
Smoke tests after release
Smoke tests are quick, automated checks that confirm the service is alive and key workflows work after deployment.
Practical step-by-step: minimal smoke test script
# smoke_test.sh
set -euo pipefail
BASE_URL="${BASE_URL:-https://api.example.com}"
# Liveness
curl -fsS "$BASE_URL/healthz" > /dev/null
# Readiness
curl -fsS "$BASE_URL/readyz" > /dev/null
# Basic API check (example)
curl -fsS "$BASE_URL/v1/ping" | grep -q "pong"Run smoke tests from the same network context as your users (or at least from your production environment) to catch DNS, TLS, routing, and auth misconfigurations.
Final pre-flight checklist (copy/paste)
- WSGI server configured (workers/threads/timeouts) and logs to stdout/stderr.
- All required env vars present; secrets injected via secret manager or mounted files.
- Startup config validation enabled; service fails fast on missing/invalid config.
- Health checks implemented: /healthz (liveness) and /readyz (readiness).
- Graceful shutdown verified (SIGTERM drains requests within deadline).
- Error reporting wired (optional) with environment/release tags and scrubbing.
- Migrations run as a single deployment step (not per replica) and follow expand/contract when needed.
- Security headers set (at least nosniff and a referrer policy; HSTS only with HTTPS).
- Debug disabled; no stack traces exposed to clients.
- Dependencies pinned; builds reproducible; vulnerability scanning in place.
- Post-deploy smoke tests executed and monitored.