Production runtime: WSGI server choice and configuration
In production, you typically run Flask behind a dedicated WSGI server. The WSGI server is responsible for managing worker processes/threads, timeouts, and graceful restarts. Your Flask app should be exposed as a WSGI callable (for example, app), and the server imports it.
Gunicorn (common default)
Gunicorn is a widely used WSGI server for Linux. A typical pattern is to run multiple workers and set timeouts appropriate to your service.
# Install (pin in requirements/lockfile, see checklist below)
pip install gunicorn
# Example command
# -w: workers (start with 2-4 per CPU core for sync workers, then measure)
# -k: worker class (sync by default; use gthread for some concurrency)
# --timeout: hard timeout for requests
# --graceful-timeout: time to finish in-flight requests on restart
# --access-logfile / --error-logfile: use "-" to log to stdout/stderr in containers
gunicorn "myservice.wsgi:app" \
-w 4 \
-k gthread --threads 8 \
--timeout 30 \
--graceful-timeout 20 \
--keep-alive 5 \
--access-logfile - \
--error-logfile -

Notes:
- Worker model: sync is simplest; gthread can help for I/O-bound endpoints; for async stacks you'd typically use ASGI instead of WSGI.
- Timeouts: keep them aligned with your load balancer/proxy timeouts to avoid half-open requests.
- Logging: in containerized environments, log to stdout/stderr and let the platform collect logs.
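The same settings can also live in a version-controlled Gunicorn configuration file rather than a long command line; Gunicorn reads gunicorn.conf.py from the working directory by default, or you can pass one with -c. A minimal sketch mirroring the command above (the values are starting points, not recommendations):

# gunicorn.conf.py
import multiprocessing

bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2  # start here for sync/gthread workers, then measure
worker_class = "gthread"
threads = 8
timeout = 30
graceful_timeout = 20
keepalive = 5
accesslog = "-"  # stdout
errorlog = "-"   # stderr

Run it with gunicorn -c gunicorn.conf.py "myservice.wsgi:app".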
uWSGI (alternative)
uWSGI is powerful but has more configuration surface area. If you use it, keep configuration explicit and versioned, and ensure you understand worker lifecycle and signal handling.
Expose a WSGI entrypoint
Keep a small module that creates the app and exposes it for the server to import.
# myservice/wsgi.py
from myservice import create_app
app = create_app()

Environment variables and secret management
Production configuration should be injected at runtime, not committed to source control. Environment variables are the most portable mechanism across PaaS, containers, and VM deployments.
What belongs in env vars
- Connection strings (DB URL), cache endpoints, external API base URLs.
- Feature flags and operational toggles.
- Secrets: signing keys, API tokens, OAuth client secrets (preferably via a secret manager).
Secret management patterns
Environment variables are often the delivery mechanism, but the source of truth should be a secret manager when possible.
- Managed secret stores: AWS Secrets Manager/SSM, GCP Secret Manager, Azure Key Vault, Vault. Your runtime injects secrets as env vars or mounted files.
- Mounted secret files: in Kubernetes, secrets can be mounted as files; your app reads from a path like /run/secrets/....
- Rotation readiness: design for secret rotation by reloading on restart (most common) and keeping TTLs short where feasible.
Practical step-by-step: minimal secret loading
This pattern reads a secret from an env var first, then falls back to a file path if provided.
# myservice/secrets.py
import os
from pathlib import Path

class SecretError(RuntimeError):
    pass

def read_secret(name: str, *, file_var: str | None = None) -> str:
    val = os.getenv(name)
    if val:
        return val
    if file_var:
        p = os.getenv(file_var)
        if p:
            path = Path(p)
            if path.exists():
                return path.read_text(encoding="utf-8").strip()
    raise SecretError(f"Missing required secret: {name}")
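Callers then read secrets through this helper instead of scattering os.getenv calls. In the sketch below, SECRET_KEY_FILE is an illustrative convention for pointing at a mounted secret file, not something defined elsewhere in this service:

# e.g. inside create_app() or a config module
from myservice.secrets import read_secret

SECRET_KEY = read_secret("SECRET_KEY", file_var="SECRET_KEY_FILE")
DATABASE_URL = read_secret("DATABASE_URL")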
Basic containerization patterns
Containerization standardizes runtime dependencies and makes deployments repeatable. The goal is a small image, predictable startup, and clear separation of build-time vs runtime configuration.
Dockerfile pattern (multi-stage, non-root, pinned deps)
# syntax=docker/dockerfile:1
FROM python:3.12-slim AS base
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
# System deps (keep minimal)
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Install dependencies first for better caching
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . /app
# Create a non-root user
RUN useradd -r -u 10001 appuser && chown -R appuser:appuser /app
USER appuser
EXPOSE 8000
# Gunicorn as PID 1 (or use tini; see graceful shutdown section)
CMD ["gunicorn", "myservice.wsgi:app", "-w", "4", "-k", "gthread", "--threads", "8", "--bind", "0.0.0.0:8000", "--access-logfile", "-", "--error-logfile", "-", "--timeout", "30", "--graceful-timeout", "20"]Keep environment-specific values out of the image. Inject them at runtime using your orchestrator or deployment system.
Runtime configuration via env vars
# Example runtime env vars
FLASK_ENV=production
APP_ENV=production
DATABASE_URL=postgresql+psycopg://...
SECRET_KEY=...
SENTRY_DSN=...
LOG_LEVEL=INFO
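One way to consume these variables is a small settings object built once at startup, so the rest of the code never touches os.environ directly. A minimal sketch; the attribute names simply mirror the variables above:

# myservice/config.py (sketch)
import os

class Settings:
    def __init__(self) -> None:
        self.app_env = os.getenv("APP_ENV", "production")
        self.database_url = os.environ["DATABASE_URL"]  # required; KeyError if missing
        self.secret_key = os.environ["SECRET_KEY"]      # required
        self.sentry_dsn = os.getenv("SENTRY_DSN")       # optional
        self.log_level = os.getenv("LOG_LEVEL", "INFO")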
Health checks: liveness and readiness
Health checks allow your platform to route traffic only to healthy instances and to restart instances that are stuck. Use two endpoints with different semantics:
- Liveness: “process is alive” (fast, no dependencies). If this fails, the instance should be restarted.
- Readiness: “ready to serve traffic” (may include dependency checks like DB connectivity). If this fails, the instance should be removed from rotation but not necessarily restarted.
Practical endpoints
# myservice/health.py
from flask import Blueprint, jsonify

bp = Blueprint("health", __name__)

@bp.get("/healthz")
def healthz():
    return jsonify(status="ok"), 200

@bp.get("/readyz")
def readyz():
    # Keep this lightweight; avoid expensive queries.
    # Optionally: check DB connectivity with a simple SELECT 1.
    return jsonify(status="ready"), 200

Wire these endpoints into your routing and configure your load balancer/orchestrator to call them on a schedule.
Graceful shutdown and worker lifecycle
During deploys and autoscaling, instances receive termination signals. A graceful shutdown lets in-flight requests finish (within a deadline) and stops accepting new requests.
Key points
- WSGI server responsibility: Gunicorn handles signals and worker draining; configure --graceful-timeout.
- App responsibility: close resources cleanly (DB sessions, background threads) when the process exits (see the sketch after this list).
- PID 1 signal handling: in containers, ensure signals reach Gunicorn. If you wrap commands in shell scripts, use exec so the server becomes PID 1.
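For the app-side cleanup, one common pattern is an atexit handler, which Gunicorn workers typically reach after draining in-flight requests on SIGTERM; the engine argument below is a hypothetical SQLAlchemy-style pooled resource.

# myservice/lifecycle.py (sketch; `engine` is a hypothetical pooled resource)
import atexit
import logging

log = logging.getLogger(__name__)

def register_shutdown_handlers(engine) -> None:
    def _close_resources() -> None:
        log.info("Worker exiting; releasing pooled connections")
        engine.dispose()

    atexit.register(_close_resources)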
Container entrypoint tip
# If using a shell entrypoint, end with exec so signals propagate
exec gunicorn "myservice.wsgi:app" --bind 0.0.0.0:8000 ...

Config validation at startup (fail fast)
Production failures are easier to diagnose when the service refuses to start with invalid configuration. Validate required env vars, URL formats, and critical settings before accepting traffic.
Practical step-by-step: validate on app creation
# myservice/startup_checks.py
import os
from urllib.parse import urlparse

class ConfigError(RuntimeError):
    pass

def require_env(name: str) -> str:
    val = os.getenv(name)
    if not val:
        raise ConfigError(f"Missing env var: {name}")
    return val

def validate_database_url():
    db = require_env("DATABASE_URL")
    parsed = urlparse(db)
    if not parsed.scheme or not parsed.netloc:
        raise ConfigError("DATABASE_URL is not a valid URL")

def validate_production_flags():
    debug = os.getenv("FLASK_DEBUG", "0")
    if debug not in ("0", "false", "False", ""):
        raise ConfigError("FLASK_DEBUG must be disabled in production")

def run_startup_checks():
    validate_database_url()
    validate_production_flags()

# myservice/__init__.py (inside create_app)
from .startup_checks import run_startup_checks

def create_app():
    run_startup_checks()
    ...
    return app

Keep startup checks deterministic and fast. If you need dependency checks (DB reachable), prefer readiness checks so deploys don't fail due to transient dependency outages (unless your policy is to fail fast on missing dependencies).
Production logging and error reporting integration points
In production, logs should be structured, consistent, and correlated with requests. Error reporting should capture stack traces and context without leaking secrets.
Logging in production: practical considerations
- Write to stdout/stderr: let the platform ship logs.
- Include request correlation: propagate a request ID from the edge (or generate one) and include it in logs (a sketch follows this list).
- Separate access logs: Gunicorn access logs can be sufficient; ensure they include latency and status codes.
- PII/secret hygiene: never log raw tokens, passwords, or full authorization headers.
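A minimal request-correlation sketch: reuse an incoming X-Request-ID header (a common but not universal edge convention), generate one when absent, and echo it on the response so clients and logs can be matched up.

# myservice/request_id.py (sketch)
import uuid
from flask import Flask, g, request

def init_request_id(app: Flask) -> None:
    @app.before_request
    def _assign_request_id():
        g.request_id = request.headers.get("X-Request-ID") or uuid.uuid4().hex

    @app.after_request
    def _echo_request_id(resp):
        resp.headers["X-Request-ID"] = g.get("request_id", "")
        return resp

Include g.request_id in your log formatter (for example via a logging filter) so every log line for a request carries the same ID.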
Error reporting integration points
Error reporting tools (e.g., Sentry, Rollbar, Bugsnag) typically integrate at two levels:
- WSGI middleware: captures unhandled exceptions and request context.
- Flask integration: hooks into Flask’s exception handling and can attach user/request metadata.
Operational pattern:
- Enable error reporting only when a DSN/key is present.
- Set environment/release version tags to correlate errors with deployments.
- Scrub sensitive fields (headers, payload keys like password, token).
# Pseudocode wiring (library-specific)
DSN = os.getenv("SENTRY_DSN")
if DSN:
    init_error_reporting(dsn=DSN, environment=os.getenv("APP_ENV"), release=os.getenv("GIT_SHA"))
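If the tool is Sentry, for instance, the wiring typically looks like the sketch below; treat the option names as a snapshot and confirm them against the current sentry-sdk documentation.

# Sketch for sentry-sdk (verify options against current docs)
import os
import sentry_sdk
from sentry_sdk.integrations.flask import FlaskIntegration

dsn = os.getenv("SENTRY_DSN")
if dsn:
    sentry_sdk.init(
        dsn=dsn,
        environment=os.getenv("APP_ENV"),
        release=os.getenv("GIT_SHA"),
        integrations=[FlaskIntegration()],
        send_default_pii=False,  # keep user PII out of events
    )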
Running migrations safely during deployment
Schema migrations are part of deployment, but they can cause downtime if applied unsafely. The safest approach depends on your release strategy (rolling deploys, blue/green) and database constraints.
Principles for safe migrations
- Schema compatibility: deploy code that can run against both the old and new schema during a rolling update (expand/contract pattern).
- Separate migration step: run migrations as a distinct job before (or during) rollout, not inside every app instance startup.
- Idempotency: the migration command should be safe to re-run; avoid concurrent runners.
- Locking awareness: large table changes can lock; schedule or use online migration techniques where supported.
Practical step-by-step: deployment pipeline migration job
| Step | Action | Notes |
|---|---|---|
| 1 | Build image | Same artifact used for migration job and app rollout. |
| 2 | Run migrations as a one-off job | Ensure only one runner (CI job, Kubernetes Job with concurrency policy, etc.). |
| 3 | Deploy application | Rolling update/blue-green with readiness checks. |
| 4 | Post-deploy smoke tests | Hit key endpoints and verify critical flows. |
# Example command in CI/CD (tooling-specific)
flask db upgrade

If you must run migrations at startup (not recommended for multi-replica services), implement a distributed lock (DB advisory lock) and keep migrations fast; otherwise you risk multiple instances racing or blocking startup.
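A sketch of that advisory-lock approach for PostgreSQL, assuming SQLAlchemy and Flask-Migrate; the lock key is an arbitrary constant, and create_app is the factory from earlier in this section.

# migrate_with_lock.py (sketch: serialize startup migrations via a Postgres advisory lock)
import os
from sqlalchemy import create_engine, text

LOCK_KEY = 972461  # arbitrary application-wide constant

def run_migrations_once() -> None:
    engine = create_engine(os.environ["DATABASE_URL"])
    with engine.connect() as conn:
        # Blocks until this instance holds the lock.
        conn.execute(text("SELECT pg_advisory_lock(:key)"), {"key": LOCK_KEY})
        try:
            from flask_migrate import upgrade
            from myservice import create_app
            with create_app().app_context():
                upgrade()
        finally:
            conn.execute(text("SELECT pg_advisory_unlock(:key)"), {"key": LOCK_KEY})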
Production readiness checklist
Security headers basics
Set baseline security headers at the edge (reverse proxy) or in the app. For APIs, the most relevant are:
- X-Content-Type-Options: nosniff
- Referrer-Policy: no-referrer (or another policy appropriate to your needs)
- Content-Security-Policy (more relevant for HTML; for pure JSON APIs it can be minimal)
- Strict-Transport-Security (only when you serve exclusively over HTTPS)
# Example after-request hook
from flask import Flask

def add_security_headers(app: Flask):
    @app.after_request
    def _headers(resp):
        resp.headers.setdefault("X-Content-Type-Options", "nosniff")
        resp.headers.setdefault("Referrer-Policy", "no-referrer")
        return resp

Debug disabled and safe error surfaces
- Ensure FLASK_DEBUG is off in production.
- Ensure the interactive debugger is never exposed.
- Verify error responses don't leak stack traces or internal configuration (see the sketch after this list).
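A catch-all error handler helps guarantee the last point: the full traceback goes to logs and the error reporter, while clients receive a minimal JSON body. A sketch using Flask's documented pattern of letting normal HTTP errors pass through:

# myservice/errors.py (sketch)
import logging
from flask import Flask, jsonify
from werkzeug.exceptions import HTTPException

log = logging.getLogger(__name__)

def init_error_handlers(app: Flask) -> None:
    @app.errorhandler(Exception)
    def _unhandled(exc):
        if isinstance(exc, HTTPException):
            return exc  # 404, 405, ... keep their normal responses
        log.exception("Unhandled exception")
        return jsonify(error="internal_server_error"), 500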
Dependency pinning and reproducible builds
- Pin direct and transitive dependencies (lockfile) to avoid surprise upgrades.
- Rebuild images from a clean environment in CI to ensure reproducibility.
- Track CVEs and patch regularly; automate dependency scanning where possible.
Smoke tests after release
Smoke tests are quick, automated checks that confirm the service is alive and key workflows work after deployment.
Practical step-by-step: minimal smoke test script
# smoke_test.sh
set -euo pipefail
BASE_URL="${BASE_URL:-https://api.example.com}"
# Liveness
curl -fsS "$BASE_URL/healthz" > /dev/null
# Readiness
curl -fsS "$BASE_URL/readyz" > /dev/null
# Basic API check (example)
curl -fsS "$BASE_URL/v1/ping" | grep -q "pong"Run smoke tests from the same network context as your users (or at least from your production environment) to catch DNS, TLS, routing, and auth misconfigurations.
Final pre-flight checklist (copy/paste)
- WSGI server configured (workers/threads/timeouts) and logs to stdout/stderr.
- All required env vars present; secrets injected via secret manager or mounted files.
- Startup config validation enabled; service fails fast on missing/invalid config.
- Health checks implemented: /healthz (liveness) and /readyz (readiness).
- Graceful shutdown verified (SIGTERM drains requests within deadline).
- Error reporting wired (optional) with environment/release tags and scrubbing.
- Migrations run as a single deployment step (not per replica) and follow expand/contract when needed.
- Security headers set (at least nosniff and a referrer policy; HSTS only with HTTPS).
- Debug disabled; no stack traces exposed to clients.
- Dependencies pinned; builds reproducible; vulnerability scanning in place.
- Post-deploy smoke tests executed and monitored.