
FastAPI for Beginners: Build a Production-Ready REST API


Deployment Considerations for a Production-Ready FastAPI REST API

Chapter 14

Estimated reading time: 8 minutes


What “production-ready” means for deployment

A production deployment is less about adding new features and more about running the same API reliably under real traffic. That typically means: a stable ASGI server configuration, predictable logging, controlled cross-origin access, health/readiness endpoints for orchestration, safe handling of environment variables and secrets, resilient database connectivity, and operational safeguards (timeouts, monitoring, and controlled startup tasks).

Choose an ASGI server setup (Uvicorn vs Gunicorn workers)

Common deployment patterns

  • Uvicorn only: simplest, often fine for small services or when a platform manages process scaling for you.
  • Gunicorn + Uvicorn workers: common for Linux servers and containers; Gunicorn manages multiple worker processes, Uvicorn runs the ASGI app inside each worker.

Recommended: Gunicorn with Uvicorn workers

This setup gives you multi-process concurrency (useful for CPU-bound work and better isolation) while still supporting async endpoints.

# Install server dependencies (example in requirements.txt or poetry deps): gunicorn uvicorn[standard]

Example command:

gunicorn "app.main:app" -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000 --workers 4 --timeout 60 --graceful-timeout 30 --keep-alive 5

How to pick worker counts

A practical starting point is workers = 2 * CPU + 1, then load test and adjust. If your endpoints are mostly I/O-bound (database/network), fewer workers may still perform well. If you run heavy CPU tasks inside requests, consider moving them out (e.g., to a job queue) rather than scaling workers indefinitely.

Timeouts and keep-alive

  • --timeout: seconds a worker may remain unresponsive to the master process before it is killed and restarted. Set it high enough for legitimate slow requests, but low enough to recycle stuck workers.
  • --graceful-timeout: time to allow workers to finish ongoing requests during shutdown.
  • --keep-alive: keep-alive seconds for HTTP connections; tune based on your load balancer and client behavior.
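These flags can also live in a gunicorn.conf.py file instead of the command line, which keeps the deploy command short. A sketch mirroring the example command above (adjust the values for your workload):

```python
# gunicorn.conf.py -- picked up automatically when present in the working directory
import multiprocessing

bind = "0.0.0.0:8000"
worker_class = "uvicorn.workers.UvicornWorker"
workers = multiprocessing.cpu_count() * 2 + 1
timeout = 60
graceful_timeout = 30
keepalive = 5
```

With this file in place, the deploy command reduces to `gunicorn "app.main:app"`.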

Configure logging for production

Goals for production logging

  • Structured logs (JSON) or at least consistent formatting.
  • Correlation via request IDs (so you can trace a request across services).
  • Correct levels (INFO in normal operation, WARNING/ERROR for problems).
  • No secrets (never log tokens, passwords, raw authorization headers).

Basic logging configuration (Python logging)

In production, prefer logging to stdout/stderr so containers and platforms can collect logs. You can configure Uvicorn/Gunicorn logging via config files, but a simple app-level setup is often enough.


import logging
import sys

LOG_LEVEL = "INFO"

logging.basicConfig(
    level=LOG_LEVEL,
    stream=sys.stdout,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)

logger = logging.getLogger("app")
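If you want structured JSON logs without adding a dependency, a minimal formatter sketch looks like this (the field names are illustrative, not a standard):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
```

Attach the handler to your root or app logger instead of (or in addition to) the basicConfig format string above.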

Add a request ID (middleware)

Request IDs help connect application logs to reverse proxy logs and error monitoring events.

import uuid
from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware

class RequestIdMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        # Reuse an incoming ID (e.g., set by a proxy) or generate a new one
        request_id = request.headers.get("X-Request-ID") or str(uuid.uuid4())
        request.state.request_id = request_id  # available to handlers and log filters
        response = await call_next(request)
        response.headers["X-Request-ID"] = request_id
        return response

When logging, include the request ID (for example, by attaching it to request.state and using a logging filter). Keep it simple at first; the key is consistency.
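One simple way to get the request ID into every log line is a contextvar set by the middleware plus a logging filter. A sketch (the variable and filter names are illustrative; the middleware would set the contextvar before calling call_next):

```python
import logging
from contextvars import ContextVar

# Set by the request-ID middleware at the start of each request
request_id_var: ContextVar[str] = ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True

# Then include %(request_id)s in the log format and attach the filter:
# logging.getLogger("app").addFilter(RequestIdFilter())
```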

Handle CORS safely

Why CORS matters in production

CORS controls which browser-based frontends can call your API. In production, overly permissive CORS (like allowing all origins with credentials) can create security issues.

Practical configuration

Use CORSMiddleware and restrict origins to known frontend domains.

from fastapi.middleware.cors import CORSMiddleware

allowed_origins = [
    "https://app.example.com",
    "https://admin.example.com",
]

app.add_middleware(
    CORSMiddleware,
    allow_origins=allowed_origins,
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "PATCH", "DELETE"],
    allow_headers=["Authorization", "Content-Type", "X-Request-ID"],
)
  • If you do not need cookies/credentialed requests, set allow_credentials=False.
  • Avoid allow_origins=["*"] in production unless the API is truly public and you understand the implications.
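Since origins differ between staging and production, a common pattern is to read them from an environment variable rather than hard-coding the list. A sketch (the CORS_ALLOW_ORIGINS variable name is an assumption, not a FastAPI convention):

```python
import os

def parse_origins(raw: str) -> list[str]:
    """Split a comma-separated origins string, dropping blanks and whitespace."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]

# e.g. CORS_ALLOW_ORIGINS="https://app.example.com,https://admin.example.com"
allowed_origins = parse_origins(os.environ.get("CORS_ALLOW_ORIGINS", ""))
```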

Define health and readiness endpoints

Health vs readiness

  • Liveness/health: “Is the process running?” Used to restart crashed or deadlocked instances.
  • Readiness: “Can this instance serve traffic?” Often checks dependencies like database connectivity.

Implement endpoints

Keep these endpoints fast and stable. Avoid expensive checks on every call.

from fastapi import APIRouter

router = APIRouter()

@router.get("/health")
async def health():
    return {"status": "ok"}

@router.get("/ready")
async def ready():
    # Example: lightweight dependency check
    # e.g., run a simple SELECT 1 with a short timeout
    return {"status": "ready"}

Readiness checks with the database (pattern)

In a deployed environment, database connectivity can fail temporarily (startup ordering, network policies, credential rotation). A readiness check should:

  • Use a short timeout.
  • Return non-200 if the DB is unavailable.
  • Not run migrations or schema changes.
A hedged sketch, assuming an async SQLAlchemy engine named engine created at startup:

from fastapi import HTTPException
from sqlalchemy import text

@router.get("/ready")
async def ready():
    try:
        # Keep the check cheap and give it a short timeout at the driver level
        async with engine.connect() as conn:
            await conn.execute(text("SELECT 1"))
        return {"status": "ready"}
    except Exception:
        raise HTTPException(status_code=503, detail="db_unavailable")

Containerization basics (Docker)

Why containers help

Containers package your app and its runtime dependencies into a consistent artifact. This reduces “works on my machine” issues and makes scaling and rollbacks easier.

Minimal Dockerfile (production-oriented)

This example uses a slim Python base image, installs dependencies, and runs Gunicorn with Uvicorn workers.

FROM python:3.12-slim

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /app

# System deps (adjust for your DB driver needs)
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt

COPY . /app/

EXPOSE 8000

CMD ["gunicorn", "app.main:app", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000", "--workers", "4", "--timeout", "60"]

.dockerignore essentials

__pycache__/
.pytest_cache/
.venv/
.env
.git/
*.log

Environment variables and secrets in deployed environments

What belongs in environment variables

  • Database URL / credentials
  • JWT secret keys / signing keys (or references to a secret manager)
  • CORS allowed origins
  • Log level
  • External service URLs (email provider, payment gateway, etc.)

Practical guidelines

  • Never bake secrets into images. Inject them at runtime (Kubernetes secrets, platform config vars, secret managers).
  • Fail fast if required env vars are missing. A container that starts “half-configured” is harder to debug.
  • Separate config from code: the same image should run in staging and production with different env vars.
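Failing fast on missing configuration can be as simple as checking the required names at startup. A sketch (the variable names are examples):

```python
import os

REQUIRED_VARS = ["DATABASE_URL", "JWT_SECRET_KEY"]  # example names

def validate_environment(environ=os.environ) -> None:
    """Raise immediately if any required variable is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
```

Call this once during application startup so a misconfigured container exits with a clear error instead of failing later on its first request.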

Database connectivity in production

Connection pooling and limits

In production, the database is often the bottleneck. Ensure you use a connection pool and set sensible limits so multiple app instances do not overwhelm the DB.

  • Set pool size and overflow based on DB capacity and number of app replicas.
  • Use server-side timeouts (statement timeout) where possible.
  • Prefer short-lived transactions; avoid holding connections while doing slow work.

Deployed network realities

  • DNS and network policies can cause intermittent failures; implement retries where appropriate (but avoid retrying non-idempotent writes blindly).
  • Use TLS to the database when crossing untrusted networks.
  • Be careful with “startup ordering”: your API may start before the DB is ready. Use readiness probes and retry logic in connection initialization.
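The retry logic for connection initialization mentioned above can be sketched as a small exponential-backoff loop (the connect callable and delay values are placeholders to adapt):

```python
import time

def connect_with_retry(connect, attempts: int = 5, base_delay: float = 0.5, sleep=time.sleep):
    """Call `connect` until it succeeds, backing off exponentially between attempts."""
    for attempt in range(attempts):
        try:
            return connect()
        except Exception:
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            sleep(base_delay * (2 ** attempt))
```

Injecting `sleep` as a parameter keeps the helper easy to unit test without real waiting.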

Operational concerns: timeouts, proxies, and request limits

Reverse proxy headers

In many deployments, a reverse proxy (NGINX, Traefik, a cloud load balancer) sits in front of your app. Ensure your app/server respects forwarded headers so client IPs and schemes are correct.

  • Configure Uvicorn/Gunicorn to trust proxy headers only from known proxies.
  • Set maximum request body sizes at the proxy to protect against large payload attacks.

Application-level timeouts

Even with server timeouts, you should also consider timeouts for outbound calls (HTTP to other services, database operations). A single slow dependency can exhaust worker capacity.

  • Set explicit timeouts on HTTP clients.
  • Use database statement timeouts where supported.
  • Consider circuit breakers for flaky dependencies.
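For async outbound calls, asyncio.wait_for gives you an explicit per-call deadline even when a client library's own timeout is not configured. A sketch:

```python
import asyncio

async def call_with_timeout(coro, seconds: float):
    """Await `coro`, raising asyncio.TimeoutError if it exceeds the deadline."""
    return await asyncio.wait_for(coro, timeout=seconds)

async def demo():
    async def slow_dependency():
        await asyncio.sleep(10)  # stand-in for a slow external service

    try:
        await call_with_timeout(slow_dependency(), seconds=0.01)
    except asyncio.TimeoutError:
        return "timed out"

print(asyncio.run(demo()))  # timed out
```

wait_for also cancels the wrapped coroutine on timeout, so the worker is freed instead of staying blocked on the slow dependency.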

Error monitoring hooks (observability)

What to capture

  • Unhandled exceptions (stack traces)
  • Request context (path, method, request ID, user ID if available)
  • Performance signals (latency, slow endpoints)

Integrate an error monitoring SDK (pattern)

Most monitoring tools provide an ASGI middleware or FastAPI integration. The typical pattern is:

  • Initialize the SDK at startup using environment variables (DSN, environment name).
  • Attach middleware to capture exceptions and performance traces.
  • Scrub sensitive data (Authorization headers, tokens, PII).
# Pseudocode
# monitoring.init(dsn=..., environment=..., traces_sample_rate=...)
# app.add_middleware(monitoring.ASGIMiddleware)

Safe startup tasks (migrations and initialization)

Why “run migrations on startup” can be risky

Automatically applying migrations when the API starts can cause problems:

  • Multiple replicas may attempt migrations concurrently.
  • A failed migration can crash-loop all instances.
  • Schema changes might require coordination (backward-compatible deploys).

Controlled migration strategies

  • Run migrations as a separate job in your deployment pipeline (preferred).
  • Kubernetes Job or one-off task before rolling out new replicas.
  • Leader election/lock if you must run migrations from the app (advanced; still risky).

Startup checks that are safe

Safe startup tasks are those that do not mutate shared state:

  • Validate required environment variables are present.
  • Warm up caches (optional) without blocking readiness too long.
  • Preload configuration and compile regexes/templates.

Deployment checklist (tie-in to configuration, database, auth, and testing)

  • ASGI server (Gunicorn/Uvicorn configuration): workers sized for CPU, timeouts set, graceful shutdown enabled, binds to correct host/port
  • Logging (structured/consistent logs): logs to stdout, correct log level, request IDs present, no secrets logged
  • CORS (restricted origins): only trusted domains allowed, credentials only if needed, headers/methods minimal
  • Health/Readiness (endpoints implemented): /health returns quickly, /ready checks critical deps with short timeouts, used by orchestration probes
  • Configuration (environment variables injected): prod/staging values correct, required vars validated at startup, secrets not in image or repo
  • Database (connectivity and pooling): DB URL correct, TLS if needed, pool sizes tuned, statement timeouts set, readiness reflects DB availability
  • Migrations (controlled execution): migrations run as a separate step/job, not concurrently by multiple replicas, rollback plan exists
  • Authentication (key material and token settings): JWT secrets/keys loaded securely, token lifetimes appropriate, clock skew considered, auth failures logged safely
  • Operational limits (timeouts and request limits): server timeout aligned with proxy timeout, outbound HTTP timeouts set, max body size enforced at proxy
  • Monitoring (error reporting integrated): unhandled exceptions captured, sensitive data scrubbed, environment tags set, alerting configured
  • Testing (pre-deploy verification): CI tests pass, smoke tests hit /health and key endpoints, migrations tested in staging, load test baseline recorded

Now answer the exercise about the content:

Which approach best matches a production-ready strategy for database readiness checks in a FastAPI deployment?


A readiness endpoint should quickly verify critical dependencies (like the database) using a short timeout and signal unavailability with a non-200 response. It should not run migrations or expensive checks.
