Reference Architecture: Putting an API Gateway in Front of Microservices

Chapter 9

Diagram Narrative: Clients → Gateway → Microservices

Use the API gateway as the single, stable entry point for clients, while microservices remain independently deployable behind it. Think of the request flow as a layered pipeline:

Clients (web app, mobile app, partner systems) call a small set of public endpoints.
API Gateway receives all external traffic, applies cross-cutting policies, and routes requests to the right upstream.
Microservices implement business capabilities (orders, catalog, payments) and expose internal APIs optimized for service-to-service communication.

A concrete narrative example: a mobile client calls GET /api/orders/123. The gateway authenticates the caller, applies per-route limits, selects an upstream instance of the Orders service, forwards the request, and returns a normalized response. The Orders service may call other internal services (e.g., Catalog) without exposing those internal calls to the client.

Reference Architecture Blueprint (Step-by-Step)

Step 1: Define the boundary between external and internal APIs

Start by deciding which APIs are external (consumed by clients outside your cluster) and which are internal (consumed only by services). A common baseline:

External APIs: go through the gateway; stable paths; client-friendly payloads.
Internal APIs: stay inside the network; optimized for service-to-service needs; may use different protocols (HTTP/gRPC) and evolve faster.

Practical rule: if an endpoint is used by a browser/mobile/partner, treat it as external and put it behind the gateway. If it is only used by other services, keep it internal and avoid routing it through the gateway unless you have a specific reason (e.g., centralized mTLS termination or consistent auditing).

Step 2: Assign responsibilities: gateway vs services

Keep the gateway focused on cross-cutting concerns and keep microservices focused on business logic. This separation prevents the gateway from turning into a “mega-service” and keeps services testable and portable.

Gateway responsibilities (cross-cutting): request admission control, edge security enforcement, routing, protocol bridging, response shaping for clients, and resilience controls at the edge.
Service responsibilities (business): domain rules, data ownership, workflows, invariants, and service-level authorization decisions that depend on business state (e.g., “can this user cancel this order given its status?”).

Practical check: if a rule requires reading domain data (order status, account tier stored in a database), it belongs in the service. If a rule can be enforced without domain state (basic identity checks, per-route limits, header normalization), it belongs at the gateway.

Step 3: Establish upstream selection and service discovery

Microservices scale horizontally, so the gateway must know which instance to send a request to. Conceptually, this is upstream selection, and it is typically powered by service discovery.

Service discovery: a registry of healthy service instances (e.g., Orders has 10 instances). Instances register themselves and report health.
Upstream selection: the gateway chooses one instance per request using a load-balancing strategy (round-robin, least connections, latency-aware) and health information.

Step-by-step approach:

3.1 Decide how services are addressed: by stable service name (e.g., orders) rather than fixed IPs.
3.2 Ensure each service exposes a health endpoint or health signal used by the platform.
3.3 Configure the gateway to resolve service names via your environment (Kubernetes service DNS, a registry, or a mesh control plane).
3.4 Define load-balancing behavior and failure handling (e.g., remove unhealthy instances quickly).

Design note: keep discovery and selection “boring.” Prefer platform-native discovery (like Kubernetes Services) unless you have a strong reason to introduce a separate registry.
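
As a rough illustration of steps 3.1 to 3.4, the sketch below models a registry keyed by stable service name and a round-robin selector that skips unhealthy instances. The `Registry`, `Instance`, and `RoundRobinSelector` names are hypothetical and exist only for this example; in practice the platform (for example Kubernetes Services) usually does this work for you.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    address: str
    healthy: bool = True  # would be updated by health checks in a real platform


@dataclass
class Registry:
    """Hypothetical in-memory registry keyed by stable service name."""
    services: dict = field(default_factory=dict)

    def register(self, name, address):
        instance = Instance(address)
        self.services.setdefault(name, []).append(instance)
        return instance

    def healthy(self, name):
        return [i for i in self.services.get(name, []) if i.healthy]


class RoundRobinSelector:
    """Picks one healthy instance per request, rotating through candidates."""

    def __init__(self, registry):
        self.registry = registry
        self._counters = {}

    def pick(self, name):
        candidates = self.registry.healthy(name)
        if not candidates:
            raise LookupError(f"no healthy instance for {name!r}")
        n = self._counters.get(name, 0)
        self._counters[name] = n + 1
        return candidates[n % len(candidates)]
```

Marking an instance unhealthy removes it from rotation on the next request, which is the "remove unhealthy instances quickly" behavior from step 3.4 in its simplest form.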

Step 4: Decide when to introduce a BFF (Backend for Frontend)

A BFF is a specialized backend layer tailored to a specific client type (e.g., mobile vs web). It can live behind the gateway and is useful when different clients need different aggregation, caching, or response shapes.

Introduce a BFF when at least one of these is true:

Client-specific orchestration: the client would otherwise call many endpoints and stitch results together.
Different product needs: web and mobile require different fields, pagination, or performance tradeoffs.
Security partitioning: partner APIs must be isolated from consumer APIs with different contracts and controls.

Keep the gateway generic and thin; put client-specific composition in BFFs. A common layout:

Gateway routes /api/mobile/* to mobile-bff and /api/web/* to web-bff.
BFF calls multiple internal services (Orders, Catalog, Profile) and returns a client-optimized response.
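
A minimal sketch of the composition a mobile BFF might perform, assuming the Orders and Catalog services are reachable through two injected callables. The function name, the `fetch_order` and `fetch_product` parameters, and the payload fields are all hypothetical.

```python
def mobile_order_summary(order_id, fetch_order, fetch_product):
    """Compose one mobile-friendly payload from two internal services."""
    order = fetch_order(order_id)             # call to the Orders service
    items = []
    for line in order["lines"]:
        product = fetch_product(line["sku"])  # call to the Catalog service
        items.append({"name": product["name"], "qty": line["qty"]})
    # Return only the fields this client needs, not the internal shapes.
    return {"id": order["id"], "status": order["status"], "items": items}
```

The client makes one call and receives one compact response; the stitching that would otherwise happen on the device lives in the BFF.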

Step 5: Apply resilience patterns at the gateway edge

The gateway is a critical choke point. Resilience controls here protect both clients (faster failures, predictable behavior) and services (reduced overload). The goal is not to “hide” failures, but to fail safely and avoid cascading outages.

Timeouts (always set them)

Timeouts prevent requests from hanging and consuming resources indefinitely.

Gateway-to-upstream timeout: how long the gateway waits for a service response.
Client-to-gateway timeout: usually set by the client, but the gateway should still fail fast when an upstream is unhealthy rather than hold the connection open.

Practical guidance:

Set per-route timeouts based on expected latency (e.g., GET /catalog might be 300–800ms; POST /payments might be 2–5s depending on providers).
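Keep the timeouts nested: calls made deeper inside the system should have shorter timeouts than the gateway timeout that encloses them, so inner calls fail first and no work keeps running after the gateway has already given up on the request.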
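
One way to express per-route timeouts is a small lookup table consulted on every upstream call. The sketch below uses `asyncio.wait_for` from the standard library; the route IDs, the timeout values, and the `call_with_timeout` helper are illustrative assumptions, not settings from any particular gateway.

```python
import asyncio

# Illustrative per-route budgets in seconds (assumed values).
ROUTE_TIMEOUTS_S = {"catalog-public-read": 0.8, "payments-charge": 5.0}
DEFAULT_TIMEOUT_S = 1.0


async def call_with_timeout(route_id, upstream_call,
                            timeouts=ROUTE_TIMEOUTS_S,
                            default=DEFAULT_TIMEOUT_S):
    """Await the upstream call, mapping a timeout to a 504-style result."""
    timeout = timeouts.get(route_id, default)
    try:
        body = await asyncio.wait_for(upstream_call(), timeout)
        return 200, body
    except asyncio.TimeoutError:
        # The pending upstream call is cancelled by wait_for.
        return 504, "upstream timed out"
```

Routes without an entry fall back to a default, so no route is ever left without a timeout.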

Retries (use with caution)

Retries can improve success rates for transient failures, but they can also multiply load and make outages worse.

Retry only safe operations: typically idempotent reads (GET) and carefully designed idempotent writes.
Bound retries: small retry count (often 1) with jittered backoff.
Avoid retry storms: do not retry when the upstream is clearly overloaded or failing consistently.

Practical approach:

Enable retries for a small set of read routes where transient network errors are common.
For writes, prefer idempotency keys and service-side deduplication rather than gateway retries.
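
The retry rules above can be sketched as a bounded loop with jittered backoff that only retries idempotent methods. `TransientError` and `call_with_retry` are hypothetical names for this example, and the set of retryable methods is an assumption you would adapt to your own routes.

```python
import random
import time

IDEMPOTENT_METHODS = {"GET", "HEAD"}  # assumed safe to retry


class TransientError(Exception):
    """Hypothetical marker for a transient network failure."""


def call_with_retry(method, send, max_retries=1, base_delay_s=0.05,
                    sleep=time.sleep):
    """Call send(); retry transient failures only for idempotent methods."""
    retries = max_retries if method in IDEMPOTENT_METHODS else 0
    attempts_allowed = 1 + retries
    for attempt in range(attempts_allowed):
        try:
            return send()
        except TransientError:
            if attempt == attempts_allowed - 1:
                raise  # retry budget exhausted, or method not retryable
            # Full jitter: random delay up to an exponentially growing cap.
            sleep(random.uniform(0, base_delay_s * 2 ** attempt))
```

A POST passes through exactly once here; making writes safe to repeat is left to idempotency keys on the service side.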

Circuit breaking (conceptual placement)

A circuit breaker stops sending traffic to an upstream that is failing beyond a threshold, allowing it to recover and preventing resource exhaustion.

Closed: normal operation; failures are tracked.
Open: requests are rejected quickly (fast-fail) for a cool-down period.
Half-open: limited test requests are allowed to see if recovery occurred.

At the gateway, circuit breaking is especially useful for:

Protecting upstream services from traffic spikes when they are already unhealthy.
Providing consistent error responses quickly instead of long timeouts.
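
The three states can be sketched as a small state machine. This is a simplified, single-threaded illustration: the class name, the consecutive-failure threshold, and the cool-down value are assumptions, and it does not cap the number of concurrent probe requests in the half-open state the way a production breaker would.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_s=10.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0       # consecutive failures seen so far
        self.opened_at = None   # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown_s:
            return "half-open"
        return "open"

    def call(self, fn):
        state = self.state
        if state == "open":
            raise CircuitOpenError("fast-fail: upstream circuit is open")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open, restart cool-down
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Injecting the clock keeps the cool-down logic testable without real waiting.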

Graceful degradation (serve something useful)

Graceful degradation means returning a partial or simplified response when a dependency is down, rather than failing the entire request.

Common patterns in a gateway-fronted microservices setup:

Static or cached fallback: return cached catalog data if the Catalog service is temporarily unavailable.
Feature shedding: omit optional sections (e.g., recommendations) while still returning the core resource.
Default responses: return an empty list for non-critical widgets rather than a 500.

Where to implement it:

Gateway-level: good for simple, generic fallbacks (static responses, cached responses, or consistent error mapping).
BFF-level: best for partial responses and composition logic (because it understands what is optional for that client).
Service-level: best when fallback depends on domain rules or alternative data sources.
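
A sketch of two of these patterns together, cached fallback for the core resource and feature shedding for an optional section, as a BFF might implement them. The function, its parameters, the `stale` flag, and the use of `ConnectionError` to signal an unavailable dependency are assumptions made for the example.

```python
def product_page(product_id, fetch_product, fetch_recommendations, cache):
    """Build a page payload that degrades instead of failing outright."""
    try:
        product = fetch_product(product_id)
        cache[product_id] = product      # refresh the fallback copy
        stale = False
    except ConnectionError:
        if product_id not in cache:
            raise                        # no fallback available: fail
        product = cache[product_id]      # cached fallback
        stale = True

    try:
        recommendations = fetch_recommendations(product_id)
    except ConnectionError:
        recommendations = []             # optional section: shed it

    return {"product": product,
            "recommendations": recommendations,
            "stale": stale}
```

The core resource still fails when there is nothing useful to serve, which keeps degradation from hiding a real outage.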

Handling Internal vs External APIs in the Route Design

In the reference architecture, treat the gateway as the boundary for external APIs, and keep internal APIs reachable only within the private network.

External routes: stable, documented, and protected. They map to one service or to a BFF.
Internal routes: not exposed publicly; services call each other directly using internal DNS/service names.

Practical safeguard: ensure internal services are not accidentally exposed by requiring an explicit “public exposure” flag in gateway configuration, and default everything to private.
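
The default-private rule is simple to enforce when routes are loaded. In this sketch the `public` key and the `publishable_routes` helper are hypothetical; the point is that only an explicit `True` exposes a route.

```python
def publishable_routes(route_configs):
    """Return only routes explicitly flagged public; all others stay private."""
    return [r for r in route_configs if r.get("public") is True]
```

A route with a missing flag, or with a value such as the string "true", is treated as private rather than guessed at.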

Sample Route Table (Gateway Configuration View)

The following route table illustrates a typical setup where some routes map directly to services and others go to BFFs. It also shows where different timeouts and resilience settings might vary by route.

| Route ID            | Method | Path                 | Upstream     | Timeout | Retry              | Circuit | Exposure |
|---------------------|--------|----------------------|--------------|---------|--------------------|---------|----------|
| catalog-public-read | GET    | /api/catalog/*       | catalog-svc  | 800ms   | 1 (network errors) | enabled | external |
| orders-read         | GET    | /api/orders/*        | orders-svc   | 1200ms  | 0                  | enabled | external |
| orders-create       | POST   | /api/orders          | orders-svc   | 2000ms  | 0                  | enabled | external |
| payments-charge     | POST   | /api/payments/charge | payments-svc | 5000ms  | 0                  | enabled | external |
| mobile-bff          | ANY    | /api/mobile/*        | mobile-bff   | 2000ms  | 0                  | enabled | external |
| web-bff             | ANY    | /api/web/*           | web-bff      | 1500ms  | 0                  | enabled | external |
| partner-orders      | GET    | /partner/orders/*    | partner-bff  | 2000ms  | 0                  | enabled | external |

Notes on using this table:

Different routes have different latency expectations; set timeouts accordingly.
Retries are limited and primarily used for safe reads.
BFF routes are treated as first-class upstreams; they own composition and client-specific degradation.
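
To make the table concrete, here is a sketch of how a gateway might hold a few of these routes as data and match an incoming request against them. The `Route` type and the matching rules (a trailing `/*` means prefix match, `ANY` matches every method, first match wins) are assumptions for illustration; real gateways define their own matching semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    route_id: str
    method: str       # HTTP method, or "ANY"
    path: str         # exact path, or a prefix pattern ending in "/*"
    upstream: str
    timeout_ms: int
    retries: int


ROUTES = [
    Route("catalog-public-read", "GET", "/api/catalog/*", "catalog-svc", 800, 1),
    Route("orders-read", "GET", "/api/orders/*", "orders-svc", 1200, 0),
    Route("orders-create", "POST", "/api/orders", "orders-svc", 2000, 0),
    Route("mobile-bff", "ANY", "/api/mobile/*", "mobile-bff", 2000, 0),
]


def path_matches(pattern, path):
    if pattern.endswith("/*"):
        # Drop only the "*" so the prefix keeps its trailing slash.
        return path.startswith(pattern[:-1])
    return path == pattern


def match_route(method, path, routes=ROUTES):
    """Return the first route matching method and path, or None."""
    for route in routes:
        if route.method in ("ANY", method) and path_matches(route.path, path):
            return route
    return None
```

A request that matches no route returns `None`, which a gateway would typically turn into a 404 rather than forwarding anywhere.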

Sample Policy Matrix (Auth, Limits, Transforms per Route)

This matrix is a planning tool: for each route, decide which policies apply. Keep it explicit so you can review changes during incidents and audits. The details of how each policy is implemented are gateway-specific; the point here is the per-route intent.

| Route ID             | AuthN Mechanism | AuthZ Model            | Limits (rate/quota)        | Transforms (req/resp)                 | Notes |
|---------------------|-----------------|------------------------|----------------------------|---------------------------------------|------|
| catalog-public-read | JWT (user)      | scopes: catalog:read   | per-user + per-IP          | normalize headers; cache hints        | allow fallback to cached response |
| orders-read         | JWT (user)      | scopes: orders:read    | per-user                   | add correlation headers; field filter | no retry; strict timeout |
| orders-create       | JWT (user)      | scopes: orders:write   | per-user + burst control   | validate content-type; map errors     | idempotency key recommended |
| payments-charge     | JWT (user)      | scopes: payments:write | per-user + strict bursts   | redact sensitive fields in response   | longest timeout; no retry |
| mobile-bff          | JWT (user)      | scopes per endpoint    | per-user                   | client-specific shaping in BFF        | BFF handles partial responses |
| web-bff             | JWT (user)      | scopes per endpoint    | per-user                   | client-specific shaping in BFF        | separate caching strategy |
| partner-orders      | OAuth2 (client) | partner roles/scopes   | per-client (contract-based)| schema translation; strict versioning | isolate partner traffic |

How to use the matrix step-by-step:

1 List every external route and its upstream (service or BFF).
2 For each route, decide the identity type (end-user vs machine client) and map it to an authentication mechanism.
3 Define authorization intent (scopes/roles) at the route level, and keep business-state checks inside services.
4 Assign limits based on risk and cost (payments and partner routes are typically stricter).
5 Document required transforms (compatibility, redaction, schema mapping) so clients remain stable while services evolve.
6 Add resilience settings (timeouts, retry rules, circuit behavior) per route to avoid one-size-fits-all defaults.
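
Step 3 can be sketched as a route-level scope check that the gateway runs before forwarding. The `ROUTE_SCOPES` mapping mirrors a few rows of the matrix above; the `authorize` helper and the deny-by-default behavior for unknown routes are assumptions for illustration.

```python
# Required scopes per route, taken from the AuthZ column of the matrix.
ROUTE_SCOPES = {
    "catalog-public-read": {"catalog:read"},
    "orders-read": {"orders:read"},
    "orders-create": {"orders:write"},
    "payments-charge": {"payments:write"},
}


def authorize(route_id, token_scopes, route_scopes=ROUTE_SCOPES):
    """Allow the request only if the token carries every required scope."""
    required = route_scopes.get(route_id)
    if required is None:
        return False  # unknown route: deny by default
    return required <= set(token_scopes)
```

This check needs only the route and the token, so it fits at the gateway; whether a particular order may be cancelled still depends on business state and stays in the service.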
