Gateway placement in serverless and event-driven systems
In a serverless backend, your “compute” is typically a set of functions (for example, cloud functions) that scale by creating concurrent executions. An API gateway sits at the edge to terminate HTTP(S), apply request-level controls, and invoke the right function (or workflow) based on the route. The gateway is also the boundary where you standardize cross-cutting behavior (auth checks, request normalization, throttling, caching) without duplicating it in every function.
Event-driven systems add another dimension: not every operation is a synchronous HTTP request. Some endpoints may enqueue work (publish an event) and return immediately, while downstream processing happens asynchronously. The gateway still provides the HTTP façade, but the backend target might be a function, a queue/topic, or a workflow/orchestrator that triggers functions.
Common placements
HTTP → Gateway → Function: best for request/response APIs (CRUD, queries, small commands) where the caller expects an immediate result.
HTTP → Gateway → Workflow/Orchestrator → Functions: best when a single request triggers multiple steps (validation, payment, fulfillment) and you want retries and state management outside a single function execution.
HTTP → Gateway → Queue/Topic → Functions: best for “fire-and-forget” commands, burst absorption, and decoupling. The HTTP response typically returns an acknowledgment and a tracking ID.
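The three placements can be captured as a small routing table. Here is a sketch in TypeScript, where the route keys and target names are purely illustrative:

```typescript
// Sketch: the three gateway placements expressed as a route table.
// The kinds map to the placements above; target names are hypothetical.
type Integration =
  | { kind: "function"; target: string }   // HTTP → Gateway → Function
  | { kind: "workflow"; target: string }   // HTTP → Gateway → Workflow/Orchestrator → Functions
  | { kind: "queue"; target: string };     // HTTP → Gateway → Queue/Topic → Functions

const routes: Record<string, Integration> = {
  "GET /orders/{orderId}": { kind: "function", target: "get-order" },        // request/response
  "POST /orders":          { kind: "workflow", target: "order-fulfillment" }, // multi-step
  "POST /emails":          { kind: "queue",    target: "outbound-email" },    // fire-and-forget
};
```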
Synchronous HTTP-to-function invocation
In the synchronous pattern, the gateway receives an HTTP request and invokes a function, then maps the function output back to an HTTP response. The key architectural decision is the contract between gateway and function: what the function receives (raw HTTP event vs normalized payload) and what it must return (status code, headers, body, and optional binary encoding flags).
Step-by-step: designing a synchronous route
1) Define the public route: choose method + path (for example, GET /orders/{orderId}).
2) Choose the integration style: either pass the full HTTP request context to the function (headers, query, path params) or have the gateway transform it into a simpler JSON payload.
3) Decide response mapping: standardize error shapes (for example, {"error":"...","requestId":"..."}) and ensure the gateway maps function errors to correct HTTP status codes.
4) Set timeouts: align the gateway timeout with the function timeout and expected latency. If a function can exceed the gateway timeout, switch to an async pattern (queue/workflow) for that operation.
5) Validate payload size and content type: enforce maximum sizes and accepted media types at the gateway to protect function concurrency and cost.
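Putting these steps together, here is a minimal sketch of a function behind GET /orders/{orderId} that honors the error envelope from step 3. The NormalizedRequest shape and the findOrder helper are hypothetical:

```typescript
// Sketch: synchronous handler returning the structured shape the gateway
// maps to HTTP. Only the fields this route needs are modeled.
interface NormalizedRequest {
  request: { id: string; params: Record<string, string> };
}

export async function handler(event: NormalizedRequest) {
  const order = await findOrder(event.request.params.orderId); // hypothetical lookup
  if (!order) {
    // Standard error envelope (step 3); the gateway maps this to HTTP 404.
    return { statusCode: 404, body: { error: "order not found", requestId: event.request.id } };
  }
  return { statusCode: 200, body: order };
}

// Hypothetical data access; replace with your actual store.
async function findOrder(id: string): Promise<{ orderId: string } | null> {
  return id === "123" ? { orderId: id } : null;
}
```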
Example: normalized event payload to the function
Instead of passing the entire HTTP request, the gateway can send a compact payload that is stable across clients:
{ "request": { "id": "${requestId}", "method": "GET", "path": "/orders/123", "params": { "orderId": "123" }, "query": { "expand": "items" }, "headers": { "accept": "application/json" } }, "auth": { "principalId": "user-42", "scopes": ["orders:read"] }}This reduces coupling to gateway-specific event formats and makes unit testing functions easier.
Mapping templates and transformations (serverless-specific considerations)
Serverless functions are often small and single-purpose. Mapping at the gateway helps keep functions focused by handling repetitive “edge” tasks: normalizing headers, converting query strings to typed values, and shaping responses consistently.
When to transform at the gateway vs in the function
Transform at the gateway when it is purely protocol/contract work (content-type normalization, consistent error envelope, header rewriting, injecting request IDs, shaping a stable input model).
Transform in the function when it is domain logic (validation rules tied to business meaning, conditional behavior based on user data).
Step-by-step: request mapping pattern
1) Normalize headers: ensure canonical names (for example, always provide x-request-id), strip hop-by-hop headers, and optionally remove untrusted client headers.
2) Normalize identity context: convert gateway auth output into a consistent auth object for all functions.
3) Normalize input types: convert numeric query params to numbers, booleans to true/false, and default missing values.
4) Enforce schema where supported: reject malformed JSON early to avoid consuming function concurrency.
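The first three steps can be sketched as a single normalization function. This assumes a generic raw event shape (headers, query string, authorizer output); adapt the field names to your gateway's actual event format:

```typescript
// Sketch: request normalization at the boundary, following steps 1-3.
import { randomUUID } from "node:crypto";

interface RawEvent {
  headers: Record<string, string>;
  queryStringParameters?: Record<string, string>;
  authorizer?: { principalId?: string; scopes?: string };
}

export function normalize(raw: RawEvent) {
  // 1) Canonical headers: lowercase names, always provide x-request-id.
  const headers: Record<string, string> = {};
  for (const [k, v] of Object.entries(raw.headers)) headers[k.toLowerCase()] = v;
  headers["x-request-id"] ??= randomUUID();

  // 2) Consistent auth object built from the gateway's authorizer output.
  const auth = {
    principalId: raw.authorizer?.principalId ?? "anonymous",
    scopes: raw.authorizer?.scopes?.split(" ") ?? [],
  };

  // 3) Typed query values with defaults for missing params.
  const q = raw.queryStringParameters ?? {};
  const query = {
    limit: Number(q.limit ?? "20"),
    expand: q.expand === "true",
  };

  return { headers, auth, query };
}
```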
Response mapping pattern
Have functions return a structured object, then let the gateway translate it into HTTP:
{ "statusCode": 200, "headers": { "content-type": "application/json" }, "body": { "orderId": "123", "status": "PAID" }}The gateway can serialize body to JSON, add standard headers (for example, cache-control), and map known error types to status codes without repeating that logic in every function.
Binary and large payload considerations
Serverless APIs frequently handle uploads (images, PDFs) or downloads (reports). Two constraints matter: gateway payload limits and function memory/time. If you push large bodies through synchronous function invocations, you risk timeouts, high memory usage, and high cost.
Binary payload handling
Binary media types: configure the gateway to treat specific content types (for example, image/png, application/pdf) as binary so it doesn’t corrupt bytes during transformations.
Encoding contract: many gateway-function integrations represent binary bodies as base64 strings plus a flag (for example, isBase64Encoded). Ensure every function follows the same contract for both requests and responses.
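Here is a sketch of the response side of that contract, assuming an integration that uses an isBase64Encoded flag (as some gateway-function integrations do); the renderReport helper is hypothetical:

```typescript
// Sketch: returning binary content under a base64 encoding contract.
export async function handler() {
  const pdfBytes = await renderReport(); // hypothetical report renderer
  return {
    statusCode: 200,
    headers: { "content-type": "application/pdf" },
    body: pdfBytes.toString("base64"),
    isBase64Encoded: true, // tells the gateway to decode before sending bytes
  };
}

// Placeholder bytes; replace with real PDF generation.
async function renderReport(): Promise<Buffer> {
  return Buffer.from("%PDF-1.4 ...");
}
```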
Large upload/download patterns
Prefer direct-to-object-storage uploads: the gateway issues a short-lived upload authorization (or pre-signed URL) via a small function, and the client uploads directly to storage. This avoids routing large bodies through the gateway and functions.
Chunking or multipart: if clients must upload via HTTP API, use multipart support where available and keep the gateway’s max body size in mind.
Async report generation: for large downloads, return 202 Accepted with a job ID, generate the report asynchronously, and let the client fetch it from storage when ready.
Step-by-step: safe file upload flow
1) Client requests upload session: POST /uploads with metadata (filename, content-type, size).
2) Gateway invokes a small function that validates metadata and returns an upload URL/token.
3) Client uploads directly to storage using the returned URL.
4) Storage event triggers processing (virus scan, thumbnail generation) via an event-driven function.
5) Client checks status: GET /uploads/{id} served by a lightweight function reading status from a database.
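Steps 1 and 2 can be sketched as a small upload-session function. This assumes an AWS-style setup (S3 plus pre-signed URLs via the AWS SDK); the bucket name, size limit, and allowed types are illustrative policy, not requirements:

```typescript
// Sketch: validate upload metadata, then return a short-lived pre-signed URL
// so the client uploads directly to object storage (not through the gateway).
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({});
const BUCKET = "uploads-bucket";          // hypothetical bucket name
const MAX_SIZE = 25 * 1024 * 1024;        // 25 MB, example policy
const ALLOWED = new Set(["image/png", "application/pdf"]);

export async function handler(event: { body: string }) {
  const { filename, contentType, size } = JSON.parse(event.body);
  if (!ALLOWED.has(contentType) || size > MAX_SIZE) {
    return { statusCode: 422, body: JSON.stringify({ error: "invalid upload metadata" }) };
  }
  const key = `incoming/${Date.now()}-${filename}`;
  const url = await getSignedUrl(
    s3,
    new PutObjectCommand({ Bucket: BUCKET, Key: key, ContentType: contentType }),
    { expiresIn: 300 } // URL valid for 5 minutes
  );
  return { statusCode: 201, body: JSON.stringify({ uploadId: key, url }) };
}
```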
Authentication choices common in serverless setups
Serverless APIs often mix different client types: browser apps, mobile apps, machine-to-machine integrations, and internal services. A practical serverless approach is to keep authentication decisions at the gateway and pass a compact identity context to functions.
Typical choices and when they fit
Managed identity integration (gateway-native authorizers/identity providers): good default for public APIs because it centralizes token verification and reduces per-request function overhead.
Custom authorizer function: useful when you need bespoke logic (multi-tenant rules, legacy token formats). Be careful: authorizer invocations also consume concurrency and add latency; cache authorizer results if supported.
Service-to-service identity: for internal routes, prefer short-lived credentials and gateway-enforced identity context so functions don’t need to fetch secrets on every call.
Design tip: keep the function’s authorization checks focused on resource-level decisions (for example, “can this user access order 123?”) while the gateway handles token parsing/verification and coarse route protection.
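A sketch of that split: the gateway has already verified the token and passed a compact identity context, so the function only makes the resource-level decision. The AuthContext shape and loadOrder helper are hypothetical:

```typescript
// Sketch: coarse auth (token verification, scope on the route) happened at
// the gateway; the function decides whether THIS user may see THIS order.
interface AuthContext { principalId: string; scopes: string[] }

export async function getOrder(auth: AuthContext, orderId: string) {
  if (!auth.scopes.includes("orders:read")) {
    return { statusCode: 403, body: { error: "missing scope orders:read" } };
  }
  const order = await loadOrder(orderId); // hypothetical lookup
  // Resource-level decision: valid token, but is it their order?
  if (!order || order.ownerId !== auth.principalId) {
    return { statusCode: 404, body: { error: "order not found" } }; // avoid leaking existence
  }
  return { statusCode: 200, body: order };
}

async function loadOrder(id: string): Promise<{ ownerId: string } | null> {
  return null; // placeholder; replace with your store
}
```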
Per-route throttling to control concurrency and cost
In serverless, “too much traffic” doesn’t just degrade performance; it can explode concurrency, saturate downstream dependencies, and increase cost quickly. Per-route throttling at the gateway lets you allocate concurrency budget intentionally: protect expensive endpoints, keep cheap reads fast, and prevent a single route from starving others.
Practical approach: concurrency budgeting by route
Classify endpoints: label routes as cheap (cacheable reads), moderate (simple writes), expensive (PDF generation, complex queries).
Set per-route limits: apply stricter throttles to expensive routes and more generous limits to cheap ones.
Align with downstream capacity: if a function calls a database with a known connection limit, set the route throttle so peak concurrency stays within what the database can handle.
Use separate functions for separate cost profiles: splitting endpoints (next section) makes it easier to apply different throttles and timeouts.
Step-by-step: implementing per-route throttling
1) Identify the “blast radius”: which downstreams are shared (database, third-party API) and what their safe request rate is.
2) Pick limits per route: start conservative for expensive routes; increase after observing real latency and error rates.
3) Add client guidance: return retry-friendly responses (for example, include
retry-afterwhen throttled) so clients back off predictably.4) Separate internal vs external traffic: if internal jobs call the same gateway, give them distinct routes or stages with their own throttles.
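Here is a sketch of step 3 from both sides: a throttled response carrying retry-after, and a client loop that honors it. It assumes a runtime with a global fetch (Node 18+ or browsers); the backoff policy is illustrative:

```typescript
// Sketch: server side of a throttled response with client guidance.
export function throttledResponse(retryAfterSeconds: number) {
  return {
    statusCode: 429,
    headers: { "retry-after": String(retryAfterSeconds) },
    body: JSON.stringify({ error: "RATE_LIMITED" }),
  };
}

// Sketch: client side, backing off predictably instead of hammering the route.
export async function callWithBackoff(url: string, maxAttempts = 5): Promise<Response> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) return res;
    // Honor retry-after if present; otherwise fall back to exponential backoff.
    const wait = Number(res.headers.get("retry-after") ?? 2 ** attempt);
    await new Promise((r) => setTimeout(r, wait * 1000));
  }
  throw new Error("still throttled after retries");
}
```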
Caching where available
Gateway caching can reduce function invocations for read-heavy endpoints, lowering latency and cost. It is most effective for idempotent GET requests with stable responses and clear cache keys.
What to cache
Reference data: product catalogs, configuration, public metadata.
Read endpoints with predictable query parameters: for example, GET /products?category=....
Auth-aware caching: only if you can include identity/tenant in the cache key to avoid data leaks.
Step-by-step: safe caching setup
1) Define cache key components: path + relevant query params + (if needed) tenant/user identifier.
2) Set TTL by data volatility: seconds for frequently changing data, minutes for stable lists.
3) Ensure correct headers: have the gateway add/forward cache-control and vary behavior based on content negotiation if applicable.
4) Avoid caching error responses unless you explicitly want to dampen repeated failures.
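Step 1 as a sketch: a cache-key builder that whitelists the query params that matter and includes the tenant only for auth-aware caching. The param list is illustrative:

```typescript
// Sketch: cache key = path + relevant query params + (optional) tenant.
// Only whitelisted params participate, so noise params don't fragment the cache.
const CACHED_PARAMS = ["category", "page"]; // illustrative whitelist

export function cacheKey(path: string, query: Record<string, string>, tenantId?: string) {
  const parts = CACHED_PARAMS
    .filter((p) => query[p] !== undefined)
    .map((p) => `${p}=${query[p]}`)
    .sort(); // stable ordering so equivalent requests share one entry
  // Include the tenant only for auth-aware caching, to prevent data leaks.
  if (tenantId) parts.push(`tenant=${tenantId}`);
  return `${path}?${parts.join("&")}`;
}
```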
Splitting endpoints across multiple functions
A common serverless reference architecture is “one function per route” or “one function per bounded capability.” This keeps deployments smaller and lets you tune memory/timeouts and throttles per endpoint. The trade-off is more moving pieces, so you need consistent conventions (input model, error model, logging fields).
Patterns
One function per endpoint: simplest operationally for independent scaling and least coupling. Good when endpoints have very different dependencies or performance profiles.
One function per resource: group related routes (for example,
/ordersGET/POST and/orders/{id}GET/PATCH) into a single function with an internal router. Good when shared code is significant.Command vs query split: separate read functions from write functions to tune caching and throttling differently.
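A sketch of the one-function-per-resource pattern with a tiny internal router; the routeKey format and domain helpers are illustrative:

```typescript
// Sketch: one function serving all /orders routes via an internal router.
type Handler = (event: { params: Record<string, string>; body?: unknown }) => Promise<unknown>;

const routes: Record<string, Handler> = {
  "GET /orders":        async () => listOrders(),
  "POST /orders":       async (e) => createOrder(e.body),
  "GET /orders/{id}":   async (e) => getOrder(e.params.id),
  "PATCH /orders/{id}": async (e) => updateOrder(e.params.id, e.body),
};

export async function handler(event: { routeKey: string; params: Record<string, string>; body?: unknown }) {
  const route = routes[event.routeKey];
  if (!route) return { statusCode: 404, body: { error: "route not handled" } };
  return { statusCode: 200, body: await route(event) };
}

// Hypothetical domain functions sharing one data layer:
async function listOrders() { return []; }
async function createOrder(body: unknown) { return body; }
async function getOrder(id: string) { return { id }; }
async function updateOrder(id: string, body: unknown) { return { id, ...(body as object) }; }
```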
Step-by-step: choosing a split
1) List routes and dependencies: which routes call the same database tables, third-party APIs, or require heavy libraries.
2) Identify “hot” vs “heavy” routes: hot = high QPS, heavy = high CPU/memory/time.
3) Split to isolate heavy routes: keep heavy routes in their own functions so they don’t force high memory settings on everything.
4) Standardize shared utilities: shared auth context parsing, validation helpers, and response helpers in a common library layer.
Organizing stages and environments
Serverless systems benefit from clear separation of environments because deployments are frequent and infrastructure is code-driven. A typical setup uses multiple gateway stages (or separate gateways) mapped to environments like dev, staging, and production, each pointing to environment-specific function versions and data stores.
Recommended structure
Separate accounts/projects for production vs non-production when possible to reduce accidental impact.
Consistent stage naming: dev, staging, prod.
Environment-specific configuration: function environment variables and gateway stage variables for backend URLs, feature flags, and resource identifiers.
Stable base path mapping: for example, api.example.com for prod and staging-api.example.com for staging, or /v1 plus stage-specific hostnames.
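Here is a sketch of environment-specific configuration resolved at runtime from stage-scoped environment variables, so the same artifact runs unchanged in every stage; the variable names are illustrative:

```typescript
// Sketch: one code artifact, per-stage behavior driven by environment variables.
interface StageConfig { tableName: string; featureNewPricing: boolean }

export function loadConfig(): StageConfig {
  const stage = process.env.STAGE ?? "dev"; // dev | staging | prod
  return {
    tableName: process.env.ORDERS_TABLE ?? `orders-${stage}`,      // resource identifier
    featureNewPricing: process.env.FEATURE_NEW_PRICING === "true", // feature flag
  };
}
```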
Step-by-step: promoting changes across stages
1) Deploy functions to dev with a new version/alias and run automated tests through the dev gateway stage.
2) Promote the same artifact to staging (avoid rebuilding) and run integration tests with staging data dependencies.
3) Promote to prod by shifting gateway integration to the prod function alias/version intended for release.
Safe rollout: canary and gradual deployment (conceptual)
Because serverless functions can be versioned and gateways can route traffic, you can reduce risk by gradually shifting traffic to a new function version instead of switching all at once. The goal is to detect regressions (latency, error rate, dependency failures) early while limiting impact.
Two practical rollout models
Canary by traffic split: route a small percentage of requests to the new function version (for example, 1–5%), monitor, then increase.
Gradual by route scope: enable the new version only for a subset of endpoints or tenants first, then expand.
Step-by-step: canary rollout using function versions/aliases
1) Publish a new function version and attach it to a “candidate” alias.
2) Configure weighted routing so most traffic stays on the stable alias and a small percentage goes to the candidate.
3) Monitor key signals: p95 latency, 4xx/5xx rates, throttles, and downstream dependency errors for both versions.
4) Increase weight gradually if metrics are healthy; roll back instantly by setting weight back to 0% if not.
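A sketch of steps 2 and 4, assuming an AWS-style setup where a stable alias carries a weighted routing config; other platforms expose similar weighted routing. Setting the weight back to 0 is the instant rollback:

```typescript
// Sketch: shift a fraction of traffic from the stable alias to a
// candidate version, or roll back by setting the weight to 0.
import { LambdaClient, UpdateAliasCommand } from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

export async function setCanaryWeight(fn: string, candidateVersion: string, weight: number) {
  // weight is the fraction of traffic (0.0-1.0) sent to the candidate version.
  await lambda.send(new UpdateAliasCommand({
    FunctionName: fn,
    Name: "stable", // illustrative alias name
    RoutingConfig: { AdditionalVersionWeights: { [candidateVersion]: weight } },
  }));
}

// Usage: start at 5%, then increase if metrics stay healthy.
// await setCanaryWeight("get-order", "7", 0.05);
// await setCanaryWeight("get-order", "7", 0);   // instant rollback
```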
Step-by-step: gradual rollout by stage
1) Deploy to staging stage and validate with production-like traffic patterns (synthetic or mirrored where appropriate).
2) Deploy to a production “preview” stage (separate hostname) for internal users or selected clients.
3) Promote to main production stage once validated, keeping the previous version available for quick rollback.