Cloud-Native Web Serving with Kubernetes Ingress and Service Mesh

Mesh Traffic Policies: Rate Limiting, Access Control, and Policy Enforcement

Chapter 15

What “Traffic Policies” Mean in a Service Mesh

In a service mesh, “traffic policies” are declarative rules that shape how requests are allowed, limited, and evaluated as they pass between services. Unlike application code checks (which vary by language and can be inconsistently implemented), mesh policies are typically enforced at the data plane (sidecar proxies or an ambient data plane) and configured via control plane resources. This makes policy behavior consistent across services and easier to audit. In this chapter, we focus on three policy families that are commonly needed in production: rate limiting (protecting services from overload and abuse), access control (who can call what), and policy enforcement (ensuring requests meet requirements such as headers, methods, paths, or external authorization decisions).

Because previous chapters already covered core mesh concepts and mTLS identity, we will treat identity as an available attribute (for example, a workload principal or service account) and focus on how to use that attribute to drive authorization and enforcement decisions.

Rate Limiting in the Mesh

Why rate limiting belongs in the mesh

Rate limiting controls how many requests are allowed over time. In microservices, rate limiting is often needed at multiple layers: at the edge (north-south) to protect public APIs, and internally (east-west) to prevent one service from overwhelming another. Implementing rate limiting in every service leads to duplicated logic and uneven behavior. Mesh-based rate limiting centralizes enforcement and makes it consistent, with the added benefit that limits can be applied based on request attributes (path, method, headers), caller identity, destination service, or even dynamic metadata.

Common rate limiting models

Most mesh implementations support one or more of these models: (1) local rate limiting, where each proxy enforces limits independently (fast, simple, but not globally consistent across replicas); (2) global/distributed rate limiting, where proxies consult a shared rate limit service (consistent across replicas, but adds a network hop); and (3) hierarchical limits, where you apply a broad limit (per service) plus a narrower limit (per user or per route). Understanding which model you need is crucial: local limits are great for protecting individual pods, while global limits are better for enforcing “tenant X gets 100 requests per second total” across a fleet.

Step-by-step: Local rate limiting with Istio (Envoy)

This example shows local rate limiting at the sidecar, which is often used as a first line of defense. The exact API surface can vary by Istio version, but the underlying idea is the same: attach an Envoy local rate limit filter to inbound traffic for a workload, and define a token bucket.

Step 1: Identify the target workload and traffic direction. Decide whether you want to limit inbound requests to a service (protecting it) or outbound requests from a caller (preventing it from flooding dependencies). In Istio, inbound protection is commonly done with an EnvoyFilter applied to the destination workload.

Step 2: Apply an EnvoyFilter with a token bucket. The following snippet illustrates the intent: limit inbound HTTP requests to 50 requests per second with a burst of 100. Treat it as a template; you should validate against your Istio/Envoy versions and adjust filter names accordingly.

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: local-ratelimit-inbound
  namespace: default
spec:
  workloadSelector:
    labels:
      app: payments
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.local_ratelimit
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
          stat_prefix: http_local_rate_limiter
          token_bucket:
            max_tokens: 100
            tokens_per_fill: 50
            fill_interval: 1s
          filter_enabled:
            runtime_key: local_rate_limit_enabled
            default_value:
              numerator: 100
              denominator: HUNDRED
          filter_enforced:
            runtime_key: local_rate_limit_enforced
            default_value:
              numerator: 100
              denominator: HUNDRED
          response_headers_to_add:
          - append: false
            header:
              key: x-local-rate-limit
              value: "true"

Step 3: Decide what happens when the limit is exceeded. Local rate limiting typically returns HTTP 429. Ensure clients handle 429 appropriately (for example, by backing off). Even if you do not implement retries (covered elsewhere), you should at least ensure callers do not treat 429 as a fatal error that triggers cascading failures.

Step 4: Observe and tune. Use proxy metrics (for example, Envoy stats) to track how often requests are being rate limited. Tune burst size and refill rate to match real traffic patterns. A common mistake is setting a low burst, which penalizes normal traffic spikes like page loads or batch jobs.

Step-by-step: Global rate limiting with an external rate limit service

Global rate limiting is useful when you need a shared quota across many replicas. Envoy supports a rate limit service (RLS) pattern: the proxy sends a descriptor (a set of key/value pairs describing the request) to the RLS, which decides whether to allow the request. In mesh terms, you configure (1) a rate limit service deployment, (2) an Envoy filter that calls it, and (3) descriptor rules that map request attributes to rate-limit keys.

Step 1: Deploy a rate limit service. Many teams use Envoy’s reference ratelimit service or a vendor-provided one. The service needs a backing store (often Redis) to track counters. Ensure it is highly available because it becomes part of the request path.
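
To make the shape concrete, here is a minimal Deployment sketch assuming Envoy’s reference ratelimit service image and a Redis instance reachable at redis:6379; the image tag, replica count, namespace, and ConfigMap name are illustrative and should be adapted to your environment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ratelimit
  namespace: rate-limit
spec:
  replicas: 2                   # at least two replicas, since the RLS sits on the request path
  selector:
    matchLabels:
      app: ratelimit
  template:
    metadata:
      labels:
        app: ratelimit
    spec:
      containers:
      - name: ratelimit
        image: envoyproxy/ratelimit:master   # pin a specific tag in production
        ports:
        - containerPort: 8081                # gRPC endpoint the proxies call
        env:
        - name: REDIS_SOCKET_TYPE
          value: tcp
        - name: REDIS_URL
          value: redis:6379                  # hypothetical Redis service backing the counters
        - name: RUNTIME_ROOT
          value: /data
        - name: RUNTIME_SUBDIRECTORY
          value: ratelimit
        volumeMounts:
        - name: config
          mountPath: /data/ratelimit/config  # descriptor config files are read from here
      volumes:
      - name: config
        configMap:
          name: ratelimit-config             # holds the descriptor rules from Step 2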

Step 2: Define descriptors that match your policy. For example, you might want to limit per API key and per route. A descriptor could look like: api_key=abc123 and route=/v1/checkout. The RLS uses these to maintain counters.
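
With Envoy’s reference ratelimit service, descriptor rules live in a YAML config file. A minimal sketch for the per-API-key, per-route example above might look like this (the domain name and the limit values are illustrative):

domain: payments            # logical grouping; the proxy filter must reference the same domain
descriptors:
  - key: api_key            # one counter per api_key value
    descriptors:
      - key: route          # nested: one counter per (api_key, route) pair
        rate_limit:
          unit: second
          requests_per_unit: 10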

Step 3: Configure the proxy to call the RLS. In Istio, this is commonly done via EnvoyFilter plus a cluster pointing to the RLS. In Linkerd, global rate limiting is typically handled at the edge via ingress controllers or external policy components; the approach differs, but the concept of a shared quota service remains.
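
As a conceptual sketch for Istio, the filter side might look like the following; the rate_limit_cluster name is an assumption that must be defined separately (for example via a CLUSTER patch), and the route-level rate_limit actions that actually emit descriptors are not shown:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: payments-global-ratelimit
  namespace: apps
spec:
  workloadSelector:
    labels:
      app: payments
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.ratelimit
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
          domain: payments               # must match the domain in the RLS config
          failure_mode_deny: false       # fail open if the RLS is unreachable; set true to fail closed
          rate_limit_service:
            transport_api_version: V3
            grpc_service:
              envoy_grpc:
                cluster_name: rate_limit_cluster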

Step 4: Start with coarse limits, then refine. Begin with a per-service or per-route limit to protect critical dependencies, then add per-tenant or per-identity limits once you have stable descriptors and observability. Overly granular descriptors can explode cardinality and overload the RLS.

Design tips for rate limiting

  • Prefer limiting by caller identity or tenant key when you need fairness; limiting only by destination can punish all callers equally when one misbehaves.
  • Use separate limits for read vs write paths (for example, GET vs POST) because writes often cost more.
  • Plan for failure modes: if the global RLS is unavailable, decide whether to fail open (allow traffic) or fail closed (deny traffic). For internal services, fail open is often safer for availability; for sensitive endpoints, fail closed may be required.
  • Keep limits close to the resource you protect: inbound limits protect the service; outbound limits protect dependencies and can prevent noisy-neighbor behavior.

Access Control (Authorization) in the Mesh

What mesh access control enforces

Access control answers: “Is this caller allowed to perform this action on this service?” In a mesh, authorization is typically evaluated using attributes such as source identity (principal), source namespace, destination service, request path, method, and sometimes headers. The key advantage is that you can enforce consistent rules without modifying application code, and you can apply rules uniformly across many workloads.

Authorization patterns you will use frequently

  • Default deny with explicit allow: deny everything unless a rule allows it. This is the safest baseline for sensitive namespaces.
  • Namespace boundary rules: allow calls only from specific namespaces (for example, only “frontend” can call “api”).
  • Service-to-service allowlists: allow only specific source principals to call a destination service.
  • Method and path scoping: allow GET /healthz broadly, but restrict POST /admin to a small set of callers.

Step-by-step: Istio AuthorizationPolicy for service-to-service allowlists

Istio’s AuthorizationPolicy lets you define ALLOW and DENY rules. The following example allows only the orders service account in the apps namespace to call the payments service on specific paths, while denying everything else by omission (when used with a default-deny posture).

Step 1: Decide your scope. Apply the policy to a workload (via selector) or to an entire namespace. Workload-level policies are safer when you are starting out because they reduce blast radius.

Step 2: Create an ALLOW policy for the destination workload.

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: payments-allow-orders
  namespace: apps
spec:
  selector:
    matchLabels:
      app: payments
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - "cluster.local/ns/apps/sa/orders"
    to:
    - operation:
        methods: ["POST"]
        paths: ["/v1/charge", "/v1/refund"]

Step 3: Add a narrow exception for health checks if needed. Many platforms probe /healthz or /ready. If those probes come from a different identity (for example, a gateway or node agent), add a separate rule that allows GET to those paths from the appropriate source.
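
For example, if probes reach the workload from a known source namespace, a narrow companion policy might look like the following sketch (the platform namespace and probe paths are assumptions; match them to where your probes actually originate):

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: payments-allow-probes
  namespace: apps
spec:
  selector:
    matchLabels:
      app: payments
  action: ALLOW
  rules:
  - from:
    - source:
        namespaces: ["platform"]   # hypothetical namespace the probes come from
    to:
    - operation:
        methods: ["GET"]
        paths: ["/healthz", "/ready"]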

Step 4: Establish default deny. Istio’s default behavior is to allow a request when no AuthorizationPolicy matches the workload. Once at least one ALLOW policy applies to a workload, any request that matches none of its rules is denied, so an ALLOW policy containing only the rules you want effectively gives that workload a default-deny posture. For namespace-wide default deny, apply a namespace-level policy that allows nothing and then add explicit allows.
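
A common way to express namespace-wide default deny in Istio is an AuthorizationPolicy with an empty spec, which applies to every workload in its namespace and allows nothing:

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: default-deny
  namespace: apps
spec: {}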

Step 5: Validate with real requests. Use a test pod with curl to call the service from an allowed identity and a disallowed identity. Confirm that disallowed calls receive 403 and that allowed calls succeed. Also verify that metrics and logs show authorization denials, which you will need for troubleshooting.

Step-by-step: Deny rules for high-risk endpoints

Deny rules are useful when you want to block a known-bad pattern even if other allow rules exist. For example, you might allow broad access to a service but explicitly deny access to an internal admin path except from a break-glass identity.

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: payments-deny-admin
  namespace: apps
spec:
  selector:
    matchLabels:
      app: payments
  action: DENY
  rules:
  - to:
    - operation:
        paths: ["/admin", "/admin/*"]
    when:
    - key: source.principal
      notValues: ["cluster.local/ns/platform/sa/breakglass"]

This pattern is especially helpful during migrations: you can keep existing allow rules but still ensure that sensitive paths are protected.

Practical guidance for operating authorization policies

  • Start with observability: before enforcing strict policies, ensure you can see source principals, paths, and methods in telemetry so you can write accurate rules.
  • Prefer workload selectors for incremental rollout: apply policies to one service at a time, then expand.
  • Be explicit about gateways: if traffic enters through an ingress gateway, the “source” seen by the destination may be the gateway identity unless you propagate original identity via headers and validate them with additional mechanisms. Decide whether your policy should trust the gateway as a caller or require end-user auth at the application layer.
  • Document intent: store policies alongside service manifests and include comments in Git (or accompanying docs) describing why a rule exists and who owns it.

Policy Enforcement Beyond Allow/Deny

What “policy enforcement” covers

In practice, teams need more than allow/deny decisions. They need to enforce that requests meet certain requirements (for example, must include a header, must use specific methods, must not exceed a size), and they often need to integrate with external policy engines (for example, centralized authorization, compliance checks, or custom business rules). In a mesh, this is commonly achieved using one or more of: (1) external authorization filters, (2) request validation and normalization at the proxy, and (3) admission-time policy for configuration (ensuring teams cannot deploy insecure routing or bypass controls).

External authorization (ext_authz) for centralized decisions

Envoy supports an external authorization filter where the proxy calls an authorization service before forwarding the request. This is useful when authorization depends on dynamic data (for example, user entitlements, account status, feature flags) or when you want a single policy decision point. The mesh still enforces the decision at the proxy, but the logic lives in a dedicated service.

Step-by-step: Enforcing an external authorization check

Step 1: Define what the auth service will decide. Keep the contract simple: the proxy sends request attributes (method, path, headers, source identity, destination) and the auth service returns allow/deny plus optional headers to inject (for example, user claims).

Step 2: Deploy the authorization service. Run it with strict SLOs and horizontal scaling. Because every request may call it, latency matters. Consider caching decisions for short periods if your policy allows it.

Step 3: Configure the proxy filter. In Istio, this is often done via EnvoyFilter to insert envoy.filters.http.ext_authz on inbound listeners for the protected service or at the gateway for edge enforcement. The following is a conceptual template.

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: payments-ext-authz
  namespace: apps
spec:
  workloadSelector:
    labels:
      app: payments
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.ext_authz
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
          transport_api_version: V3
          grpc_service:
            envoy_grpc:
              cluster_name: ext-authz-grpc
            timeout: 0.2s
          failure_mode_allow: false

Step 4: Decide failure behavior. The failure_mode_allow setting determines what happens if the auth service is down. For sensitive operations, you may fail closed. For low-risk internal calls, you may fail open to preserve availability. Make this an explicit decision and test it with chaos experiments.

Step 5: Verify headers and identity propagation. Ensure the auth service receives the attributes it needs. You may need to configure which headers are included in the authorization check and which headers are allowed back to the upstream. Avoid forwarding sensitive headers unless required.
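
To control exactly what the auth service sees, the ext_authz filter exposes fields for this; the following conceptual excerpt could be added to the typed_config shown in Step 3 (field availability varies by Envoy version, so verify against your release):

          # Only forward these request headers to the authorization service.
          allowed_headers:
            patterns:
            - exact: authorization
            - exact: x-request-id
          # Optionally include up to 4 KiB of the request body in the check.
          with_request_body:
            max_request_bytes: 4096
            allow_partial_message: true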

Request constraints and “guardrails” at the proxy

Some enforcement does not require an external service. You can enforce guardrails directly in the proxy configuration: maximum request body size, allowed methods, header normalization, or rejecting malformed requests. While not every mesh exposes all Envoy features through high-level CRDs, you can often implement these via mesh-specific policy resources or via EnvoyFilter as an escape hatch.

Examples of guardrails you might enforce at the mesh layer include: rejecting requests with missing required headers (for example, x-request-id), blocking unexpected HTTP methods (for example, TRACE), or limiting upload sizes to protect memory and downstream services. These controls are especially valuable when multiple teams own callers and you need consistent baseline protections.
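
As one concrete illustration, a body-size guardrail can be built from Envoy’s buffer filter using the same EnvoyFilter patching pattern shown earlier; the 1 MiB limit here is illustrative:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: payments-max-body-size
  namespace: apps
spec:
  workloadSelector:
    labels:
      app: payments
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.buffer
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.buffer.v3.Buffer
          max_request_bytes: 1048576   # reject request bodies larger than 1 MiB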

Configuration policy enforcement (admission control) for mesh resources

Traffic policy is not only about runtime enforcement; it is also about preventing unsafe configuration from being deployed. Kubernetes admission control can validate or mutate mesh resources (for example, Gateways, VirtualServices, AuthorizationPolicies) to enforce organizational rules. This is a different layer than data-plane enforcement: it stops bad configs before they reach production.

Step-by-step: Enforcing mesh configuration rules with admission policies

Step 1: Choose an admission mechanism. Common options include ValidatingAdmissionPolicy (CEL-based), validating webhooks, or policy engines like OPA Gatekeeper/Kyverno. Pick one that your platform team can operate reliably.

Step 2: Define the rules you want to enforce. Practical examples: require that every namespace with mesh injection has at least one AuthorizationPolicy; forbid wildcard hosts on public gateways; require rate limiting on certain routes; forbid EnvoyFilter usage outside a platform namespace; require that external authorization is enabled for specific services.
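
As a sketch of one such rule, a CEL-based ValidatingAdmissionPolicy (available in recent Kubernetes versions) that restricts EnvoyFilter creation to a platform namespace might look like this; the namespace name is an assumption, and the accompanying binding is what puts the policy into effect:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: restrict-envoyfilter-namespace
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["networking.istio.io"]
      apiVersions: ["*"]
      operations: ["CREATE", "UPDATE"]
      resources: ["envoyfilters"]
  validations:
  - expression: "object.metadata.namespace == 'platform'"   # 'platform' is a hypothetical namespace
    message: "EnvoyFilter resources may only be deployed in the platform namespace."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: restrict-envoyfilter-namespace-binding
spec:
  policyName: restrict-envoyfilter-namespace
  validationActions: ["Deny"]   # start with ["Audit"] to log violations without blocking (see Step 3)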

Step 3: Implement and test in audit mode first. Start by logging violations without blocking deployments. This helps you discover existing non-compliant resources and refine rules to avoid false positives.

Step 4: Gradually enforce. Turn on blocking for new changes first (grandfather existing resources), then remediate and enforce globally.

Putting It Together: A Practical Policy Set for a “Payments” Service

Scenario and goals

Assume you operate a payments service that should only be called by orders and checkout. You also want to protect it from bursts and enforce an external authorization decision for write operations. The mesh policy set might include: (1) an AuthorizationPolicy allowlist for service-to-service access, (2) a local rate limit to protect each pod, and (3) an ext_authz check for sensitive endpoints like /v1/refund.

Step-by-step assembly order

Step 1: Apply the allowlist AuthorizationPolicy. This ensures only known callers can reach the service at all. Start with the minimal set of paths and methods, then expand as needed.

Step 2: Add local rate limiting on inbound traffic. Choose a conservative limit that protects the service without breaking normal load. Observe 429 rates and adjust burst size first before increasing sustained rate.

Step 3: Add external authorization for the most sensitive routes. Insert ext_authz only where needed (for example, refund endpoints) to reduce latency overhead. If you apply ext_authz globally, ensure the auth service can handle peak QPS.

Step 4: Add admission rules to prevent bypass. For example, block teams from deploying EnvoyFilters that remove ext_authz or disable rate limiting, and require that any new route exposing /admin paths must include a DENY policy by default.

Operational checks you should run

  • Policy correctness tests: automated tests that run in CI to verify that allowed identities can call allowed paths and that disallowed identities are denied.
  • Load tests for rate limiting: confirm that the service remains stable under bursty traffic and that rate limiting triggers as expected.
  • Failure tests for external auth: simulate auth service downtime and confirm that fail-open or fail-closed behavior matches your risk decision.
  • Telemetry review: ensure you can attribute denials and rate-limited responses to specific callers and routes, so teams can self-remediate.

Now answer the exercise about the content:

Which rate limiting model is best when you need a single shared quota enforced across many service replicas?

Global/distributed rate limiting uses a shared rate limit service so quotas stay consistent across replicas, unlike local limits that are enforced independently per proxy.

Next chapter

Edge and Internal Routing with Mesh Gateways and Ingress Integration
