Cloud-Native Web Serving with Kubernetes Ingress and Service Mesh

Mutual TLS and Identity-Based Service-to-Service Security

Chapter 14

Why Mutual TLS Matters for Service-to-Service Traffic

In a microservices environment, most requests never touch the public internet. They move “east-west” between services inside the cluster, often carrying sensitive data such as user identifiers, authorization context, or internal business events. If that traffic is not protected, any compromised workload, misconfigured network policy, or malicious actor with network access can observe or manipulate requests. Mutual TLS, commonly abbreviated as mTLS, addresses this by providing three guarantees for service-to-service calls: encryption in transit (confidentiality), tamper detection (integrity), and strong authentication of both client and server (mutual authentication).

Unlike one-way TLS, where only the server proves its identity to the client, mTLS requires both sides to present certificates. This is particularly valuable in Kubernetes because IP addresses are ephemeral and pods are frequently rescheduled. Instead of trusting “whoever is at this IP,” mTLS lets you trust “whoever can prove possession of a private key corresponding to an approved identity.” That identity-based approach is the foundation for fine-grained authorization policies that remain stable even as workloads move.

Core Concepts: Identity, Certificates, and Trust

Service Identity vs. Network Location

In identity-based security, access decisions are based on a cryptographic identity rather than on network coordinates like IP, port, or node. In Kubernetes, a natural identity anchor is the ServiceAccount. A workload running as ServiceAccount payments in namespace prod can be treated as a distinct principal from payments in namespace staging. When a service mesh issues certificates, it typically encodes this identity into the certificate’s Subject Alternative Name (SAN), enabling policy engines to match on identity strings. In Istio, for example, the SAN carries a SPIFFE URI of the form spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>.
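To make the idea of an identity string concrete, here is a minimal Python sketch that builds and parses a Kubernetes-style SPIFFE ID. The class and function names (WorkloadIdentity, parse_spiffe_id) are hypothetical and exist only for illustration; the URI layout assumes the spiffe://<trust-domain>/ns/<namespace>/sa/<service-account> convention that Istio uses.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class WorkloadIdentity:
    """Identity derived from trust domain, namespace, and ServiceAccount."""
    trust_domain: str
    namespace: str
    service_account: str

    def spiffe_id(self) -> str:
        # Assumed layout: spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>
        return (
            f"spiffe://{self.trust_domain}"
            f"/ns/{self.namespace}/sa/{self.service_account}"
        )


def parse_spiffe_id(uri: str) -> WorkloadIdentity:
    """Parse a Kubernetes-style SPIFFE ID; raise ValueError on any other shape."""
    parsed = urlparse(uri)
    parts = parsed.path.strip("/").split("/")
    well_formed = (
        parsed.scheme == "spiffe"
        and bool(parsed.netloc)
        and len(parts) == 4
        and parts[0] == "ns"
        and parts[2] == "sa"
    )
    if not well_formed:
        raise ValueError(f"not a Kubernetes-style SPIFFE ID: {uri!r}")
    return WorkloadIdentity(parsed.netloc, parts[1], parts[3])
```

Note that payments in prod and payments in staging produce different identity strings even though the ServiceAccount name is the same, and neither string contains an IP address or node name.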

How mTLS Works at a High Level

mTLS is an extension of standard TLS. During the TLS handshake, the client validates the server certificate against a trusted Certificate Authority (CA) and checks that the certificate identity matches the expected server identity. With mTLS, the server also requests a client certificate and validates it. If both validations succeed, the handshake completes and both sides derive session keys used to encrypt application data. The important operational detail is that certificates must be issued, distributed, rotated, and revoked in a way that is automated and safe for large fleets of workloads.
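The difference between one-way TLS and mTLS comes down to whether the server also demands and verifies a client certificate. The sketch below shows this with Python's standard ssl module. It is not how a mesh proxy is configured; the function names and the optional file-path parameters are assumptions for illustration, and in a mesh the sidecar performs the equivalent steps for you.

```python
import ssl
from typing import Optional


def build_server_context(
    certfile: Optional[str] = None,
    keyfile: Optional[str] = None,
    ca_bundle: Optional[str] = None,
) -> ssl.SSLContext:
    """Server side of mTLS: present a certificate AND require one from the client."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # This line turns one-way TLS into mutual TLS: the handshake fails
    # unless the client presents a certificate that chains to a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if certfile and keyfile:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    if ca_bundle:
        ctx.load_verify_locations(cafile=ca_bundle)
    return ctx


def build_client_context(
    certfile: Optional[str] = None,
    keyfile: Optional[str] = None,
    ca_bundle: Optional[str] = None,
) -> ssl.SSLContext:
    """Client side of mTLS: verify the server and present a client certificate."""
    # PROTOCOL_TLS_CLIENT enables certificate and hostname verification by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile and keyfile:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    if ca_bundle:
        ctx.load_verify_locations(cafile=ca_bundle)
    return ctx
```

In real use, both sides would pass paths to their own certificate, private key, and the CA bundle they trust. A mesh proxy typically checks the peer's SAN identity rather than a DNS hostname, which this standard-library sketch does not model.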

Certificate Authority and Trust Domains

A CA signs workload certificates and is the root of trust for the mesh. A trust domain is the namespace of identities that the CA vouches for. Policies often reference identities in a trust domain, for example: “allow calls from spiffe://cluster.local/ns/prod/sa/orders to payments.” Keeping trust domains explicit helps when you have multiple clusters or multiple meshes and need to decide whether identities from one environment should be trusted in another.
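The multi-cluster question ("should identities from that environment be trusted here?") can be sketched as an explicit check on the trust domain portion of the identity. This is a simplified illustration, not a mesh API; the function names and the second trust domain (cluster-b.example) are hypothetical.

```python
from typing import Iterable
from urllib.parse import urlparse


def trust_domain_of(spiffe_id: str) -> str:
    """Return the trust domain (the authority part) of a SPIFFE ID."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    return parsed.netloc


def is_from_trusted_domain(spiffe_id: str, trusted_domains: Iterable[str]) -> bool:
    """True only if the identity belongs to a trust domain we explicitly accept."""
    return trust_domain_of(spiffe_id) in set(trusted_domains)


# Only the local trust domain is accepted in this example.
TRUSTED = {"cluster.local"}
local_orders = "spiffe://cluster.local/ns/prod/sa/orders"
remote_orders = "spiffe://cluster-b.example/ns/prod/sa/orders"  # hypothetical
```

With this setup, local_orders passes the check and remote_orders does not, even though namespace and ServiceAccount are identical. Trusting the second cluster becomes a deliberate change to the TRUSTED set rather than an accident of naming.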

mTLS in Kubernetes: Where It Gets Enforced

Sidecar Proxies and Transparent Encryption

In many service mesh deployments, each pod runs a sidecar proxy. Application containers send and receive plain HTTP or gRPC locally, while the proxy handles TLS handshakes, certificate presentation, and encryption over the network. This design has two practical advantages: you can enable mTLS without changing application code, and you can enforce consistent security controls across heterogeneous services. The proxy becomes the policy enforcement point for both authentication (mTLS) and authorization (who is allowed to call whom).

Strict vs. Permissive Modes

When rolling out mTLS, you typically choose between permissive and strict behavior. In permissive mode, a service accepts both plaintext and mTLS connections, which helps during migration but can hide insecure traffic if left enabled too long. In strict mode, only mTLS is accepted; plaintext connections fail. A safe rollout often starts with permissive, verifies that all callers can use mTLS, then switches to strict for the target namespace or service.
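The two modes can be summarized as a small decision function. The Python sketch below is a conceptual model only (the names MtlsMode, accepts, and plaintext_callers are hypothetical, and the observation data would come from your own telemetry); it shows why permissive mode can hide plaintext callers and how to tell when a switch to strict is safe.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class MtlsMode(Enum):
    PERMISSIVE = "PERMISSIVE"
    STRICT = "STRICT"


def accepts(mode: MtlsMode, connection_is_mtls: bool) -> bool:
    """mTLS connections are always accepted; plaintext only in permissive mode."""
    if connection_is_mtls:
        return True
    return mode is MtlsMode.PERMISSIVE


def plaintext_callers(observations: Iterable[Tuple[str, bool]]) -> List[str]:
    """Given (caller, used_mtls) observations, list callers still using plaintext.

    An empty result suggests the workload is ready for STRICT.
    """
    return sorted({caller for caller, used_mtls in observations if not used_mtls})
```

For example, observations of [("orders", True), ("legacy-job", False)] would report legacy-job as the one caller to onboard before switching to strict.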

Identity-Based Authorization: Beyond “Encrypted”

Authentication Is Not Authorization

mTLS tells you who the caller is, but it does not automatically decide whether that caller should be allowed. Identity-based authorization uses the authenticated identity from mTLS to make allow/deny decisions. This is where you implement least privilege: only the specific service accounts that need access can call a given service, and only on the required ports, paths, or methods.

Common Policy Dimensions

Once you have stable identities, you can express policies that are more resilient than IP-based rules. Typical dimensions include: source identity (service account), destination service, HTTP method and path, gRPC service and method, request headers, and sometimes JWT claims when end-user context is propagated. The key is to keep policies understandable and auditable: “orders can call payments /charge” is easier to review than a set of CIDR blocks.

Practical Step-by-Step: Enabling mTLS and Identity-Based Access (Istio Example)

The steps below demonstrate a common workflow using Istio terminology. The same ideas apply to other meshes, but resource names and fields may differ. The goal is: enable mTLS in a namespace, ensure workloads get identities via ServiceAccounts, and then allow only specific callers to reach a service.

Step 1: Prepare Namespaces and ServiceAccounts

Create or verify namespaces and ServiceAccounts so that identities are explicit. Avoid running everything as the default ServiceAccount, because it makes policies coarse and risky.

kubectl create namespace prod || true
kubectl create namespace staging || true

kubectl -n prod create serviceaccount orders || true
kubectl -n prod create serviceaccount payments || true
kubectl -n prod create serviceaccount inventory || true

Update your Deployments to run under the intended ServiceAccount. This is what ties the workload to an identity.

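apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments
  namespace: prod
spec:
  selector:
    matchLabels:
      app: payments
  template:
    metadata:
      labels:
        app: payments   # must match the selector above; also used by policy selectors
    spec:
      serviceAccountName: payments   # the identity the mesh encodes in the certificate
      containers:
      - name: app
        image: example/payments:1.0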

Step 2: Enable Sidecar Injection (If Using Sidecars)

If your mesh uses sidecars, ensure the namespace is labeled for injection so that proxies are automatically added to pods. This is mesh-specific; for Istio:

kubectl label namespace prod istio-injection=enabled --overwrite

Restart workloads in the namespace so new pods get sidecars.

kubectl -n prod rollout restart deploy

Step 3: Turn On mTLS in the Namespace (Strict)

Apply a policy that requires mTLS for inbound traffic. In Istio, this is typically done with PeerAuthentication. Start with strict in a controlled namespace where you know all callers are meshed; otherwise use permissive temporarily.

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: prod
spec:
  mtls:
    mode: STRICT

Apply it:

kubectl apply -f peerauth-strict.yaml

At this point, any plaintext traffic to workloads in prod will fail. If you have callers outside the mesh (for example, a legacy job or a node-level agent), you must either onboard them to the mesh or carve out a controlled exception (prefer onboarding when possible).

Step 4: Verify That Traffic Is Actually Using mTLS

Verification should be both functional and observable. Functionally, calls should succeed between meshed workloads. Observably, you want proof that the connection is encrypted and authenticated. In Istio, you can inspect proxy configuration and telemetry, but a simple operational check is to use the mesh’s tooling to confirm mTLS mode between services.

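# Example: check authentication policy and workload status
kubectl -n prod get peerauthentication

# If you have istioctl available, describe a pod to see the mTLS mode and
# policies that apply to it. Older guides use `istioctl authn tls-check`,
# which recent Istio releases no longer include.
POD=$(kubectl -n prod get pod -l app=payments -o jsonpath='{.items[0].metadata.name}')
istioctl experimental describe pod "$POD" -n prod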

If you do not have mesh CLI access, another approach is to look at proxy logs/metrics for TLS handshakes or use a packet capture in a controlled environment to confirm traffic is encrypted on the wire. The key outcome is: the network sees TLS, not plaintext HTTP.

Step 5: Add Authorization Policies Based on Service Identity

With strict mTLS, every caller now has an authenticated identity. Next, restrict access so only the right services can call payments. In Istio, you can use AuthorizationPolicy to allow specific principals. The goal is deny-by-default for the payments workload with explicit allows. In Istio, once any ALLOW policy selects a workload, requests that match none of the ALLOW rules are denied, so a common pattern is a single allow policy that lists the permitted sources; anything not matching is denied.

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payments-allow-orders
  namespace: prod
spec:
  selector:
    matchLabels:
      app: payments
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - "cluster.local/ns/prod/sa/orders"
    to:
    - operation:
        methods: ["POST"]
        paths: ["/charge"]

Apply it:

kubectl apply -f authz-payments-allow-orders.yaml

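This policy says: only workloads authenticated as the orders ServiceAccount in prod can call POST /charge on the payments service. Because the ALLOW policy selects the payments workload, any other request to it is denied unless another ALLOW policy permits it, including calls from orders to other paths or with other methods. If inventory tries the same call, it will be denied even though it has a valid certificate, because authorization is separate from authentication.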
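The evaluation logic behind such an allow policy can be modeled in a few lines of Python. This is a simplified sketch, not Istio's implementation: the AllowRule and Request types are hypothetical, and paths are compared by exact match only, whereas real policies also support prefix and suffix wildcards.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class AllowRule:
    principals: FrozenSet[str]
    methods: FrozenSet[str]
    paths: FrozenSet[str]


@dataclass(frozen=True)
class Request:
    principal: str  # identity authenticated by mTLS
    method: str
    path: str


def is_allowed(request: Request, allow_rules: Iterable[AllowRule]) -> bool:
    """A request passes only if at least one rule matches on every dimension."""
    return any(
        request.principal in rule.principals
        and request.method in rule.methods
        and request.path in rule.paths
        for rule in allow_rules
    )


# Mirrors the payments-allow-orders policy.
PAYMENTS_RULES = [
    AllowRule(
        principals=frozenset({"cluster.local/ns/prod/sa/orders"}),
        methods=frozenset({"POST"}),
        paths=frozenset({"/charge"}),
    )
]
```

Evaluating Request("cluster.local/ns/prod/sa/inventory", "POST", "/charge") against PAYMENTS_RULES returns False: the caller is authenticated, but no rule authorizes it.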

Step 6: Handle Non-HTTP Protocols (gRPC, TCP)

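Identity-based authorization also works for gRPC and raw TCP, but the policy fields differ. For gRPC, you can match on services and methods (depending on mesh support); in Istio, a gRPC call appears as an HTTP/2 request whose path has the form /package.Service/Method, so it is matched with the paths field. For TCP, there are no HTTP attributes to inspect, so you usually restrict by port and identity only. The practical guidance is: start with identity-only restrictions for non-HTTP traffic, then refine as your mesh capabilities allow.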

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payments-tcp-allow-orders
  namespace: prod
spec:
  selector:
    matchLabels:
      app: payments
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - "cluster.local/ns/prod/sa/orders"
    to:
    - operation:
        ports: ["9090"]

Practical Step-by-Step: Certificate Rotation and Operational Safety

Step 1: Understand Rotation Responsibilities

In a mesh, workload certificates are typically short-lived and rotated automatically by the control plane or an identity agent. Short lifetimes reduce the blast radius of key compromise. Operationally, you need to ensure that rotation does not break connections and that clocks are synchronized across nodes (certificate validity is time-based). If you see intermittent TLS failures, check node time drift and certificate expiration metrics first.
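Because validity is time-based, both rotation timing and clock drift reduce to simple comparisons. The Python sketch below illustrates the idea; the function names are hypothetical, and the 0.8 rotation fraction is an assumed example value, not a setting taken from any particular mesh.

```python
from datetime import datetime, timedelta, timezone


def is_valid(not_before: datetime, not_after: datetime, now: datetime) -> bool:
    """A certificate is only valid inside its [not_before, not_after] window."""
    return not_before <= now <= not_after


def should_rotate(
    not_before: datetime,
    not_after: datetime,
    now: datetime,
    fraction: float = 0.8,  # assumed: renew after 80% of the lifetime has elapsed
) -> bool:
    """Rotate well before expiry so in-flight connections are never cut off."""
    lifetime = not_after - not_before
    return now >= not_before + lifetime * fraction


# A 24-hour certificate issued at midnight UTC (example values).
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(hours=24)

# A node whose clock runs five minutes behind sees a freshly issued
# certificate as "not yet valid", which surfaces as a TLS failure.
drifted_now = issued - timedelta(minutes=5)
```

Here is_valid(issued, expires, drifted_now) is False, which is why time drift belongs near the top of the checklist for intermittent handshake errors.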

Step 2: Monitor for Expiration and Handshake Errors

Set alerts for certificate expiration windows and TLS handshake error rates at the proxy layer. Handshake errors can indicate expired certs, trust bundle mismatches, or policy misconfiguration (for example, strict mTLS enabled but a caller is not meshed). Monitoring should distinguish between “no route” and “TLS handshake failed,” because the remediation differs.

Step 3: Plan CA Rotation and Trust Bundle Updates

Eventually, you may need to rotate the mesh CA or intermediate CAs. Safe CA rotation usually involves a period where proxies trust both the old and new CA certificates (a trust bundle), while new workload certificates are issued from the new CA. Only after all workloads have rotated should the old CA be removed from trust. The operational takeaway is: CA rotation is a staged process, not a single switch.
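The staged nature of CA rotation can be illustrated with a small Python model. It reduces certificate validation to "is the issuer in my trust bundle", which ignores chains and signatures; the CA names (ca-old, ca-new) and function names are hypothetical.

```python
from typing import Dict, FrozenSet


def handshake_ok(peer_cert_issuer: str, trust_bundle: FrozenSet[str]) -> bool:
    """Simplified validation: the peer's issuing CA must be in the trust bundle."""
    return peer_cert_issuer in trust_bundle


def old_ca_removable(old_ca: str, issuer_by_workload: Dict[str, str]) -> bool:
    """The old CA may leave the bundle only when no workload still uses it."""
    return old_ca not in issuer_by_workload.values()


# Stage 1: distribute a bundle that trusts both CAs.
overlap_bundle = frozenset({"ca-old", "ca-new"})

# Stage 2: workloads are reissued from the new CA at different times.
mid_rotation = {"orders": "ca-new", "payments": "ca-old"}
after_rotation = {"orders": "ca-new", "payments": "ca-new"}

# Stage 3: only now shrink the bundle to the new CA.
final_bundle = frozenset({"ca-new"})
```

During the overlap, orders and payments can still talk to each other although their certificates come from different CAs. Switching to final_bundle while payments still holds a ca-old certificate would break that path, which is the failure a single-switch rotation causes.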

Common Pitfalls and How to Avoid Them

Using the Default ServiceAccount Everywhere

If multiple services share the same ServiceAccount, they share the same identity from the policy perspective, so no rule can allow one of them without allowing all of them. That makes it impossible to express least-privilege rules between those services. Assign distinct ServiceAccounts per service (or per security boundary) and ensure Deployments reference them explicitly.

Permissive mTLS Left in Place

Permissive mode is useful for migration, but it can allow plaintext traffic to persist unnoticed. Treat permissive as a temporary state with an explicit deadline. Use observability to find remaining plaintext callers, onboard them, and then switch to strict.

Confusing Service Names with Identities

Kubernetes Services provide stable DNS names, but they are not identities. Identity is tied to the workload’s certificate and typically maps to ServiceAccount. Policies should reference authenticated principals, not service DNS names, when the goal is “only this workload may call.”

Mixing End-User Identity with Service Identity

mTLS authenticates workloads, not end users. If you need end-user authorization, you typically propagate user identity via JWTs or headers and enforce policies that combine workload identity (who is calling) with user claims (on whose behalf). Keep these layers separate: service-to-service trust should not depend solely on user tokens, and user authorization should not be inferred from service identity alone.

Design Patterns for Identity-Based Service Security

Namespace as a Security Boundary

Namespaces often align with teams, environments, or sensitivity levels. Enabling strict mTLS at the namespace level and applying baseline authorization policies provides a strong default posture. Then, add service-specific exceptions as needed. This approach reduces the chance that a newly deployed service is accidentally exposed to unexpected callers.

Allow Lists Over Deny Lists

Prefer explicit allow lists of known callers for sensitive services. Deny lists tend to grow and are easy to bypass when new services appear. With mTLS identities, allow lists remain readable: a small set of principals that are permitted to call specific endpoints.
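The bypass problem with deny lists is easy to see in a toy comparison. In this Python sketch the function names and the newly deployed service (reports) are hypothetical; the other principals follow the examples used earlier in this chapter.

```python
from typing import AbstractSet


def allow_list_permits(caller: str, allowed: AbstractSet[str]) -> bool:
    """Allow list: unknown callers are rejected by default."""
    return caller in allowed


def deny_list_permits(caller: str, denied: AbstractSet[str]) -> bool:
    """Deny list: unknown callers are accepted by default."""
    return caller not in denied


ALLOWED = {"cluster.local/ns/prod/sa/orders"}
DENIED = {"cluster.local/ns/prod/sa/inventory"}

# A service deployed after both lists were written (hypothetical).
new_service = "cluster.local/ns/prod/sa/reports"
```

The allow list rejects new_service until someone deliberately adds it, while the deny list lets it through until someone notices and blocks it.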

Separate “Human Access” From “Service Access”

Operational access (for debugging) should not require weakening service-to-service policies. Instead, use controlled entry points such as a dedicated debug service account with narrowly scoped permissions, temporary policies with time limits, or separate tooling paths that do not broaden production service access by default.

Now answer the exercise about the content:

What is the key advantage of using mTLS with identity-based policies in Kubernetes compared to relying on IP addresses?

mTLS authenticates workloads using certificates tied to stable identities (often ServiceAccounts), which is more reliable than IP-based rules in dynamic environments where pods and IPs frequently change.
