Why Mutual TLS Matters for Service-to-Service Traffic
In a microservices environment, most requests never touch the public internet. They move “east-west” between services inside the cluster, often carrying sensitive data such as user identifiers, authorization context, or internal business events. If that traffic is not protected, any compromised workload, misconfigured network policy, or malicious actor with network access can observe or manipulate requests. Mutual TLS, commonly abbreviated as mTLS, addresses this by providing three guarantees for service-to-service calls: encryption in transit (confidentiality), tamper detection (integrity), and strong authentication of both client and server (mutual authentication).
Unlike one-way TLS, where only the server proves its identity to the client, mTLS requires both sides to present certificates. This is particularly valuable in Kubernetes because IP addresses are ephemeral and pods are frequently rescheduled. Instead of trusting “whoever is at this IP,” mTLS lets you trust “whoever can prove possession of a private key corresponding to an approved identity.” That identity-based approach is the foundation for fine-grained authorization policies that remain stable even as workloads move.
Core Concepts: Identity, Certificates, and Trust
Service Identity vs. Network Location
In identity-based security, access decisions are based on a cryptographic identity rather than on network coordinates like IP, port, or node. In Kubernetes, a natural identity anchor is the ServiceAccount. A workload running as ServiceAccount payments in namespace prod can be treated as a distinct principal from payments in namespace staging. When a service mesh issues certificates, it typically encodes this identity into the certificate’s Subject Alternative Name (SAN), enabling policy engines to match on identity strings.
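To make "matching on identity strings" concrete, here is a minimal Python sketch that splits a SPIFFE-style URI of the form spiffe://<trust-domain>/ns/<namespace>/sa/<service-account> into its parts. The `WorkloadIdentity` class and `parse_spiffe_id` function are hypothetical names used only for illustration; a real mesh performs this matching inside its proxies and control plane.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class WorkloadIdentity:
    trust_domain: str
    namespace: str
    service_account: str


def parse_spiffe_id(uri: str) -> WorkloadIdentity:
    """Split spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>."""
    parsed = urlparse(uri)
    parts = parsed.path.strip("/").split("/")
    well_formed = (
        parsed.scheme == "spiffe"
        and len(parts) == 4
        and parts[0] == "ns"
        and parts[2] == "sa"
    )
    if not well_formed:
        raise ValueError(f"not a Kubernetes-style SPIFFE ID: {uri!r}")
    return WorkloadIdentity(parsed.netloc, parts[1], parts[3])


# Same ServiceAccount name, different namespace: two distinct principals.
prod = parse_spiffe_id("spiffe://cluster.local/ns/prod/sa/payments")
staging = parse_spiffe_id("spiffe://cluster.local/ns/staging/sa/payments")
print(prod != staging)
```

Because the namespace is part of the identity, a policy written for payments in prod does not silently apply to payments in staging.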
How mTLS Works at a High Level
mTLS is standard TLS with client certificate authentication enabled. During the TLS handshake, the client validates the server certificate against a trusted Certificate Authority (CA) and checks that the certificate identity matches the expected server identity. With mTLS, the server also requests a client certificate and validates it. If both validations succeed, the handshake completes and both sides derive session keys used to encrypt application data. The important operational detail is that certificates must be issued, distributed, rotated, and revoked in a way that is automated and safe for large fleets of workloads.
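Outside a mesh, the same handshake requirements can be expressed with Python's standard `ssl` module. The sketch below is illustrative only: the function names are hypothetical, and the certificate, key, and CA file paths are optional parameters that you would supply yourself. The single setting that turns one-way TLS into mutual TLS is the server demanding a client certificate.

```python
import ssl
from typing import Optional


def server_context(cert: Optional[str] = None, key: Optional[str] = None,
                   client_ca: Optional[str] = None) -> ssl.SSLContext:
    """Server side: present a certificate and require one from the client."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # By default a TLS server does not ask for a client certificate.
    # Requiring one is what makes the handshake mutual.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)
    if client_ca:
        ctx.load_verify_locations(cafile=client_ca)
    return ctx


def client_context(cert: Optional[str] = None, key: Optional[str] = None,
                   server_ca: Optional[str] = None) -> ssl.SSLContext:
    """Client side: verify the server and present our own certificate."""
    # PROTOCOL_TLS_CLIENT enables certificate and hostname checks by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)
    if server_ca:
        ctx.load_verify_locations(cafile=server_ca)
    return ctx


srv = server_context()
cli = client_context()
print(srv.verify_mode, cli.verify_mode, cli.check_hostname)
```

In a sidecar-based mesh the proxy builds the equivalent configuration for you, which is why application code does not need to change.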
Certificate Authority and Trust Domains
A CA signs workload certificates and anchors trust for the mesh. A trust domain is the set of identities that the CA vouches for (not to be confused with a Kubernetes namespace). Policies often reference identities in a trust domain, for example: “allow calls from spiffe://cluster.local/ns/prod/sa/orders to payments.” Keeping trust domains explicit helps when you have multiple clusters or multiple meshes and need to decide whether identities from one environment should be trusted in another.
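The cross-environment decision can be sketched as a simple membership check on the trust domain portion of the identity. This is a simplified Python model, not a mesh API: the `is_trusted` function and the `cluster-b.example` domain are hypothetical.

```python
from typing import Iterable
from urllib.parse import urlparse


def is_trusted(spiffe_id: str, trusted_domains: Iterable[str]) -> bool:
    """Accept an identity only if its trust domain is explicitly trusted."""
    parsed = urlparse(spiffe_id)
    return parsed.scheme == "spiffe" and parsed.netloc in set(trusted_domains)


# Only the local trust domain is trusted in this example.
trusted = {"cluster.local"}
local_caller = "spiffe://cluster.local/ns/prod/sa/orders"
remote_caller = "spiffe://cluster-b.example/ns/prod/sa/orders"

print(is_trusted(local_caller, trusted))
print(is_trusted(remote_caller, trusted))
```

Trusting a second environment then becomes a deliberate, reviewable change (adding its trust domain to the set) rather than a side effect of network reachability.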
mTLS in Kubernetes: Where It Gets Enforced
Sidecar Proxies and Transparent Encryption
In many service mesh deployments, each pod runs a sidecar proxy. Application containers send and receive plain HTTP or gRPC locally, while the proxy handles TLS handshakes, certificate presentation, and encryption over the network. This design has two practical advantages: you can enable mTLS without changing application code, and you can enforce consistent security controls across heterogeneous services. The proxy becomes the policy enforcement point for both authentication (mTLS) and authorization (who is allowed to call whom).
Strict vs. Permissive Modes
When rolling out mTLS, you typically choose between permissive and strict behavior. In permissive mode, a service accepts both plaintext and mTLS connections, which helps during migration but can hide insecure traffic if left enabled too long. In strict mode, only mTLS is accepted; plaintext connections fail. A safe rollout often starts with permissive, verifies that all callers can use mTLS, then switches to strict for the target namespace or service.
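The difference between the two modes, and the check that gates the switch to strict, can be modeled in a few lines of Python. This is a sketch under stated assumptions: the `observed` list stands in for whatever telemetry your mesh provides, and the caller names are hypothetical.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class MtlsMode(Enum):
    PERMISSIVE = "PERMISSIVE"
    STRICT = "STRICT"


def accepts(mode: MtlsMode, uses_mtls: bool) -> bool:
    """Permissive accepts everything; strict accepts only mTLS."""
    return uses_mtls or mode is MtlsMode.PERMISSIVE


def plaintext_callers(observed: Iterable[Tuple[str, bool]]) -> List[str]:
    """Callers that would break if the target switched to strict today."""
    return sorted({caller for caller, uses_mtls in observed if not uses_mtls})


# (caller, used_mtls) pairs, as they might be derived from proxy telemetry.
observed = [("orders", True), ("inventory", True), ("legacy-batch-job", False)]
blockers = plaintext_callers(observed)
print(blockers)  # onboard these before switching to STRICT
```

An empty result from `plaintext_callers` over a representative observation window is a reasonable signal that strict mode can be enabled safely.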
Identity-Based Authorization: Beyond “Encrypted”
Authentication Is Not Authorization
mTLS tells you who the caller is, but it does not automatically decide whether that caller should be allowed. Identity-based authorization uses the authenticated identity from mTLS to make allow/deny decisions. This is where you implement least privilege: only the specific service accounts that need access can call a given service, and only on the required ports, paths, or methods.
Common Policy Dimensions
Once you have stable identities, you can express policies that are more resilient than IP-based rules. Typical dimensions include: source identity (service account), destination service, HTTP method and path, gRPC service and method, request headers, and sometimes JWT claims when end-user context is propagated. The key is to keep policies understandable and auditable: “orders can call payments /charge” is easier to review than a set of CIDR blocks.
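The rule "orders can call payments /charge" can be written as data and evaluated with allow-list semantics, as in the following Python sketch. It is a simplified model of how such a policy behaves, not a mesh implementation: the class names are hypothetical and only three of the dimensions listed above (source identity, method, path) are modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Request:
    source_principal: str
    method: str
    path: str


@dataclass(frozen=True)
class AllowRule:
    principals: FrozenSet[str]
    methods: FrozenSet[str]
    paths: FrozenSet[str]

    def matches(self, req: Request) -> bool:
        # Every dimension must match for the rule to apply.
        return (
            req.source_principal in self.principals
            and req.method in self.methods
            and req.path in self.paths
        )


def is_allowed(req: Request, rules: Iterable[AllowRule]) -> bool:
    # Allow-list semantics: a request that matches no rule is denied.
    return any(rule.matches(req) for rule in rules)


payments_rules = [
    AllowRule(
        principals=frozenset({"cluster.local/ns/prod/sa/orders"}),
        methods=frozenset({"POST"}),
        paths=frozenset({"/charge"}),
    )
]

from_orders = Request("cluster.local/ns/prod/sa/orders", "POST", "/charge")
from_inventory = Request("cluster.local/ns/prod/sa/inventory", "POST", "/charge")
print(is_allowed(from_orders, payments_rules), is_allowed(from_inventory, payments_rules))
```

The rule set stays small and reviewable because it names principals, not addresses.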
Practical Step-by-Step: Enabling mTLS and Identity-Based Access (Istio Example)
The steps below demonstrate a common workflow using Istio terminology. The same ideas apply to other meshes, but resource names and fields may differ. The goal is: enable mTLS in a namespace, ensure workloads get identities via ServiceAccounts, and then allow only specific callers to reach a service.
Step 1: Prepare Namespaces and ServiceAccounts
Create or verify namespaces and ServiceAccounts so that identities are explicit. Avoid running everything as the default ServiceAccount, because it makes policies coarse and risky.
kubectl create namespace prod || true
kubectl create namespace staging || true
kubectl -n prod create serviceaccount orders || true
kubectl -n prod create serviceaccount payments || true
kubectl -n prod create serviceaccount inventory || true
Update your Deployments to run under the intended ServiceAccount. This is what ties the workload to an identity.
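apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments
  namespace: prod
spec:
  selector:
    matchLabels:
      app: payments
  template:
    metadata:
      labels:
        app: payments  # must match the selector above
    spec:
      serviceAccountName: payments  # the workload identity is derived from this
      containers:
        - name: app
          image: example/payments:1.0
Step 2: Enable Sidecar Injection (If Using Sidecars)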
If your mesh uses sidecars, ensure the namespace is labeled for injection so that proxies are automatically added to pods. This is mesh-specific; for Istio:
kubectl label namespace prod istio-injection=enabled --overwrite
Restart workloads in the namespace so new pods get sidecars.
kubectl -n prod rollout restart deploy
Step 3: Turn On mTLS in the Namespace (Strict)
Apply a policy that requires mTLS for inbound traffic. In Istio, this is typically done with PeerAuthentication. Start with strict in a controlled namespace where you know all callers are meshed; otherwise use permissive temporarily.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: prod
spec:
  mtls:
    mode: STRICT
Apply it:
kubectl apply -f peerauth-strict.yaml
At this point, any plaintext traffic to workloads in prod will fail. If you have callers outside the mesh (for example, a legacy job or a node-level agent), you must either onboard them to the mesh or carve out a controlled exception (prefer onboarding when possible).
Step 4: Verify That Traffic Is Actually Using mTLS
Verification should be both functional and observable. Functionally, calls should succeed between meshed workloads. Observably, you want proof that the connection is encrypted and authenticated. In Istio, you can inspect proxy configuration and telemetry, but a simple operational check is to use the mesh’s tooling to confirm mTLS mode between services.
# Example: check authentication policy and workload status
kubectl -n prod get peerauthentication
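# If you have istioctl available, you can inspect the mTLS settings that apply to a pod.
# (Older Istio releases offered "istioctl authn tls-check"; that subcommand has since been removed.)
# Replace <payments-pod-name> with the name of a running payments pod.
istioctl -n prod experimental describe pod <payments-pod-name>
If you do not have mesh CLI access, another approach is to look at proxy logs/metrics for TLS handshakes or use a packet capture in a controlled environment to confirm traffic is encrypted on the wire. The key outcome is: the network sees TLS, not plaintext HTTP.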
Step 5: Add Authorization Policies Based on Service Identity
With strict mTLS, every caller now has an authenticated identity. Next, restrict access so only the right services can call payments. In Istio, you can use AuthorizationPolicy to allow specific principals. Once at least one ALLOW policy selects the payments workload, any request that matches no ALLOW rule is denied, so the allow policy itself gives you deny-by-default for that workload: list the permitted sources, and everything else is rejected.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payments-allow-orders
  namespace: prod
spec:
  selector:
    matchLabels:
      app: payments
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/prod/sa/orders"
      to:
        - operation:
            methods: ["POST"]
            paths: ["/charge"]
Apply it:
kubectl apply -f authz-payments-allow-orders.yaml
This policy says: only workloads authenticated as the orders ServiceAccount in prod can call POST /charge on the payments service. If inventory tries the same call, it will be denied even though it has a valid certificate, because authorization is separate from authentication.
Step 6: Handle Non-HTTP Protocols (gRPC, TCP)
Identity-based authorization also works for gRPC and raw TCP, but the policy fields differ. For gRPC, you can match on methods and services (depending on mesh support). For TCP, you often restrict by port and identity only. The practical guidance is: start with identity-only restrictions for non-HTTP traffic, then refine as your mesh capabilities allow.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payments-tcp-allow-orders
  namespace: prod
spec:
  selector:
    matchLabels:
      app: payments
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/prod/sa/orders"
      to:
        - operation:
            ports: ["9090"]
Practical Step-by-Step: Certificate Rotation and Operational Safety
Step 1: Understand Rotation Responsibilities
In a mesh, workload certificates are typically short-lived and rotated automatically by the control plane or an identity agent. Short lifetimes reduce the blast radius of key compromise. Operationally, you need to ensure that rotation does not break connections and that clocks are synchronized across nodes (certificate validity is time-based). If you see intermittent TLS failures, check node time drift and certificate expiration metrics first.
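The interaction between short lifetimes, early rotation, and clock drift can be illustrated with a small Python model. The 24-hour lifetime and the 80% rotation point are assumptions chosen for the example, not values taken from any particular mesh, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def should_rotate(not_before: datetime, not_after: datetime, now: datetime,
                  rotate_at_fraction: float = 0.8) -> bool:
    """Rotate once a fixed fraction of the certificate lifetime has elapsed."""
    lifetime = not_after - not_before
    return now >= not_before + lifetime * rotate_at_fraction


def is_valid(not_before: datetime, not_after: datetime, now: datetime,
             clock_skew: timedelta = timedelta(0)) -> bool:
    """Validity as judged by a node whose clock is off by clock_skew."""
    local_now = now + clock_skew
    return not_before <= local_now <= not_after


issued = datetime(2030, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(hours=24)

# Twelve hours in: too early to rotate. Twenty hours in: past the 80% mark.
print(should_rotate(issued, expires, issued + timedelta(hours=12)))
print(should_rotate(issued, expires, issued + timedelta(hours=20)))

# A node running five minutes slow sees a freshly issued certificate
# as "not yet valid", which surfaces as an intermittent TLS failure.
print(is_valid(issued, expires, issued, clock_skew=timedelta(minutes=-5)))
```

Rotating well before expiry leaves room for retries, and the skew case shows why time synchronization belongs on the first-response checklist.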
Step 2: Monitor for Expiration and Handshake Errors
Set alerts for certificate expiration windows and TLS handshake error rates at the proxy layer. Handshake errors can indicate expired certs, trust bundle mismatches, or policy misconfiguration (for example, strict mTLS enabled but a caller is not meshed). Monitoring should distinguish between “no route” and “TLS handshake failed,” because the remediation differs.
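As a client-side analogy for that distinction, the Python sketch below sorts connection failures into separate buckets using exception types from the standard library. The bucket names and the `classify_failure` function are hypothetical; in a mesh you would derive the same split from proxy metrics and logs instead of exceptions.

```python
import socket
import ssl


def classify_failure(exc: BaseException) -> str:
    """Map a connection error to a coarse category for alerting."""
    # Order matters: SSLCertVerificationError is a subclass of SSLError,
    # and SSLError is itself a subclass of OSError.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "certificate-rejected"   # expired cert or trust bundle mismatch
    if isinstance(exc, ssl.SSLError):
        return "tls-error"              # e.g. plaintext caller hitting a TLS-only port
    if isinstance(exc, (ConnectionRefusedError, socket.gaierror,
                        socket.timeout, TimeoutError)):
        return "unreachable"            # routing, DNS, or endpoint availability
    return "other"


print(classify_failure(ssl.SSLError("handshake failure")))
print(classify_failure(ConnectionRefusedError()))
```

Keeping "unreachable" separate from the TLS categories points the on-call engineer at networking in one case and at certificates or policy in the other.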
Step 3: Plan CA Rotation and Trust Bundle Updates
Eventually, you may need to rotate the mesh CA or intermediate CAs. Safe CA rotation usually involves a period where proxies trust both the old and new CA certificates (a trust bundle), while new workload certificates are issued from the new CA. Only after all workloads have rotated should the old CA be removed from trust. The operational takeaway is: CA rotation is a staged process, not a single switch.
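The staged process can be expressed as a small state check that reports which step comes next. This is a simplified Python model with hypothetical names: `trust_bundle` is the set of CA names that proxies currently trust, and `issuers` maps each workload to the CA that signed its current certificate.

```python
from typing import AbstractSet, Mapping


def rotation_phase(trust_bundle: AbstractSet[str], issuers: Mapping[str, str],
                   old_ca: str, new_ca: str) -> str:
    """Return the next safe step in a CA rotation."""
    if new_ca not in trust_bundle:
        # Proxies must trust the new CA before any certificate is issued from it.
        return "add-new-ca-to-trust-bundle"
    if any(issuer == old_ca for issuer in issuers.values()):
        # Some workloads still present certificates signed by the old CA.
        return "wait-for-workloads-to-rotate"
    if old_ca in trust_bundle:
        return "remove-old-ca"
    return "done"


issuers = {"orders": "ca-new", "payments": "ca-old"}
print(rotation_phase({"ca-old", "ca-new"}, issuers, "ca-old", "ca-new"))
```

Removing the old CA while any workload still presents a certificate signed by it would cause exactly the handshake failures the overlap period is meant to prevent.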
Common Pitfalls and How to Avoid Them
Using the Default ServiceAccount Everywhere
If multiple services share the same ServiceAccount, they share the same identity from the policy perspective. That makes it impossible to express least-privilege rules. Assign distinct ServiceAccounts per service (or per security boundary) and ensure Deployments reference them explicitly.
Permissive mTLS Left in Place
Permissive mode is useful for migration, but it can allow plaintext traffic to persist unnoticed. Treat permissive as a temporary state with an explicit deadline. Use observability to find remaining plaintext callers, onboard them, and then switch to strict.
Confusing Service Names with Identities
Kubernetes Services provide stable DNS names, but they are not identities. Identity is tied to the workload’s certificate and typically maps to ServiceAccount. Policies should reference authenticated principals, not service DNS names, when the goal is “only this workload may call.”
Mixing End-User Identity with Service Identity
mTLS authenticates workloads, not end users. If you need end-user authorization, you typically propagate user identity via JWTs or headers and enforce policies that combine workload identity (who is calling) with user claims (on whose behalf). Keep these layers separate: service-to-service trust should not depend solely on user tokens, and user authorization should not be inferred from service identity alone.
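Keeping the two layers separate means evaluating them as two independent checks that must both pass, as in this minimal Python sketch. The `roles` claim, the `billing-admin` role, and the function name are hypothetical; real deployments take the claims from a validated JWT.

```python
from typing import AbstractSet, Any, Mapping


def authorize(workload_principal: str, user_claims: Mapping[str, Any],
              allowed_workloads: AbstractSet[str], required_role: str) -> bool:
    """Allow only if the calling workload AND the end user are both permitted."""
    workload_ok = workload_principal in allowed_workloads        # who is calling
    user_ok = required_role in user_claims.get("roles", ())      # on whose behalf
    return workload_ok and user_ok


allowed = {"cluster.local/ns/prod/sa/orders"}
admin = {"sub": "alice", "roles": ["billing-admin"]}
viewer = {"sub": "bob", "roles": ["viewer"]}

# Right workload and right user role: allowed.
print(authorize("cluster.local/ns/prod/sa/orders", admin, allowed, "billing-admin"))
# A valid user token does not help an unapproved workload.
print(authorize("cluster.local/ns/prod/sa/inventory", admin, allowed, "billing-admin"))
# An approved workload does not grant rights to an unprivileged user.
print(authorize("cluster.local/ns/prod/sa/orders", viewer, allowed, "billing-admin"))
```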
Design Patterns for Identity-Based Service Security
Namespace as a Security Boundary
Namespaces often align with teams, environments, or sensitivity levels. Enabling strict mTLS at the namespace level and applying baseline authorization policies provides a strong default posture. Then, add service-specific exceptions as needed. This approach reduces the chance that a newly deployed service is accidentally exposed to unexpected callers.
Allow Lists Over Deny Lists
Prefer explicit allow lists of known callers for sensitive services. Deny lists tend to grow and are easy to bypass when new services appear. With mTLS identities, allow lists remain readable: a small set of principals that are permitted to call specific endpoints.
Separate “Human Access” From “Service Access”
Operational access (for debugging) should not require weakening service-to-service policies. Instead, use controlled entry points such as a dedicated debug service account with narrowly scoped permissions, temporary policies with time limits, or separate tooling paths that do not broaden production service access by default.