What “Troubleshooting” Means in Docker
Troubleshooting Docker usually falls into three buckets: (1) containers that won’t start or keep crashing, (2) images that won’t build or behave differently than expected, and (3) connectivity problems between your machine, containers, and other services. The goal is to reduce guesswork by collecting evidence (logs, exit codes, configuration, network state) and changing one variable at a time.
A practical mindset: always identify the failing layer first. Is Docker Engine running? Did the image build correctly? Did the container start? Is the process inside the container healthy? Is the port published? Is DNS resolving? Is a firewall blocking traffic? Each layer has a small set of commands that reveal the truth.
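As a minimal first pass, one command per layer often narrows the problem down (replace <container> with your container's name):
docker info                          # engine layer: is the daemon up and healthy?
docker ps -a                         # container layer: did it start, and how did it exit?
docker logs --tail 50 <container>    # app layer: what did the process report before stopping?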
A Minimal Troubleshooting Toolkit
Core commands you’ll use repeatedly
List containers and status:
docker ps and docker ps -a
Logs:
docker logs <container> (add -f to follow, --tail 200 to limit)
Inspect configuration:
docker inspect <container_or_image>
Check exit code:
docker inspect -f '{{.State.ExitCode}}' <container>
Run a shell inside:
docker exec -it <container> sh (or bash if available)
Resource usage:
docker stats
Events timeline:
docker events --since 10m
System info:
docker info
Useful “debug containers”
When you suspect networking or DNS issues, it helps to run a temporary container with network tools. Many minimal images don’t include curl, nslookup, or ping. A common approach is to use a purpose-built image.
docker run --rm -it nicolaka/netshoot sh
Inside that shell you can test DNS, routes, ports, and HTTP requests from the same network namespace a container would use.
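A related trick, commonly used for this exact purpose (mycontainer is a placeholder), is to attach netshoot directly to an existing container's network namespace so you see exactly what that container sees:
docker run --rm -it --network container:mycontainer nicolaka/netshoot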
Troubleshooting Containers That Won’t Stay Up
Symptom: container exits immediately
If docker ps shows nothing but docker ps -a shows your container with status Exited, the main process ended. Containers are designed to run a single “main” process; when it ends, the container ends.
Step-by-step: find out why it exited
1) Check the status and exit code
docker ps -a --no-trunc
docker inspect -f '{{.State.Status}} {{.State.ExitCode}} {{.State.Error}}' mycontainer
Exit code 0 means the process finished successfully (often not what you intended). A non-zero exit code indicates an error.
2) Read the logs
docker logs --tail 200 mycontainer
Look for common patterns: missing config files, invalid flags, permission errors, “address already in use,” or application stack traces.
3) Confirm the command and entrypoint
docker inspect -f 'Entrypoint={{json .Config.Entrypoint}} Cmd={{json .Config.Cmd}}' mycontainer
A frequent cause is an incorrect command, or a command that runs and exits (for example, running a one-off script instead of starting a server in the foreground).
4) Re-run interactively to reproduce
docker run --rm -it --entrypoint sh yourimage
Then manually run the intended command inside the container to see the error directly.
Symptom: container is in a restart loop
If the container shows Restarting, it’s crashing and being restarted by a restart policy (or by an orchestrator). The logs may repeat quickly.
Step-by-step: slow it down and capture evidence
1) Inspect restart policy
docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' mycontainer
2) Temporarily disable restarts (recreate the container)
Restart policies are set at creation time. Re-run without a restart policy to keep it exited for inspection.
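For example (mycontainer and yourimage are placeholders):
docker rm -f mycontainer
docker run -d --name mycontainer yourimage    # no --restart flag: default policy is "no"
Docker also lets you change the policy in place without recreating the container: docker update --restart=no mycontainer.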
3) Check last logs and exit code
docker logs --tail 200 mycontainer
docker inspect -f '{{.State.ExitCode}}' mycontainer
4) Look for dependency timing issues
Many apps crash because they try to connect to a dependency (database, cache) before it’s ready. The container “works” if you restart later. Evidence: connection refused/timeouts in logs. Fix is typically to add retry logic or a health-check based wait in the application startup, not to rely on arbitrary sleeps.
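A minimal sketch of such a wait, assuming a POSIX shell, nc available in the image, and a database service named db on port 5432 (all placeholders):
#!/bin/sh
# entrypoint wrapper: retry until the database accepts TCP connections, then start the app
until nc -z db 5432; do
  echo "db not ready, retrying in 2s..."
  sleep 2
done
exec "$@"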
Symptom: “exec format error”
This usually means an architecture mismatch: you’re trying to run an image built for a different CPU architecture (for example, ARM vs. x86_64), or the entrypoint binary is not compatible.
Check image architecture
docker image inspect yourimage -f '{{.Architecture}}/{{.Os}}'
Check host architecture
docker info --format '{{.Architecture}} {{.OSType}}'
If they don’t match, rebuild for the correct platform or use a multi-arch image. On Apple Silicon, this can show up when pulling older images that only support amd64.
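A hedged example of targeting a specific platform, assuming your builder and base images support it:
docker build --platform linux/amd64 -t yourimage .
docker run --rm --platform linux/amd64 yourimage
Note that on Apple Silicon, running amd64 images relies on emulation and can be noticeably slower.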
Troubleshooting Build Failures and “It Builds on My Machine”
Symptom: build fails with missing files
Two common causes: the file isn’t in the build context, or it’s excluded by .dockerignore. Remember that Docker can only COPY files that are sent as build context.
Step-by-step: verify build context and ignore rules
1) Confirm you are building from the expected directory
pwd
ls -la
Then run the build from the directory that contains the intended context.
2) Check for .dockerignore patterns
cat .dockerignore
A pattern like **/*.env or node_modules can be correct, but if you accidentally ignore a needed file, the build will fail at COPY or the app will fail at runtime.
3) Use plain progress output to see exactly where it fails
docker build --progress=plain -t yourimage .
Symptom: dependency install fails (apt, apk, pip, npm)
Package installs can fail due to network issues, missing OS packages, wrong base image, or repository metadata problems.
Step-by-step: isolate the failing layer
1) Re-run with no cache (to avoid reusing a broken cached layer)
docker build --no-cache --progress=plain -t yourimage .
2) If using Debian/Ubuntu, ensure update happens before install
Repository metadata can be stale if you install without updating. The typical pattern is to run the update and install in the same layer so they cannot go out of sync, as in the sketch below.
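A minimal Dockerfile sketch for Debian/Ubuntu base images (the package names are illustrative):
RUN apt-get update && apt-get install -y --no-install-recommends \
        curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*
Cleaning the apt lists in the same layer keeps the image smaller and forces the next uncached build to fetch fresh metadata.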
3) If the build fails due to DNS or proxy
Try building with a known-good network and confirm your Docker daemon proxy settings if you’re behind a corporate proxy. Evidence: timeouts, “temporary failure resolving,” TLS handshake errors.
Symptom: build succeeds, but runtime fails due to missing shared libraries
This often happens when you build in one environment and run in a smaller runtime environment that lacks required libraries. Evidence: errors like “error while loading shared libraries” or “No such file or directory” when executing a binary that exists.
Step-by-step: confirm what the binary needs
1) Open a shell in the failing container
docker run --rm -it yourimage sh
2) Locate the binary and check dependencies
On many Linux images you can use ldd (if installed) to see required shared libraries. If ldd is missing, install it temporarily in a debug build or use a debug image.
3) Fix by installing required runtime packages
Install the missing libraries in the runtime image, or choose a base image that includes them. Keep the runtime minimal, but not so minimal that it can’t run your app.
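A minimal sketch, assuming ldd is present in the image and a hypothetical binary at /app/yourapp:
docker run --rm -it yourimage sh -c "ldd /app/yourapp"
Lines marked “not found” name the libraries you need to install in the runtime image.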
Symptom: “works with cache, fails without cache” (or vice versa)
Cache can hide problems (for example, a dependency server temporarily unavailable) or create problems (for example, stale artifacts). If behavior changes, you need to identify which layer is sensitive.
Compare builds
docker build --progress=plain -t test:cached .
docker build --no-cache --progress=plain -t test:nocache .
Look for non-deterministic steps
Examples: downloading “latest” artifacts, pulling from unstable URLs, or scripts that depend on current time. Pin versions and use checksums where possible.
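A hedged Dockerfile sketch of pinning plus verification (the URL and checksum are placeholders):
RUN curl -fsSLO https://example.com/tool-1.2.3.tar.gz \
    && echo "<expected-sha256>  tool-1.2.3.tar.gz" | sha256sum -c -
If the upstream artifact changes, the checksum fails loudly instead of silently producing a different image.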
Troubleshooting Connectivity: Ports, DNS, and “Connection Refused”
Understand the three common paths
Host → Container: you access a containerized service from your laptop/browser using a published port.
Container → Host: a container calls a service running on your machine (for example, a local database or mock server).
Container → Container: services talk to each other over a Docker network.
Each path fails for different reasons, so identify which one you’re testing before changing anything.
Symptom: you published a port, but the service is unreachable
Typical errors: browser can’t connect, curl times out, or you get “connection refused.”
Step-by-step: verify port publishing and the process binding
1) Confirm the container is running
docker ps2) Confirm the port mapping
docker port mycontainer
Or inspect:
docker inspect -f '{{json .NetworkSettings.Ports}}' mycontainer
3) Confirm the service is listening inside the container
Exec into the container and check listening ports. Some images have ss or netstat.
docker exec -it mycontainer sh
ss -lntp || netstat -lntp
If the service is only listening on 127.0.0.1 inside the container, it will not be reachable via the container’s network interface. Many frameworks default to localhost. Configure the service to bind to 0.0.0.0 inside the container.
4) Test from the host with curl
curl -v http://localhost:YOURPORT/
If curl says “connection refused,” the host is reachable but nothing is listening on that port (or the container isn’t running). If it times out, something is blocking the connection (a firewall, a wrong IP, or a stuck service).
Symptom: container-to-container name resolution fails
Evidence: errors like “could not resolve host,” “Name or service not known,” or your app tries to connect to localhost for another service.
Step-by-step: validate DNS and target address
1) Confirm both containers are on the same network
docker inspect -f '{{json .NetworkSettings.Networks}}' serviceA
docker inspect -f '{{json .NetworkSettings.Networks}}' serviceB
2) From inside one container, resolve the other by name
Use a debug container attached to the same network (replace mynetwork with your network name):
docker run --rm -it --network mynetwork nicolaka/netshoot sh
nslookup serviceB
curl -v http://serviceB:PORT/
3) Fix “localhost” confusion
Inside a container, localhost refers to that container itself, not another container and not your host machine. If your app is configured to call http://localhost:5432 expecting a database in another container, it will fail. Use the other container’s DNS name on the Docker network instead.
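As an illustration (the variable names and image are hypothetical; your app’s configuration keys will differ), the fix is a configuration change, not a network change:
docker run -d --network mynetwork -e DATABASE_HOST=serviceB -e DATABASE_PORT=5432 yourapp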
Symptom: container cannot reach the internet
Evidence: package installs fail at runtime, curl to external sites times out, DNS resolution fails.
Step-by-step: distinguish DNS vs routing vs firewall
1) Test DNS resolution
docker run --rm -it nicolaka/netshoot sh
nslookup example.com
2) Test raw connectivity
curl -I https://example.com
3) Check Docker daemon DNS configuration
If DNS fails consistently, you may need to configure DNS servers for Docker (commonly in Docker Desktop settings or the daemon configuration; see the sketch after this list). Evidence: nslookup fails but IP-based curl works.
4) Consider corporate proxies
If you’re behind a proxy, containers and builds may need proxy environment variables. Evidence: TLS handshake failures, 407 proxy auth required, or only certain domains failing.
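A minimal daemon DNS sketch, assuming Linux with the default config path (the server addresses are examples; restart the daemon afterward):
/etc/docker/daemon.json:
{
  "dns": ["8.8.8.8", "1.1.1.1"]
}
sudo systemctl restart docker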
Symptom: container cannot reach a service on the host machine
Developers often run a service locally and want containers to call it. The correct host address depends on your OS and Docker setup.
On Docker Desktop (Mac/Windows): host.docker.internal usually resolves to the host.
On Linux: you may need to use the host’s IP on the Docker bridge, or configure a special host-gateway mapping depending on your environment.
Step-by-step test from a container:
docker run --rm -it nicolaka/netshoot sh
curl -v http://host.docker.internal:YOURPORT/
If it fails, confirm the host service is listening on a non-local interface (not only 127.0.0.1) and that your firewall allows connections from Docker.
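On Linux (Docker 20.10 and later), you can make host.docker.internal resolve by adding the host-gateway mapping yourself:
docker run --rm --add-host=host.docker.internal:host-gateway nicolaka/netshoot curl -v http://host.docker.internal:YOURPORT/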
Diagnosing “It’s Running, But It’s Not Working”
Check application health from inside the container
A container can be “Up” while the app inside is misconfigured or stuck. Always test from inside the same network namespace.
1) Exec in and check environment
docker exec -it mycontainer sh
env | sort
2) Check filesystem paths and permissions
ls -la
id
Permission errors are common when running as a non-root user or when writing to directories that aren’t writable.
3) Confirm configuration files exist
ls -la /app
cat /app/config.json
Common misconfigurations that look like “bugs”
Wrong environment variables: the app points to the wrong host, port, or credentials.
Wrong working directory: relative paths break if the app expects a different WORKDIR.
Time zone/locale assumptions: logs and parsing can behave differently.
File permissions: the app can read but not write, especially with mounted directories.
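A quick way to check the first three from outside the app (mycontainer is a placeholder):
docker exec mycontainer sh -c "env | sort"              # environment variables the app actually sees
docker inspect -f '{{.Config.WorkingDir}}' mycontainer  # the configured working directory
docker exec mycontainer sh -c "date"                    # time zone inside the container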
Resource and Stability Problems
Symptom: container is killed (OOMKilled) or randomly stops under load
If the container runs fine at first but dies under load, it may be out of memory. Docker (or the OS) can kill the process.
Step-by-step: confirm OOM and identify memory pressure
1) Inspect state for OOMKilled
docker inspect -f 'OOMKilled={{.State.OOMKilled}} ExitCode={{.State.ExitCode}}' mycontainer
2) Watch resource usage
docker stats
3) Check application logs for memory spikes
Many runtimes log GC pressure or memory allocation failures.
Fixes depend on the app: reduce memory usage, increase limits, or adjust runtime settings (for example, Node.js memory flags, JVM heap size). The key is to confirm it’s OOM before changing unrelated settings.
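For example, to confirm a limit is the cause or to raise it (the values are illustrative):
docker run -d --memory=512m --name mycontainer yourimage
docker update --memory=768m --memory-swap=768m mycontainer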
Symptom: CPU is pegged, container is slow
High CPU can be an infinite loop, excessive logging, or too much work per request. Start by verifying it’s the container, not the host.
1) Identify the hot container:
docker stats
2) Identify the hot process: exec in and use top or ps if available.
3) Reduce log volume: extremely chatty logs can slow containers and fill disks.
Log and Disk Issues
Symptom: disk fills up, Docker becomes unstable
Docker can consume disk through unused images, stopped containers, build cache, and logs. When disk is nearly full, builds fail and containers may behave unpredictably.
Step-by-step: measure and clean safely
1) See overall usage
docker system df2) Inspect large logs
On many systems, container logs are stored as JSON files by the default logging driver. If a container logs heavily, the log file can grow quickly.
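On Linux with the default json-file driver, you can locate and size a container’s log file, and cap future growth with log options (the values are illustrative):
docker inspect -f '{{.LogPath}}' mycontainer
sudo ls -lh "$(docker inspect -f '{{.LogPath}}' mycontainer)"
docker run -d --log-opt max-size=10m --log-opt max-file=3 yourimage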
3) Remove unused resources
docker image prune
docker container prune
docker builder prune
docker system prune
Use pruning carefully: it removes unused objects. Prefer targeted prune commands when you know what you want to remove.
Debugging with a Repeatable Checklist
A practical flow you can apply to most issues
1) Identify the failing layer: engine, build, container startup, app runtime, network.
2) Capture evidence: docker ps -a, docker logs, exit code, docker inspect.
3) Reproduce in the simplest way: run interactively, reduce variables, test with curl from inside.
4) Validate assumptions: correct port mapping, correct bind address (0.0.0.0), correct DNS name, correct environment variables.
5) Change one thing: rebuild or rerun, then re-test.
Mini-lab: diagnose a port publishing problem
This short exercise trains the most common connectivity failure: the app binds to localhost inside the container.
1) Run a container that starts a web server bound to localhost (example uses Python):
docker run --rm -d --name bindtest -p 8080:8000 python:3.12-slim sh -c "python -m http.server 8000 --bind 127.0.0.1"
2) Test from host
curl -v http://localhost:8080/
You will likely see connection issues because the server is not listening on the container’s external interface.
3) Confirm inside container
docker exec -it bindtest sh -c "ss -lntp || netstat -lntp"
(The slim image may lack both tools; if so, install ss first with apt-get update && apt-get install -y iproute2.)
Notice it is bound to 127.0.0.1:8000.
4) Fix by binding to 0.0.0.0
docker rm -f bindtest
docker run --rm -d --name bindtest -p 8080:8000 python:3.12-slim sh -c "python -m http.server 8000 --bind 0.0.0.0"
curl -v http://localhost:8080/
Mini-lab: diagnose DNS/service-name issues between containers
This exercise focuses on the “localhost confusion” and name resolution.
1) Start a simple HTTP service container
docker run -d --name whoami --rm -p 8090:80 traefik/whoami
2) From another container, try to reach it via localhost (this should fail)
docker run --rm -it nicolaka/netshoot sh -c "curl -v http://localhost:80"
This fails because localhost is the netshoot container itself.
3) Now attach both to the same user-defined network and use the container name
docker network create debugnet
docker run -d --name whoami2 --network debugnet --rm traefik/whoami
docker run --rm -it --network debugnet nicolaka/netshoot sh -c "curl -v http://whoami2"