What “Troubleshooting” Means in Docker
Troubleshooting Docker usually falls into three buckets: (1) containers that won’t start or keep crashing, (2) images that won’t build or behave differently than expected, and (3) connectivity problems between your machine, containers, and other services. The goal is to reduce guesswork by collecting evidence (logs, exit codes, configuration, network state) and changing one variable at a time.
A practical mindset: always identify the failing layer first. Is Docker Engine running? Did the image build correctly? Did the container start? Is the process inside the container healthy? Is the port published? Is DNS resolving? Is a firewall blocking traffic? Each layer has a small set of commands that reveal the truth.
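As a minimal first pass, one command per layer often narrows the problem down (replace <container> with your container's name):
docker info                          # engine layer: is the daemon up and healthy?
docker ps -a                         # container layer: did it start, and how did it exit?
docker logs --tail 50 <container>    # app layer: what did the process report before stopping?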
A Minimal Troubleshooting Toolkit
Core commands you’ll use repeatedly
List containers and status:
docker ps and docker ps -a
Logs:
docker logs <container> (add -f to follow, --tail 200 to limit)
Inspect configuration:
docker inspect <container_or_image>
Check exit code:
docker inspect -f '{{.State.ExitCode}}' <container>
Run a shell inside:
docker exec -it <container> sh (or bash if available)
Resource usage:
docker stats
Events timeline:
docker events --since 10m
System info:
docker info
Useful “debug containers”
When you suspect networking or DNS issues, it helps to run a temporary container with network tools. Many minimal images don’t include curl, nslookup, or ping. A common approach is to use a purpose-built image.
docker run --rm -it nicolaka/netshoot sh
Inside that shell you can test DNS, routes, ports, and HTTP requests from the same network namespace a container would use.
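A related trick, commonly used for this exact purpose (mycontainer is a placeholder), is to attach netshoot directly to an existing container's network namespace so you see exactly what that container sees:
docker run --rm -it --network container:mycontainer nicolaka/netshoot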
Troubleshooting Containers That Won’t Stay Up
Symptom: container exits immediately
If docker ps shows nothing but docker ps -a shows your container with status Exited, the main process ended. Containers are designed to run a single “main” process; when it ends, the container ends.
Step-by-step: find out why it exited
1) Check the status and exit code
docker ps -a --no-trunc
docker inspect -f '{{.State.Status}} {{.State.ExitCode}} {{.State.Error}}' mycontainer
Exit code 0 means the process finished successfully (often not what you intended). A non-zero exit code indicates an error.
2) Read the logs
docker logs --tail 200 mycontainer
Look for common patterns: missing config files, invalid flags, permission errors, “address already in use,” or application stack traces.
3) Confirm the command and entrypoint
docker inspect -f 'Entrypoint={{json .Config.Entrypoint}} Cmd={{json .Config.Cmd}}' mycontainer
A frequent cause is an incorrect command, or a command that runs and exits (for example, running a one-off script instead of starting a server in the foreground).
4) Re-run interactively to reproduce
docker run --rm -it --entrypoint sh yourimage
Then manually run the intended command inside the container to see the error directly.
Symptom: container is in a restart loop
If the container shows Restarting, it’s crashing and being restarted by a restart policy (or by an orchestrator). The logs may repeat quickly.
Step-by-step: slow it down and capture evidence
1) Inspect restart policy
docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' mycontainer
2) Temporarily disable restarts (recreate the container)
Restart policies are set at creation time. Re-run without a restart policy to keep it exited for inspection.
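For example (mycontainer and yourimage are placeholders):
docker rm -f mycontainer
docker run -d --name mycontainer yourimage    # no --restart flag: default policy is "no"
Docker also lets you change the policy in place without recreating the container: docker update --restart=no mycontainer.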
3) Check last logs and exit code
docker logs --tail 200 mycontainer
docker inspect -f '{{.State.ExitCode}}' mycontainer
4) Look for dependency timing issues
Many apps crash because they try to connect to a dependency (database, cache) before it’s ready. The container “works” if you restart later. Evidence: connection refused/timeouts in logs. Fix is typically to add retry logic or a health-check based wait in the application startup, not to rely on arbitrary sleeps.
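A minimal sketch of such a wait, assuming a POSIX shell, nc available in the image, and a database service named db on port 5432 (all placeholders):
#!/bin/sh
# entrypoint wrapper: retry until the database accepts TCP connections, then start the app
until nc -z db 5432; do
  echo "db not ready, retrying in 2s..."
  sleep 2
done
exec "$@"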
Symptom: “exec format error”
This usually means an architecture mismatch: you’re trying to run an image built for a different CPU architecture (for example, ARM vs. x86_64), or the entrypoint binary is not compatible.
Check image architecture
docker image inspect yourimage -f '{{.Architecture}}/{{.Os}}'
Check host architecture
docker info --format '{{.Architecture}} {{.OSType}}'
If they don’t match, rebuild for the correct platform or use a multi-arch image. On Apple Silicon, this can show up when pulling older images that only support amd64.
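A hedged example of targeting a specific platform, assuming your builder and base images support it:
docker build --platform linux/amd64 -t yourimage .
docker run --rm --platform linux/amd64 yourimage
Note that on Apple Silicon, running amd64 images relies on emulation and can be noticeably slower.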
Troubleshooting Build Failures and “It Builds on My Machine”
Symptom: build fails with missing files
Two common causes: the file isn’t in the build context, or it’s excluded by .dockerignore. Remember that Docker can only COPY files that are sent as build context.
Step-by-step: verify build context and ignore rules
1) Confirm you are building from the expected directory
pwd
ls -la
Then run the build from the directory that contains the intended context.
2) Check for .dockerignore patterns
cat .dockerignore
A pattern like **/*.env or node_modules can be correct, but if you accidentally ignore a needed file, the build will fail at COPY or the app will fail at runtime.
3) Use plain progress output to see exactly where it fails
docker build --progress=plain -t yourimage .
Symptom: dependency install fails (apt, apk, pip, npm)
Package installs can fail due to network issues, missing OS packages, wrong base image, or repository metadata problems.
Step-by-step: isolate the failing layer
1) Re-run with no cache (to avoid reusing a broken cached layer)
docker build --no-cache --progress=plain -t yourimage .
2) If using Debian/Ubuntu, ensure update happens before install
Repository metadata can be stale if you install without updating. The typical pattern is to run the update and install in the same layer so they cannot go out of sync, as in the sketch below.
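A minimal Dockerfile sketch for Debian/Ubuntu base images (the package names are illustrative):
RUN apt-get update && apt-get install -y --no-install-recommends \
        curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*
Cleaning the apt lists in the same layer keeps the image smaller and forces the next uncached build to fetch fresh metadata.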
3) If the build fails due to DNS or proxy
Try building with a known-good network and confirm your Docker daemon proxy settings if you’re behind a corporate proxy. Evidence: timeouts, “temporary failure resolving,” TLS handshake errors.
Symptom: build succeeds, but runtime fails due to missing shared libraries
This often happens when you build in one environment and run in a smaller runtime environment that lacks required libraries. Evidence: errors like “error while loading shared libraries” or “No such file or directory” when executing a binary that exists.
Step-by-step: confirm what the binary needs
1) Open a shell in the failing container
docker run --rm -it yourimage sh
2) Locate the binary and check dependencies
On many Linux images you can use ldd (if installed) to see required shared libraries. If ldd is missing, install it temporarily in a debug build or use a debug image.
3) Fix by installing required runtime packages
Install the missing libraries in the runtime image, or choose a base image that includes them. Keep the runtime minimal, but not so minimal that it can’t run your app.
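A minimal sketch, assuming ldd is present in the image and a hypothetical binary at /app/yourapp:
docker run --rm -it yourimage sh -c "ldd /app/yourapp"
Lines marked “not found” name the libraries you need to install in the runtime image.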
Symptom: “works with cache, fails without cache” (or vice versa)
Cache can hide problems (for example, a dependency server temporarily unavailable) or create problems (for example, stale artifacts). If behavior changes, you need to identify which layer is sensitive.
Compare builds
docker build --progress=plain -t test:cached .
docker build --no-cache --progress=plain -t test:nocache .
Look for non-deterministic steps
Examples: downloading “latest” artifacts, pulling from unstable URLs, or scripts that depend on current time. Pin versions and use checksums where possible.
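A hedged Dockerfile sketch of pinning plus verification (the URL and checksum are placeholders):
RUN curl -fsSLO https://example.com/tool-1.2.3.tar.gz \
    && echo "<expected-sha256>  tool-1.2.3.tar.gz" | sha256sum -c -
If the upstream artifact changes, the checksum fails loudly instead of silently producing a different image.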
Troubleshooting Connectivity: Ports, DNS, and “Connection Refused”
Understand the three common paths
Host → Container: you access a containerized service from your laptop/browser using a published port.
Container → Host: a container calls a service running on your machine (for example, a local database or mock server).
Container → Container: services talk to each other over a Docker network.
Each path fails for different reasons, so identify which one you’re testing before changing anything.
Symptom: you published a port, but the service is unreachable
Typical errors: browser can’t connect, curl times out, or you get “connection refused.”
Step-by-step: verify port publishing and the process binding
1) Confirm the container is running
docker ps2) Confirm the port mapping
docker port mycontainer
Or inspect:
docker inspect -f '{{json .NetworkSettings.Ports}}' mycontainer
3) Confirm the service is listening inside the container
Exec into the container and check listening ports. Some images have ss or netstat.
docker exec -it mycontainer sh
ss -lntp || netstat -lntp
If the service is only listening on 127.0.0.1 inside the container, it will not be reachable via the container’s network interface. Many frameworks default to localhost. Configure the service to bind to 0.0.0.0 inside the container.
4) Test from the host with curl
curl -v http://localhost:YOURPORT/
If curl says “connection refused,” the host is reachable but nothing is listening on that port (or the container isn’t running). If it times out, something is blocking the connection (a firewall, a wrong IP, or a stuck service).
Symptom: container-to-container name resolution fails
Evidence: errors like “could not resolve host,” “Name or service not known,” or your app tries to connect to localhost for another service.
Step-by-step: validate DNS and target address
1) Confirm both containers are on the same network
docker inspect -f '{{json .NetworkSettings.Networks}}' serviceA
docker inspect -f '{{json .NetworkSettings.Networks}}' serviceB
2) From inside one container, resolve the other by name
Use a debug container attached to the same network (replace mynetwork with your network name):
docker run --rm -it --network mynetwork nicolaka/netshoot sh
nslookup serviceB
curl -v http://serviceB:PORT/
3) Fix “localhost” confusion
Inside a container, localhost refers to that container itself, not another container and not your host machine. If your app is configured to call http://localhost:5432 expecting a database in another container, it will fail. Use the other container’s DNS name on the Docker network instead.
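As an illustration (the variable names and image are hypothetical; your app’s configuration keys will differ), the fix is a configuration change, not a network change:
docker run -d --network mynetwork -e DATABASE_HOST=serviceB -e DATABASE_PORT=5432 yourapp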
Symptom: container cannot reach the internet
Evidence: package installs fail at runtime, curl to external sites times out, DNS resolution fails.
Step-by-step: distinguish DNS vs routing vs firewall
1) Test DNS resolution
docker run --rm -it nicolaka/netshoot sh
nslookup example.com
2) Test raw connectivity
curl -I https://example.com
3) Check Docker daemon DNS configuration
If DNS fails consistently, you may need to configure DNS servers for Docker (commonly in Docker Desktop settings or the daemon configuration; see the sketch after this list). Evidence: nslookup fails but IP-based curl works.
4) Consider corporate proxies
If you’re behind a proxy, containers and builds may need proxy environment variables. Evidence: TLS handshake failures, 407 proxy auth required, or only certain domains failing.
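A minimal daemon DNS sketch, assuming Linux with the default config path (the server addresses are examples; restart the daemon afterward):
/etc/docker/daemon.json:
{
  "dns": ["8.8.8.8", "1.1.1.1"]
}
sudo systemctl restart docker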
Symptom: container cannot reach a service on the host machine
Developers often run a service locally and want containers to call it. The correct host address depends on your OS and Docker setup.
On Docker Desktop (Mac/Windows): host.docker.internal usually resolves to the host.
On Linux: you may need to use the host’s IP on the Docker bridge, or configure a special host-gateway mapping depending on your environment.
Step-by-step test from a container:
docker run --rm -it nicolaka/netshoot sh
curl -v http://host.docker.internal:YOURPORT/
If it fails, confirm the host service is listening on a non-local interface (not only 127.0.0.1) and that your firewall allows connections from Docker.
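On Linux (Docker 20.10 and later), you can make host.docker.internal resolve by adding the host-gateway mapping yourself:
docker run --rm --add-host=host.docker.internal:host-gateway nicolaka/netshoot curl -v http://host.docker.internal:YOURPORT/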
Diagnosing “It’s Running, But It’s Not Working”
Check application health from inside the container
A container can be “Up” while the app inside is misconfigured or stuck. Always test from inside the same network namespace.
1) Exec in and check environment
docker exec -it mycontainer sh
env | sort
2) Check filesystem paths and permissions
ls -la
id
Permission errors are common when running as a non-root user or when writing to directories that aren’t writable.
3) Confirm configuration files exist
ls -la /app
cat /app/config.json
Common misconfigurations that look like “bugs”
Wrong environment variables: the app points to the wrong host, port, or credentials.
Wrong working directory: relative paths break if the app expects a different WORKDIR.
Time zone/locale assumptions: logs and parsing can behave differently.
File permissions: the app can read but not write, especially with mounted directories.
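A quick way to check the first three from outside the app (mycontainer is a placeholder):
docker exec mycontainer sh -c "env | sort"              # environment variables the app actually sees
docker inspect -f '{{.Config.WorkingDir}}' mycontainer  # the configured working directory
docker exec mycontainer sh -c "date"                    # time zone inside the container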
Resource and Stability Problems
Symptom: container is killed (OOMKilled) or randomly stops under load
If the container runs fine at first but dies under load, it may be out of memory. Docker (or the OS) can kill the process.
Step-by-step: confirm OOM and identify memory pressure
1) Inspect state for OOMKilled
docker inspect -f 'OOMKilled={{.State.OOMKilled}} ExitCode={{.State.ExitCode}}' mycontainer
2) Watch resource usage
docker stats
3) Check application logs for memory spikes
Many runtimes log GC pressure or memory allocation failures.
Fixes depend on the app: reduce memory usage, increase limits, or adjust runtime settings (for example, Node.js memory flags, JVM heap size). The key is to confirm it’s OOM before changing unrelated settings.
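For example, to confirm a limit is the cause or to raise it (the values are illustrative):
docker run -d --memory=512m --name mycontainer yourimage
docker update --memory=768m --memory-swap=768m mycontainer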
Symptom: CPU is pegged, container is slow
High CPU can be an infinite loop, excessive logging, or too much work per request. Start by verifying it’s the container, not the host.
1) Identify the hot container:
docker stats
2) Identify the hot process: exec in and use top or ps if available.
3) Reduce log volume: extremely chatty logs can slow containers and fill disks.
Log and Disk Issues
Symptom: disk fills up, Docker becomes unstable
Docker can consume disk through unused images, stopped containers, build cache, and logs. When disk is nearly full, builds fail and containers may behave unpredictably.
Step-by-step: measure and clean safely
1) See overall usage
docker system df2) Inspect large logs
On many systems, container logs are stored as JSON files by the default logging driver. If a container logs heavily, the log file can grow quickly.
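On Linux with the default json-file driver, you can locate and size a container’s log file, and cap future growth with log options (the values are illustrative):
docker inspect -f '{{.LogPath}}' mycontainer
sudo ls -lh "$(docker inspect -f '{{.LogPath}}' mycontainer)"
docker run -d --log-opt max-size=10m --log-opt max-file=3 yourimage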
3) Remove unused resources
docker image prune
docker container prune
docker builder prune
docker system prune
Use pruning carefully: it removes unused objects. Prefer targeted prune commands when you know what you want to remove.
Debugging with a Repeatable Checklist
A practical flow you can apply to most issues
1) Identify the failing layer: engine, build, container startup, app runtime, network.
2) Capture evidence: docker ps -a, docker logs, exit code, docker inspect.
3) Reproduce in the simplest way: run interactively, reduce variables, test with curl from inside.
4) Validate assumptions: correct port mapping, correct bind address (0.0.0.0), correct DNS name, correct environment variables.
5) Change one thing: rebuild or rerun, then re-test.
Mini-lab: diagnose a port publishing problem
This short exercise trains the most common connectivity failure: the app binds to localhost inside the container.
1) Run a container that starts a web server bound to localhost (example uses Python):
docker run --rm -d --name bindtest -p 8080:8000 python:3.12-slim sh -c "python -m http.server 8000 --bind 127.0.0.1"
2) Test from host
curl -v http://localhost:8080/
You will likely see connection issues because the server is not listening on the container’s external interface.
3) Confirm inside container
docker exec -it bindtest sh -c "ss -lntp || netstat -lntp"
(The slim image may lack both tools; if so, install ss first with apt-get update && apt-get install -y iproute2.)
Notice it is bound to 127.0.0.1:8000.
4) Fix by binding to 0.0.0.0
docker rm -f bindtest
docker run --rm -d --name bindtest -p 8080:8000 python:3.12-slim sh -c "python -m http.server 8000 --bind 0.0.0.0"
curl -v http://localhost:8080/
Mini-lab: diagnose DNS/service-name issues between containers
This exercise focuses on the “localhost confusion” and name resolution.
1) Start a simple HTTP service container
docker run -d --name whoami --rm -p 8090:80 traefik/whoami
2) From another container, try to reach it via localhost (this should fail)
docker run --rm -it nicolaka/netshoot sh -c "curl -v http://localhost:80"
This fails because localhost is the netshoot container itself.
3) Now attach both to the same user-defined network and use the container name
docker network create debugnet
docker run -d --name whoami2 --network debugnet --rm traefik/whoami
docker run --rm -it --network debugnet nicolaka/netshoot sh -c "curl -v http://whoami2"