Docker for Beginners: Containers Explained with Simple Projects

Troubleshooting Containers, Builds, and Connectivity

Chapter 11

What “Troubleshooting” Means in Docker

Troubleshooting Docker usually falls into three buckets: (1) containers that won’t start or keep crashing, (2) images that won’t build or behave differently than expected, and (3) connectivity problems between your machine, containers, and other services. The goal is to reduce guesswork by collecting evidence (logs, exit codes, configuration, network state) and changing one variable at a time.

A practical mindset: always identify the failing layer first. Is Docker Engine running? Did the image build correctly? Did the container start? Is the process inside the container healthy? Is the port published? Is DNS resolving? Is a firewall blocking traffic? Each layer has a small set of commands that reveal the truth.

A Minimal Troubleshooting Toolkit

Core commands you’ll use repeatedly

  • List containers and status: docker ps and docker ps -a

  • Logs: docker logs <container> (add -f to follow, --tail 200 to limit)

  • Inspect configuration: docker inspect <container_or_image>

  • Check exit code: docker inspect -f '{{.State.ExitCode}}' <container>

  • Run a shell inside: docker exec -it <container> sh (or bash if available)

  • Resource usage: docker stats

  • Events timeline: docker events --since 10m

  • System info: docker info

Useful “debug containers”

When you suspect networking or DNS issues, it helps to run a temporary container with network tools. Many minimal images don’t include curl, nslookup, or ping. A common approach is to use a purpose-built image.

docker run --rm -it nicolaka/netshoot sh

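Inside that shell you can test DNS, routes, ports, and HTTP requests. Started this way, the debug container joins the default bridge network; add --network <network> to test from a specific Docker network, or --network container:<container> to share an existing container’s network namespace and see exactly what that container sees.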

Troubleshooting Containers That Won’t Stay Up

Symptom: container exits immediately

If docker ps shows nothing but docker ps -a shows your container with status Exited, the main process ended. Containers are designed to run a single “main” process; when it ends, the container ends.

Step-by-step: find out why it exited

  • 1) Check the status and exit code

    docker ps -a --no-trunc
    docker inspect -f '{{.State.Status}} {{.State.ExitCode}} {{.State.Error}}' mycontainer

    Exit code 0 means the process finished successfully (often not what you intended). Non-zero indicates an error.

  • 2) Read the logs

    docker logs --tail 200 mycontainer

    Look for common patterns: missing config files, invalid flags, permission errors, “address already in use,” or application stack traces.

  • 3) Confirm the command and entrypoint

    docker inspect -f 'Entrypoint={{json .Config.Entrypoint}} Cmd={{json .Config.Cmd}}' mycontainer

    A frequent cause is an incorrect command, or a command that runs and exits: for example, a one-off script, or a server that daemonizes into the background instead of staying in the foreground.

  • 4) Re-run interactively to reproduce

    docker run --rm -it --entrypoint sh yourimage

    Then manually run the intended command inside the container to see the error directly.
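
Exit codes follow a few conventions that narrow the search before you open the logs. The helper below is a minimal sketch of that mapping; the function name explain_exit_code is made up for illustration, and its messages are hints, not a diagnosis.

```shell
# Map a container exit code to a likely cause (a hint, not a diagnosis).
explain_exit_code() {
  code=$1
  case "$code" in
    0)   echo "process finished successfully; it may be a one-off command rather than a long-running server" ;;
    125) echo "docker run itself failed (for example, an invalid flag)" ;;
    126) echo "command found but could not be invoked (check permissions)" ;;
    127) echo "command not found (check Entrypoint, Cmd, and PATH)" ;;
    137) echo "killed by SIGKILL (128+9): docker kill, a stop timeout, or an out-of-memory kill" ;;
    139) echo "segmentation fault (128+11)" ;;
    143) echo "terminated by SIGTERM (128+15), typically docker stop" ;;
    *)
      # Codes above 128 conventionally mean "terminated by signal (code - 128)".
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "terminated by signal $((code - 128))"
      else
        echo "application-specific error; read the logs"
      fi
      ;;
  esac
}
```

One way to use it:

    explain_exit_code "$(docker inspect -f '{{.State.ExitCode}}' mycontainer)"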

Symptom: container is in a restart loop

If the container shows Restarting, it’s crashing and being restarted by a restart policy (or by an orchestrator). The logs may repeat quickly.

Step-by-step: slow it down and capture evidence

  • 1) Inspect restart policy

    docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' mycontainer
  • 2) Temporarily disable restarts (recreate container)

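    Restart policies are usually set at creation time, but you can change one on an existing container with docker update --restart=no mycontainer. Alternatively, re-create the container without a restart policy so it stays exited for inspection.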

  • 3) Check last logs and exit code

    docker logs --tail 200 mycontainer
    docker inspect -f '{{.State.ExitCode}}' mycontainer
  • 4) Look for dependency timing issues

    Many apps crash because they try to connect to a dependency (database, cache) before it’s ready. The container “works” if you restart later. Evidence: connection refused/timeouts in logs. Fix is typically to add retry logic or a health-check based wait in the application startup, not to rely on arbitrary sleeps.
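
The retry idea from step 4 can be sketched as a small shell helper for an entrypoint script. This is an illustration under assumptions, not a replacement for retry logic in the application itself: the function name retry is made up, and the readiness command you pass in depends on what your image provides.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS have been made.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

A hypothetical entrypoint could then wait for a database before starting the app. Here db, 5432, and myapp are placeholders, and nc must exist in the image:

    retry 30 2 nc -z db 5432 && exec myapp

Unlike a fixed sleep, this starts the app as soon as the dependency answers and fails loudly if it never does.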

Symptom: “exec format error”

This usually means an architecture mismatch: you’re trying to run an image built for a different CPU architecture (for example, ARM vs. x86_64). It can also mean the entrypoint is not a valid executable for the container, such as a shell script that is missing its #! (shebang) line.

  • Check image architecture

    docker image inspect yourimage -f '{{.Architecture}}/{{.Os}}'
  • Check host architecture

    docker info --format '{{.Architecture}} {{.OSType}}'

If they don’t match, rebuild for the correct platform or use a multi-arch image. On Apple Silicon, this can show up when pulling older images that only support amd64.
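
One wrinkle when comparing the two outputs: docker info reports kernel-style names (x86_64, aarch64), while image metadata uses amd64 and arm64. The sketch below normalizes both before comparing. The function names are made up for illustration, and the mapping covers only the common cases.

```shell
# Normalize kernel-style architecture names to the names used in image metadata.
normalize_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    armv7l|arm)    echo arm ;;
    *)             echo "$1" ;;
  esac
}

# arch_matches IMAGE: succeeds when the image architecture matches the Docker host.
# Returns 2 if either docker command fails.
arch_matches() {
  image_arch=$(docker image inspect -f '{{.Architecture}}' "$1") || return 2
  host_arch=$(docker info --format '{{.Architecture}}') || return 2
  [ "$(normalize_arch "$image_arch")" = "$(normalize_arch "$host_arch")" ]
}
```

A mismatch is not always fatal, because some setups run foreign-architecture images through emulation. Treat a failed arch_matches yourimage as a strong hint rather than proof.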

Troubleshooting Build Failures and “It Builds on My Machine”

Symptom: build fails with missing files

Two common causes: the file isn’t in the build context, or it’s excluded by .dockerignore. Remember that Docker can only COPY files that are sent as build context.

Step-by-step: verify build context and ignore rules

  • 1) Confirm you are building from the expected directory

    pwd
    ls -la

    Then run the build from the directory that contains the intended context.

  • 2) Check for .dockerignore patterns

    cat .dockerignore

    A pattern like **/*.env or node_modules can be correct, but if you accidentally ignore a needed file, the build will fail at COPY or the app will fail at runtime.

  • 3) Use plain progress output to see exactly where it fails

    docker build --progress=plain -t yourimage .

Symptom: dependency install fails (apt, apk, pip, npm)

Package installs can fail due to network issues, missing OS packages, wrong base image, or repository metadata problems.

Step-by-step: isolate the failing layer

  • 1) Re-run with no cache (to avoid reusing a broken cached layer)

    docker build --no-cache --progress=plain -t yourimage .
  • 2) If using Debian/Ubuntu, ensure update happens before install

    Repository metadata can be stale if you install without updating. The typical pattern is to run apt-get update and apt-get install in the same RUN instruction (joined with &&), so a cached update layer can’t go out of sync with the install.

  • 3) If the build fails due to DNS or proxy

    Try building with a known-good network and confirm your Docker daemon proxy settings if you’re behind a corporate proxy. Evidence: timeouts, “temporary failure resolving,” TLS handshake errors.

Symptom: build succeeds, but runtime fails due to missing shared libraries

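This often happens when you build in one environment and run in a smaller runtime environment that lacks required libraries. Evidence: errors like “error while loading shared libraries” or “No such file or directory” when executing a binary that exists. The second message often means the binary’s dynamic loader is missing, for example when a glibc-linked binary runs on a musl-based image such as Alpine.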

Step-by-step: confirm what the binary needs

  • 1) Open a shell in the failing container

    docker run --rm -it --entrypoint sh yourimage
  • 2) Locate the binary and check dependencies

    On many Linux images you can use ldd (if installed) to see required shared libraries. If ldd is missing, install it temporarily in a debug build or use a debug image.

  • 3) Fix by installing required runtime packages

    Install the missing libraries in the runtime image, or choose a base image that includes them. Keep the runtime minimal, but not so minimal that it can’t run your app.
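
When ldd is available, its output marks unresolved libraries with "not found". The filter below is a small sketch that extracts just those names; missing_libs is a made-up name, and it only parses text on standard input.

```shell
# Read `ldd` output on stdin and print the names of libraries that were not found.
missing_libs() {
  awk '/=> not found/ { print $1 }'
}
```

For example, with /app/mybinary as a placeholder for your binary’s path:

    docker run --rm --entrypoint sh yourimage -c 'ldd /app/mybinary' | missing_libs

Each printed name is a library the runtime image still needs.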

Symptom: “works with cache, fails without cache” (or vice versa)

Cache can hide problems (for example, a dependency server temporarily unavailable) or create problems (for example, stale artifacts). If behavior changes, you need to identify which layer is sensitive.

  • Compare builds

    docker build --progress=plain -t test:cached .
    docker build --no-cache --progress=plain -t test:nocache .
  • Look for non-deterministic steps

    Examples: downloading “latest” artifacts, pulling from unstable URLs, or scripts that depend on current time. Pin versions and use checksums where possible.
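
Checksum pinning can be sketched in a few lines of shell. This assumes sha256sum is available in the build image (it is part of GNU coreutils and BusyBox); the function name verify_sha256 is made up for illustration.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Succeeds only if FILE has exactly the expected SHA-256 digest.
verify_sha256() {
  # sha256sum -c expects lines of the form "<digest><two spaces><path>".
  printf '%s  %s\n' "$2" "$1" | sha256sum -c - >/dev/null 2>&1
}
```

In a build step you would download a pinned version of an artifact, then call verify_sha256 on the downloaded file with the digest published by the project, and stop the build if it fails. A changed or corrupted download then breaks the build immediately instead of producing a subtly different image.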

Troubleshooting Connectivity: Ports, DNS, and “Connection Refused”

Understand the three common paths

  • Host → Container: you access a containerized service from your laptop/browser using a published port.

  • Container → Host: a container calls a service running on your machine (for example, a local database or mock server).

  • Container → Container: services talk to each other over a Docker network.

Each path fails for different reasons, so identify which one you’re testing before changing anything.

Symptom: you published a port, but the service is unreachable

Typical errors: browser can’t connect, curl times out, or you get “connection refused.”

Step-by-step: verify port publishing and the process binding

  • 1) Confirm the container is running

    docker ps
  • 2) Confirm the port mapping

    docker port mycontainer

    Or inspect:

    docker inspect -f '{{json .NetworkSettings.Ports}}' mycontainer
  • 3) Confirm the service is listening inside the container

    Exec into the container and check listening ports. Some images have ss or netstat.

    docker exec -it mycontainer sh
    ss -lntp || netstat -lntp

    If the service is only listening on 127.0.0.1 inside the container, it will not be reachable via the container’s network interface. Many frameworks default to localhost. Configure it to bind to 0.0.0.0 inside the container.

  • 4) Test from the host with curl

    curl -v http://localhost:YOURPORT/

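    If curl says “connection refused,” the host was reached but nothing accepted the connection on that port: the container isn’t running, the port isn’t published, or you are using the wrong port. If the connection is accepted and then reset or closed with an empty reply, the published port works but the app is probably not listening on the container’s network interface (see step 3). If it times out, something is dropping traffic (firewall, wrong IP) or the service is stuck.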
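
curl’s exit status tells these cases apart without reading the verbose output. The helper below is a sketch that maps the most relevant documented curl exit codes to the likely layer at fault; explain_curl_exit is a made-up name, and the Docker-specific interpretations are typical cases, not guarantees.

```shell
# Map a curl exit code to the most likely failing layer.
explain_curl_exit() {
  case "$1" in
    0)     echo "request succeeded" ;;
    6)     echo "could not resolve host: check the name and DNS" ;;
    7)     echo "failed to connect: nothing accepted the connection (container stopped, port not published, or wrong port)" ;;
    28)    echo "timed out: traffic is probably being dropped (firewall, wrong IP) or the service is stuck" ;;
    52|56) echo "connection accepted, then closed or reset: the app may be bound to 127.0.0.1 inside the container" ;;
    *)     echo "curl exit code $1: see the EXIT CODES section of the curl manual" ;;
  esac
}
```

Setting an explicit connect timeout keeps the test short and the result predictable:

    curl -sS -o /dev/null --connect-timeout 5 http://localhost:YOURPORT/; explain_curl_exit $?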

Symptom: container-to-container name resolution fails

Evidence: errors like “could not resolve host,” “Name or service not known,” or your app tries to connect to localhost for another service.

Step-by-step: validate DNS and target address

  • 1) Confirm both containers are on the same network

    docker inspect -f '{{json .NetworkSettings.Networks}}' serviceA
    docker inspect -f '{{json .NetworkSettings.Networks}}' serviceB
  • 2) From inside one container, resolve the other by name

    Use a debug container attached to the same network (replace mynetwork with your network name):

    docker run --rm -it --network mynetwork nicolaka/netshoot sh
    nslookup serviceB
    curl -v http://serviceB:PORT/
  • 3) Fix “localhost” confusion

    Inside a container, localhost refers to that container itself, not another container and not your host machine. If your app is configured to call http://localhost:5432 expecting a database in another container, it will fail. Use the other container’s DNS name on the Docker network instead.
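
Step 1 can be automated so you don’t have to compare two JSON blobs by eye. The sketch below lists the networks two containers have in common; the function names are made up for illustration, and an empty result means the containers cannot reach each other by name.

```shell
# Print the networks a container is attached to, one per line.
container_networks() {
  docker inspect -f '{{range $k, $v := .NetworkSettings.Networks}}{{println $k}}{{end}}' "$1"
}

# shared_networks CONTAINER_A CONTAINER_B
# Print the networks both containers are attached to.
shared_networks() {
  nets_b=$(container_networks "$2")
  container_networks "$1" | while IFS= read -r net; do
    [ -n "$net" ] || continue
    if printf '%s\n' "$nets_b" | grep -Fxq -- "$net"; then
      printf '%s\n' "$net"
    fi
  done
}
```

Note that sharing only the default bridge network is not enough: automatic name resolution between containers works on user-defined networks, so look for a shared network other than bridge.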

Symptom: container cannot reach the internet

Evidence: package installs fail at runtime, curl to external sites times out, DNS resolution fails.

Step-by-step: distinguish DNS vs routing vs firewall

  • 1) Test DNS resolution

    docker run --rm -it nicolaka/netshoot sh
    nslookup example.com
  • 2) Test raw connectivity

    curl -I https://example.com
  • 3) Check Docker daemon DNS configuration

    If DNS fails consistently, you may need to configure DNS servers for Docker (commonly in Docker Desktop settings or daemon configuration). Evidence: nslookup fails but IP-based curl works.

  • 4) Consider corporate proxies

    If you’re behind a proxy, containers and builds may need proxy environment variables. Evidence: TLS handshake failures, 407 proxy auth required, or only certain domains failing.

Symptom: container cannot reach a service on the host machine

Developers often run a service locally and want containers to call it. The correct host address depends on your OS and Docker setup.

  • On Docker Desktop (Mac/Windows): host.docker.internal usually resolves to the host.

  • On Linux: you may need to use the host’s IP on the Docker bridge, or map the name yourself by adding --add-host=host.docker.internal:host-gateway to docker run (supported since Docker Engine 20.10).

Step-by-step test from a container:

docker run --rm -it nicolaka/netshoot sh
curl -v http://host.docker.internal:YOURPORT/

If it fails, confirm the host service is listening on a non-local interface (not only 127.0.0.1) and that your firewall allows connections from Docker.
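
To keep one test command working across platforms, a script can add the host-gateway mapping only where it is needed. This is a sketch that assumes Docker Engine 20.10 or later on Linux; host_gateway_flag is a made-up name.

```shell
# Print the extra `docker run` flag that makes host.docker.internal resolve
# on Linux. Prints nothing on other platforms.
host_gateway_flag() {
  if [ "$(uname -s)" = "Linux" ]; then
    echo "--add-host=host.docker.internal:host-gateway"
  fi
}
```

The command substitution is left unquoted on purpose, so that an empty result adds no argument:

    docker run --rm -it $(host_gateway_flag) nicolaka/netshoot sh -c "curl -v http://host.docker.internal:YOURPORT/"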

Diagnosing “It’s Running, But It’s Not Working”

Check application health from inside the container

A container can be “Up” while the app inside is misconfigured or stuck. Always test from inside the same network namespace.

  • 1) Exec in and check environment

    docker exec -it mycontainer sh
    env | sort
  • 2) Check filesystem paths and permissions

    ls -la
    id

    Permission errors are common when running as a non-root user or when writing to directories that aren’t writable.

  • 3) Confirm configuration files exist

    ls -la /app
    cat /app/config.json

Common misconfigurations that look like “bugs”

  • Wrong environment variables: the app points to the wrong host, port, or credentials.

  • Wrong working directory: relative paths break if the app expects a different WORKDIR.

  • Time zone/locale assumptions: logs and parsing can behave differently.

  • File permissions: the app can read but not write, especially with mounted directories.

Resource and Stability Problems

Symptom: container is killed (OOMKilled) or randomly stops under load

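If the container runs fine at first but dies under load, it may be out of memory. When a container exceeds its memory limit (or the host runs out of memory), the kernel’s OOM killer terminates the process with SIGKILL, which typically shows up as exit code 137.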

Step-by-step: confirm OOM and identify memory pressure

  • 1) Inspect state for OOMKilled

    docker inspect -f 'OOMKilled={{.State.OOMKilled}} ExitCode={{.State.ExitCode}}' mycontainer
  • 2) Watch resource usage

    docker stats
  • 3) Check application logs for memory spikes

    Many runtimes log GC pressure or memory allocation failures.

Fixes depend on the app: reduce memory usage, increase limits, or adjust runtime settings (for example, Node.js memory flags, JVM heap size). The key is to confirm it’s OOM before changing unrelated settings.
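
To catch memory pressure before the kill happens, you can take a one-off snapshot of docker stats and flag containers above a threshold. The sketch below uses the --no-stream and --format options of docker stats; the function name and the default threshold of 80 are arbitrary choices for illustration.

```shell
# high_memory_containers [THRESHOLD_PERCENT]
# Print "name percent" for running containers at or above the threshold.
high_memory_containers() {
  threshold=${1:-80}
  docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' |
    awk -v limit="$threshold" '{
      pct = $2
      sub(/%/, "", pct)                  # "91.20%" -> "91.20"
      if (pct + 0 >= limit + 0) print $1, $2
    }'
}
```

The percentage is relative to the container’s memory limit. If no limit is set, it is relative to the memory available to Docker, so a low percentage on an unlimited container can still hide a large absolute number.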

Symptom: CPU is pegged, container is slow

High CPU can be an infinite loop, excessive logging, or too much work per request. Start by verifying it’s the container, not the host.

  • 1) Identify the hot container: docker stats

  • 2) Identify the hot process: exec in and use top or ps if available.

  • 3) Reduce log volume: extremely chatty logs can slow containers and fill disks.

Log and Disk Issues

Symptom: disk fills up, Docker becomes unstable

Docker can consume disk through unused images, stopped containers, build cache, and logs. When disk is nearly full, builds fail and containers may behave unpredictably.

Step-by-step: measure and clean safely

  • 1) See overall usage

    docker system df
  • 2) Inspect large logs

    With the default json-file logging driver, each container’s log is stored as a JSON file on the Docker host; docker inspect -f '{{.LogPath}}' <container> shows where. If a container logs heavily, the log file can grow quickly.

  • 3) Remove unused resources

    docker image prune
    docker container prune
    docker builder prune
    docker system prune

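    Use pruning carefully: these commands delete unused objects in bulk. docker system prune removes stopped containers, unused networks, dangling images, and build cache in one go (and volumes only if you add --volumes). Prefer the targeted prune commands when you know what you want to remove.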
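
For step 2, the size of a single container’s log can be checked with a short helper. This is a sketch under assumptions: the container uses a logging driver that writes a local file (such as json-file), and you run it where that file is readable, which usually means as root on the Docker host and not from outside a Docker Desktop VM. The function name is made up.

```shell
# container_log_size CONTAINER
# Print the size in bytes of the container's log file.
container_log_size() {
  path=$(docker inspect -f '{{.LogPath}}' "$1") || return 1
  if [ ! -f "$path" ]; then
    echo "log file not readable: $path" >&2
    return 1
  fi
  wc -c < "$path" | tr -d ' '
}
```

If a log turns out to be the problem, the json-file driver can rotate it: re-create the container with --log-opt max-size=10m --log-opt max-file=3 (the values are examples).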

Debugging with a Repeatable Checklist

A practical flow you can apply to most issues

  • 1) Identify the failing layer: engine, build, container startup, app runtime, network.

  • 2) Capture evidence: docker ps -a, docker logs, exit code, docker inspect.

  • 3) Reproduce in the simplest way: run interactively, reduce variables, test with curl from inside.

  • 4) Validate assumptions: correct port mapping, correct bind address (0.0.0.0), correct DNS name, correct environment variables.

  • 5) Change one thing: rebuild or rerun, then re-test.
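
Step 2 of this flow lends itself to a script, so the same evidence is captured every time. The sketch below only combines commands already used in this chapter; collect_evidence is a made-up name, and the 50-line log tail is an arbitrary choice.

```shell
# collect_evidence CONTAINER
# Print status, state, configured command, and recent logs for one container.
collect_evidence() {
  name=$1
  echo "== status =="
  # The name filter matches substrings, so similarly named containers may appear.
  docker ps -a --filter "name=$name" --format '{{.Names}}: {{.Status}} ({{.Ports}})'
  echo "== state =="
  docker inspect -f 'ExitCode={{.State.ExitCode}} OOMKilled={{.State.OOMKilled}} Error={{.State.Error}}' "$name"
  echo "== command =="
  docker inspect -f 'Entrypoint={{json .Config.Entrypoint}} Cmd={{json .Config.Cmd}}' "$name"
  echo "== last 50 log lines =="
  docker logs --tail 50 "$name" 2>&1
}
```

Saving the output before each change makes it easy to compare runs:

    collect_evidence mycontainer > before.txt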

Mini-lab: diagnose a port publishing problem

This short exercise trains the most common connectivity failure: the app binds to localhost inside the container.

  • 1) Run a container that starts a web server bound to localhost (example uses Python):

    docker run --rm -d --name bindtest -p 8080:8000 python:3.12-slim sh -c "python -m http.server 8000 --bind 127.0.0.1"
  • 2) Test from host

    curl -v http://localhost:8080/

    You will likely see the connection fail (often reset or closed with an empty reply) because the server is not listening on the container’s external interface.

  • 3) Confirm inside container

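    The python:3.12-slim image does not ship ss or netstat, so run a debug container that shares bindtest’s network namespace:

    docker run --rm --network container:bindtest nicolaka/netshoot ss -lnt

    Notice it is bound to 127.0.0.1:8000.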

  • 4) Fix by binding to 0.0.0.0

    docker rm -f bindtest
    docker run --rm -d --name bindtest -p 8080:8000 python:3.12-slim sh -c "python -m http.server 8000 --bind 0.0.0.0"
    curl -v http://localhost:8080/

Mini-lab: diagnose DNS/service-name issues between containers

This exercise focuses on the “localhost confusion” and name resolution.

  • 1) Start a simple HTTP service container

    docker run -d --name whoami --rm -p 8090:80 traefik/whoami
  • 2) From another container, try to reach it via localhost (this should fail)

    docker run --rm -it nicolaka/netshoot sh -c "curl -v http://localhost:80"

    This fails because localhost is the netshoot container itself.

  • 3) Now attach both to the same user-defined network and use the container name

    docker network create debugnet
    docker run -d --name whoami2 --network debugnet --rm traefik/whoami
    docker run --rm -it --network debugnet nicolaka/netshoot sh -c "curl -v http://whoami2"

  • 4) Clean up

    docker rm -f whoami whoami2
    docker network rm debugnet

Now answer the exercise about the content:

You published a container port to the host, but curl to localhost returns connection refused. Which check most directly confirms whether the application is reachable from outside the container?

If a service listens only on 127.0.0.1 inside the container, it will not be reachable via the published port. Checking listening addresses with ss or netstat confirms whether it binds to 0.0.0.0.
