Practical Troubleshooting with Ping, Traceroute, and Path Testing

Chapter 12

Ping: What It Actually Tests (and What It Doesn’t)

ping sends ICMP echo requests and waits for echo replies. It answers two practical questions: (1) can I reach that IP at all (basic reachability), and (2) how long does the round trip take (latency) and how consistent is it (jitter and packet loss)? In hosting and cloud troubleshooting, ping is most useful as a fast “is anything alive on that IP?” check and as a baseline latency measurement between two points.

Key observations to record from ping

Packet loss: 0% loss suggests stable reachability; intermittent loss often points to congestion, rate-limiting, or an overloaded device.
RTT (round-trip time): compare min/avg/max; a large spread (e.g., 10ms avg, 200ms max) indicates jitter.
TTL in replies: not a hop count by itself, but sudden changes can hint that replies are coming from a different device/path than expected.
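
As a rough illustration of recording these observations, the sketch below pulls the loss percentage and the min/avg/max RTT out of a ping summary and prints the spread (max minus min). The function name summarize_ping is hypothetical, and the sample text assumes the Linux iputils summary layout; other ping implementations may word the summary differently.

```shell
# summarize_ping: read ping output on stdin and print loss plus RTT spread.
# Assumes a summary like the Linux iputils one; other layouts may need tweaks.
summarize_ping() {
  LC_ALL=C awk '
    /packet loss/ {
      # The loss figure is the only field that ends in a percent sign.
      for (i = 1; i <= NF; i++) if ($i ~ /%$/) loss = $i
    }
    /min\/avg\/max/ {
      # Values follow " = " and are separated by slashes.
      split($0, halves, " = ")
      split(halves[2], rtt, "/")
      min = rtt[1]; avg = rtt[2]; max = rtt[3]
    }
    END {
      printf "loss=%s min=%s avg=%s max=%s spread=%.1f\n", loss, min, avg, max, max - min
    }'
}

# Sample input (illustrative numbers, not a real measurement).
summarize_ping <<'EOF'
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 9.8/10.0/200.0/42.1 ms
EOF
```

In real use the function would sit at the end of a pipeline, for example ping -c 5 203.0.113.10 | summarize_ping.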

Common ping commands you’ll actually use

# Basic reachability (Linux/macOS) - stop with Ctrl+C
ping 203.0.113.10

# Send a fixed number of probes
ping -c 5 203.0.113.10

# Faster interval (be careful; can trigger rate limits, and some systems require root for sub-second intervals)
ping -c 20 -i 0.2 203.0.113.10

# Windows
ping 203.0.113.10
ping -n 5 203.0.113.10

Why Ping Can Fail Even When TCP Works

A very common cloud/hosting scenario: a website loads fine over HTTPS, but ping to the same IP times out. This is not a contradiction—ICMP and TCP are different protocols, and many environments treat them differently.

Typical reasons ICMP fails while TCP succeeds

ICMP filtered or rate-limited: firewalls, security groups, or upstream providers may block ICMP echo requests/replies while allowing TCP 80/443.
Host-based firewall rules: the OS firewall may drop ICMP but allow web ports.
DDoS protections: some services intentionally deprioritize or block ICMP to reduce attack surface.
Asymmetric routing: the target receives the ping, but the reply takes a different return path on which ICMP is dropped or the route is broken (less common, but important).

Practical takeaway: don’t use ping as “the service is down” proof. Use ping to test reachability/latency, but validate services with targeted port tests.
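
The takeaway can be written down as a small decision table. This is only a sketch: the function name interpret_reachability and the "ok"/"fail" arguments are hypothetical, and the wording of each verdict is a starting hypothesis rather than a diagnosis.

```shell
# interpret_reachability PING_RESULT TCP_RESULT
# Each argument is "ok" or "fail", as recorded from a ping and a port test.
interpret_reachability() {
  case "$1/$2" in
    ok/ok)     echo "host and service reachable" ;;
    fail/ok)   echo "ICMP likely filtered or rate-limited; service is reachable" ;;
    ok/fail)   echo "host reachable; check the service port (filtering or not listening)" ;;
    fail/fail) echo "no evidence of reachability; test from another source and check the path" ;;
    *)         echo "usage: interpret_reachability ok|fail ok|fail" >&2; return 2 ;;
  esac
}

# The scenario from this section: ping times out, HTTPS works.
interpret_reachability fail ok
```

The two inputs would normally come from the exit status of a ping and of a port test run from the same source at about the same time.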

Traceroute: How It Behaves and How to Interpret Hops/Timeouts

traceroute (Linux/macOS) and tracert (Windows) map the path toward a destination by sending probes with increasing TTL values (1, 2, 3, ...), so that each router along the way in turn discards a probe and returns an ICMP “time exceeded” message. Each line (“hop”) is a router or device that responded, along with its round-trip timing. When a hop shows * * *, it means “no response to the probe,” not necessarily “traffic cannot pass.” Many routers deprioritize or block these responses.

Traceroute variants matter (ICMP vs UDP vs TCP)

ICMP-based: often easiest to understand, but may be filtered.
UDP-based (classic traceroute): may be blocked by firewalls that drop high UDP ports.
TCP-based: can be most realistic for web troubleshooting because it mimics TCP to a specific port (like 443).

# Linux/macOS: classic traceroute (often UDP by default)
traceroute 203.0.113.10

# Linux: ICMP traceroute
traceroute -I 203.0.113.10

# Linux: TCP traceroute to HTTPS (usually needs root/sudo)
traceroute -T -p 443 203.0.113.10

# Windows (tracert sends ICMP echo probes)
tracert 203.0.113.10

How to read traceroute output in cloud/hosting incidents

Early hop failure (hop 1–2): often local gateway/VPN issues, local firewall, or your immediate network path.
Mid-path timeouts but later hops respond: usually harmless; an intermediate router isn’t replying to probes, but forwarding traffic.
Traceroute stops and never reaches destination: indicates a routing/forwarding issue, filtering, or a blackhole beyond the last responding hop.
Destination responds with high latency: compare with ping and with tests from another source network to decide if it’s path congestion vs host overload.
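
When logging a trace, it helps to record the last hop that answered and how many hops were probed. The sketch below does that for saved traceroute output; the function name last_responding_hop is hypothetical and the sample trace is made up for illustration. Keep in mind that the last responding hop marks where replies stop, which is not necessarily where the fault is.

```shell
# last_responding_hop: read traceroute output on stdin and report the
# highest-numbered hop that returned at least one reply.
last_responding_hop() {
  awk '
    $1 ~ /^[0-9]+$/ {              # hop lines start with the hop number
      probed = $1
      for (i = 2; i <= NF; i++) {
        if ($i != "*") { hop = $1; host = $i; break }
      }
    }
    END {
      if (hop == "") print "no hop responded"
      else printf "last responding hop: %s (%s), hops probed: %s\n", hop, host, probed
    }'
}

# Illustrative trace: hop 2 is silent but harmless, nothing answers after hop 3.
last_responding_hop <<'EOF'
traceroute to 203.0.113.10 (203.0.113.10), 30 hops max, 60 byte packets
 1  192.0.2.1 (192.0.2.1)  1.123 ms  1.004 ms  0.987 ms
 2  * * *
 3  198.51.100.1 (198.51.100.1)  8.412 ms  8.390 ms  8.501 ms
 4  * * *
 5  * * *
EOF
```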

Targeted Port Testing: Validate the Service, Not Just the Host

When users report “site down,” you need to answer: “Is the network path broken, or is the service not reachable on the required port?” Targeted port tests check exactly that.

Tools and what they prove

tcping (Windows) / nc (netcat) / telnet: confirms TCP connect success/failure to a port.
curl: confirms application-layer behavior (HTTP status, TLS handshake, redirects, headers).
openssl s_client: confirms TLS handshake details when HTTPS behaves oddly.

# TCP connect test (Linux/macOS) - succeeds if it can connect
nc -vz 203.0.113.10 443
nc -vz 203.0.113.10 80

# Windows tcping (if installed)
tcping 203.0.113.10 443

# HTTP(S) request test (headers only)
curl -I https://example.com

# Same request pinned to a specific IP, keeping the correct SNI and Host header
curl -Iv https://example.com --resolve example.com:443:203.0.113.10

# See where time is spent (DNS, connect, TLS, first byte)
curl -o /dev/null -s -w "dns=%{time_namelookup} connect=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer} total=%{time_total}\n" https://example.com

# TLS handshake debugging
openssl s_client -connect example.com:443 -servername example.com
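
The curl timers shown above are cumulative (each is measured from the start of the request), so the time spent in each phase is the difference between neighbouring values. The sketch below does that subtraction for one line in the dns=/connect=/tls=/ttfb= format used above. The function name phase_breakdown is hypothetical and the sample numbers are invented; for plain HTTP, where time_appconnect is 0, the tls and server figures would not be meaningful.

```shell
# phase_breakdown: read one "dns=.. connect=.. tls=.. ttfb=.. total=.." line
# on stdin and print the time spent in each phase, in seconds.
phase_breakdown() {
  LC_ALL=C awk '{
    for (i = 1; i <= NF; i++) { split($i, kv, "="); t[kv[1]] = kv[2] }
    printf "dns=%.3f tcp=%.3f tls=%.3f server=%.3f\n",
      t["dns"],
      t["connect"] - t["dns"],
      t["tls"] - t["connect"],
      t["ttfb"] - t["tls"]
  }'
}

# Illustrative values, not a real measurement.
echo "dns=0.012 connect=0.045 tls=0.130 ttfb=0.210 total=0.215" | phase_breakdown
```

A large tcp figure points toward network latency, while a large server figure (time between the end of the TLS handshake and the first byte) points toward the application or backend.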

Interpreting common outcomes

TCP connect succeeds but curl fails: likely application/TLS/virtual host issue (wrong SNI/Host header, backend error, redirect loop).
TCP connect times out: often filtering (security group/firewall) or routing/return path issues.
TCP connect is refused immediately: the host is reachable but nothing is listening on that port (or an active reject rule).
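
For note-taking, the three outcomes can be mapped from a connect tool's message to a short label. This is a sketch under assumptions: classify_connect is a hypothetical helper, and the matched words ("refused", "timed out", "succeeded") reflect wording commonly printed by nc variants, which is not guaranteed for every implementation or version.

```shell
# classify_connect MESSAGE
# Map the text printed by a TCP connect test to one of the outcomes above.
classify_connect() {
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *refused*)               echo "refused: host reachable, nothing listening or an active reject" ;;
    *"timed out"*|*timeout*) echo "timeout: suspect filtering or a routing/return path issue" ;;
    *succeeded*|*connected*) echo "open: TCP connect works, continue with curl" ;;
    *)                       echo "unknown: record the raw message" ;;
  esac
}

# Example message in the style of OpenBSD nc (assumed wording).
classify_connect "nc: connect to 203.0.113.10 port 443 (tcp) failed: Connection refused"
```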

Workflow Discipline: Document and Change One Variable at a Time

Fast troubleshooting comes from controlled experiments. For each test, write down: source host, source IP, destination (name and IP), tool/command, timestamp, and result. Then change only one variable per iteration.

Variables worth changing deliberately

Resolver: system DNS vs a known public resolver vs an internal resolver.
Source host/network: your laptop vs a bastion host vs a monitoring node; office network vs mobile hotspot.
Destination form: test the name and the resolved IP; test IPv4 vs IPv6 if applicable.
Protocol/port: ICMP vs TCP 443 vs TCP 80 vs the actual service port.

Lab 1: Differentiate DNS Failure vs Routing Failure

Goal: determine whether “can’t reach the site” is because the name isn’t resolving correctly or because the network path to the resolved IP is failing.

Step-by-step

1) Capture the symptom precisely

What exactly fails? Browser error, timeout, “server not found,” TLS error?
From where? (your workstation, a server, a customer network)

2) Test DNS resolution (don’t assume)

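# Linux (uses the system resolver configuration, including /etc/hosts)
getent hosts example.com

# macOS (getent is not available)
dscacheutil -q host -a name example.com

# If dig is available (queries DNS directly, bypassing /etc/hosts)
dig +short example.com A
dig +short example.com AAAA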

# Windows
nslookup example.com

If resolution fails: you likely have a DNS problem (wrong resolver, missing record, split-horizon mismatch, or upstream DNS outage).
If resolution succeeds: write down the returned IP(s). If multiple IPs are returned, test each one to avoid chasing a single bad node.

3) Bypass DNS to separate name vs network

# Ping the resolved IP (reachability baseline)
ping -c 3 203.0.113.10

# Trace the path to the IP
traceroute 203.0.113.10

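# Test the actual service port
nc -vz 203.0.113.10 443

# HTTPS test pinned to the resolved IP (--resolve skips the DNS lookup)
curl -I https://example.com --resolve example.com:443:203.0.113.10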

4) Change only the resolver (one variable)

Repeat resolution using a different resolver (e.g., your internal resolver vs a public resolver) and compare answers.
If different resolvers return different IPs, document both sets and test reachability to each IP.
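
One way to make the comparison mechanical is to normalize both answer sets (one address per line, sorted, de-duplicated) and compare them. The sketch below assumes the answers have already been collected as text; compare_answers and normalize_ips are hypothetical helper names, and the addresses are documentation examples.

```shell
# normalize_ips TEXT: one address per line, sorted, without duplicates.
normalize_ips() {
  printf '%s\n' "$1" | awk '{ for (i = 1; i <= NF; i++) print $i }' | sort -u
}

# compare_answers ANSWERS_A ANSWERS_B
# Prints "match" if both resolvers returned the same set, else "differ".
compare_answers() {
  a=$(normalize_ips "$1")
  b=$(normalize_ips "$2")
  if [ "$a" = "$b" ]; then echo "match"; else echo "differ"; fi
}

# Same set in a different order counts as a match.
compare_answers "203.0.113.10 203.0.113.11" "203.0.113.11 203.0.113.10"
# A public address versus a private one is worth documenting.
compare_answers "203.0.113.10" "10.0.0.5"
```

With dig available, the two arguments could be filled from dig +short example.com A and dig +short @RESOLVER example.com A, where RESOLVER is the second resolver's address. Note that dig +short may also print CNAME targets, which would appear in the compared sets.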

5) Decide

DNS failure indicators: name doesn’t resolve, resolves inconsistently, resolves to unexpected IPs, or resolves to private IPs from a public client.
Routing/transport failure indicators: name resolves consistently, but ping/traceroute/port tests to the resolved IP fail from multiple sources.

Lab 2: Isolate Security Group/Firewall Issues vs Server Not Listening

Goal: when a port test fails, determine whether the network is blocking you or the server simply isn’t accepting connections.

Step-by-step

1) Start with a TCP connect test

# Replace with your service port
nc -vz 203.0.113.10 443

Timeout: suggests filtering or a broken path (packets not reaching the service or replies not returning).
Connection refused: suggests the host is reachable but nothing is listening on that port (or an explicit reject).

2) Confirm with an application-layer test

curl -Iv https://example.com

If TCP connects but HTTP fails, capture the exact error (TLS handshake failure, certificate name mismatch, 403/503, etc.).

3) Change only the source network

Repeat the same port test from a different source (e.g., a cloud VM in another VPC/VNet, a bastion host, or a different ISP).
If it works from one source but not another: suspect source IP allowlists, geo/IP reputation blocks, or network ACL differences.

4) Test a known-open port on the same host

If you expect 22 (SSH) to be open for admins, test it from the same source.

nc -vz 203.0.113.10 22

If 22 works but 443 times out: likely port-specific filtering (security group/firewall/NACL) or the service isn’t bound/listening.
If both time out: broader filtering/routing/return path issue.

5) Verify “listening” from inside (if you have access)

If you can log into the server (console, out-of-band, or via a working management path), verify the service is actually bound to the expected IP/port.

# Linux examples (run on the server; run as root to see process names)
ss -lntp | grep -E ':(80|443)[[:space:]]'
# or, on older systems without ss
netstat -lntp | grep -E ':(80|443)[[:space:]]'

Not listening: fix the service (process down, wrong bind address, wrong interface, container not publishing port).
Listening but external tests time out: focus on security group/firewall/NACL/routing/return path.

Lab 3: Confirm Return Path Problems by Comparing Source IPs and Routes

Goal: detect asymmetric routing or missing return routes—cases where the destination receives traffic but replies go somewhere else (or are dropped). This is common when multiple interfaces, multiple gateways, NAT, or load balancers are involved.

Step-by-step

1) Identify the source IP as seen by the destination

From the client side, note your source public IP (or the NAT egress IP) as best you can.
On the server side (if you have logs), check what source IP is arriving. For HTTP, web server logs show client IP (or load balancer IP if not passing through headers). For raw TCP services, use connection tracking tools.

# On the server, observe incoming connections (Linux)
ss -tn state syn-recv '( sport = :443 )'
ss -tn state established '( sport = :443 )'

2) Compare tests from two different sources

Run the same port test from Source A and Source B (different networks).
Document: source IP, destination IP, port, result (timeout/refused/success), and time.

# From each source
nc -vz 203.0.113.10 443
curl -Iv https://example.com

3) If one source fails, check whether the server is replying via the expected route

On the server, verify the route back to the failing source IP uses the correct gateway/interface.

# On the server: show route selection back to a specific client IP
ip route get 198.51.100.25

# Show default route and main routes
ip route

Red flag: route to the client IP goes out a different interface/gateway than the one traffic arrived on, especially in multi-homed setups.
Red flag: default route points to an unexpected gateway (common after changes, DHCP overrides, or misconfigured cloud-init).
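
The first red flag can be checked by extracting the outgoing interface from the ip route get output and comparing it with the interface the traffic arrived on. The sketch below works on captured output; route_field and check_return_path are hypothetical helper names, and eth0/eth1 plus the addresses are illustrative.

```shell
# route_field KEY: print the word that follows KEY (e.g. "dev", "via", "src")
# in `ip route get` output read from stdin.
route_field() {
  awk -v key="$1" '{ for (i = 1; i < NF; i++) if ($i == key) { print $(i + 1); exit } }'
}

# check_return_path EXPECTED_DEV: compare the outgoing interface with the
# interface on which the client's traffic arrived.
check_return_path() {
  actual=$(route_field dev)
  if [ "$actual" = "$1" ]; then
    echo "ok: replies leave via $actual"
  else
    echo "red flag: replies leave via ${actual:-unknown}, expected $1"
  fi
}

# Illustrative output for a multi-homed server where traffic arrived on eth0.
check_return_path eth0 <<'EOF'
198.51.100.25 via 10.0.2.1 dev eth1 src 10.0.2.10 uid 0
    cache
EOF
```

On a live server the input would come from ip route get 198.51.100.25 piped into check_return_path.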

4) Validate path symmetry with traceroute from both ends (when possible)

Run traceroute from the client to the server IP.
If you can, run traceroute from the server back to the client public IP (or at least to the client’s NAT egress IP).

# Client to server
traceroute -T -p 443 203.0.113.10

# Server to client (if client IP is reachable)
traceroute 198.51.100.25

5) Decide and document the evidence

Return path problem indicators: server sees inbound SYNs but client never completes handshake; server route back to client points to wrong gateway; behavior differs by source network; packet captures (if available) show SYN arriving but SYN-ACK not leaving the correct interface.
Not a return path issue: server never sees inbound traffic at all (then focus on inbound filtering/routing before the server).

Practical Troubleshooting Notes You’ll Reuse Constantly

Use the right tool for the question

“Is the IP reachable and what’s the latency?” Use ping (but don’t treat ICMP failure as definitive).
“Where does the path appear to change or stop responding?” Use traceroute, preferably TCP-based to the service port.
“Is the service reachable on the required port and behaving correctly?” Use nc/tcping for connect tests and curl for HTTP/TLS behavior.

Write down observations in a repeatable format

# Example troubleshooting log entry format
Time: 2026-01-16 14:03 UTC
Source host: bastion-a (10.0.1.10), egress IP 198.51.100.25
Destination: example.com (203.0.113.10)
Test: nc -vz -w 5 203.0.113.10 443
Result: timeout after 5s
Notes: ping blocked (no replies); traceroute -T -p 443 stops after hop 7

This style of logging makes it much easier to escalate to a cloud/network team, compare before/after changes, and avoid “random walk” troubleshooting.
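
A small helper can keep entries in that format consistent. This is a sketch only: log_test is a hypothetical function name, the field order simply mirrors the example entry above, and the timestamp comes from the local clock in UTC.

```shell
# log_test SOURCE DESTINATION TEST RESULT [NOTES]
# Print one troubleshooting log entry in the format shown above.
log_test() {
  printf 'Time: %s\n' "$(date -u '+%Y-%m-%d %H:%M UTC')"
  printf 'Source host: %s\n' "$1"
  printf 'Destination: %s\n' "$2"
  printf 'Test: %s\n' "$3"
  printf 'Result: %s\n' "$4"
  printf 'Notes: %s\n' "${5:-none}"
}

# Example entry (values taken from the sample log above).
log_test "bastion-a (10.0.1.10), egress IP 198.51.100.25" \
         "example.com (203.0.113.10)" \
         "nc -vz 203.0.113.10 443" \
         "timeout" \
         "ping blocked (no replies)"
```

Appending the output to a file (for example with >> and a file name of your choice) gives a running record that can be attached to an escalation.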

Now answer the exercise about the content:

A website loads normally over HTTPS, but ping to the same IP times out. What is the most appropriate interpretation and next step?

Answer: Ping uses ICMP, which can be blocked or rate-limited even when TCP 80/443 works. Treat ping as reachability/latency data, then validate the actual service with port checks (nc/tcping) and application tests (curl).