Web Servers 101: How HTTP, DNS, TLS, and Reverse Proxies Work

URLs and Origins: Scheme, Host, Port, Path, Query, and Fragments

Chapter 2

Why URLs and Origins Matter

A URL (Uniform Resource Locator) is the structured address you give to a browser, an API client, or a server-side HTTP library to identify a resource and how to reach it. Even small differences in a URL can change which server is contacted, which application receives the request, which handler runs, what data is returned, and what security rules apply in the browser.

An origin is a security boundary used primarily by browsers. Many web platform rules (for example, whether JavaScript can read a response, whether cookies are sent, and whether storage is shared) are evaluated at the origin level. Understanding exactly how a URL is broken into parts and how an origin is derived from it helps you debug issues like “request goes to the wrong place,” “cookies not sent,” “CORS blocked,” “redirect loop,” and “cache misses.”

The Main Parts of a URL

Most URLs you use for HTTP services follow this general shape:

scheme://host:port/path?query#fragment

Not every part is always present, but the order and separators are consistent. Let’s define each piece and what it influences.

Scheme

The scheme tells the client what protocol to use and how to interpret the rest of the URL. For web traffic, the most common schemes are http and https.

  • http: Unencrypted HTTP over TCP (commonly port 80 by default).
  • https: HTTP over TLS (commonly port 443 by default). The scheme affects security expectations and browser behavior (for example, many APIs require a secure context).

Other schemes exist (for example, ws/wss for WebSocket, ftp, file, mailto), but in a web server context you’ll mostly reason about http and https.

Important practical detail: changing only the scheme from http to https changes the origin, changes default port assumptions, and changes how cookies with the Secure attribute behave.

Host

The host identifies the server you intend to reach. It is typically a domain name (like api.example.com) but can also be an IP address (like 203.0.113.10 or [2001:db8::10] for IPv6).

DNS host names are case-insensitive, so Example.COM and example.com refer to the same host. Normalize hosts to lowercase before logging, caching, or running security checks, because code that compares raw strings will otherwise treat the two spellings as different.

Hosts can include subdomains, and subdomains often map to different applications or environments:

  • www.example.com might serve the marketing site.
  • app.example.com might serve the web app.
  • api.example.com might serve JSON APIs.

In HTTP/1.1 and HTTP/2, the host is also conveyed in the request (via the Host header in HTTP/1.1, and via :authority in HTTP/2). This is what enables virtual hosting: multiple sites on the same IP address.

Port

The port selects which service on the host you want to talk to. If you omit the port, the client uses the scheme’s default:

  • http defaults to port 80
  • https defaults to port 443

When you explicitly include a port, it becomes part of the origin and can affect routing and security decisions. For example:

  • https://example.com and https://example.com:443 mean the same thing, and many systems normalize away the explicit default port.
  • https://example.com:8443 is a different origin from https://example.com.

Ports are especially common in development and internal services:

  • http://localhost:3000 for a frontend dev server
  • http://localhost:8080 for a backend API

Path

The path identifies a resource within the server/application. It begins with / and can contain multiple segments:

/products/123/reviews

From the server’s perspective, the path is commonly used for routing (mapping to a controller/handler) and sometimes for selecting static files. From a reverse proxy’s perspective, the path can be used for path-based routing (for example, send /api to one upstream and / to another).

Paths are generally case-sensitive on the web (even if some backends treat them differently). Treat /Users and /users as potentially different routes.

Two practical details that often cause bugs:

  • Trailing slash: /docs and /docs/ are different paths. Many frameworks redirect one to the other, but you should not assume they are equivalent.
  • Dot segments: /a/b/../c can be normalized to /a/c. Clients and proxies do not all normalize the same way, so normalize the path yourself before applying any security check instead of relying on an upstream component to have done it.
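As a minimal sketch of the "normalize before you check" idea, the Python snippet below collapses dot segments with posixpath.normpath and only then tests a path prefix. The helper name is_under_prefix and the /public prefix are hypothetical, chosen only for illustration. Note two limits of this sketch: normpath also drops a trailing slash, and percent-encoded dots (such as %2e%2e) are not decoded here.

```python
import posixpath


def is_under_prefix(raw_path: str, prefix: str) -> bool:
    """Return True if the normalized path stays inside the given prefix."""
    # Collapse "." and ".." segments before making any decision.
    normalized = posixpath.normpath(raw_path)
    base = prefix.rstrip("/")
    return normalized == base or normalized.startswith(base + "/")


print(posixpath.normpath("/a/b/../c"))                             # /a/c
print(is_under_prefix("/public/css/../img/logo.png", "/public"))   # True
print(is_under_prefix("/public/../admin/secret", "/public"))       # False
```

A naive check such as raw_path.startswith("/public") would accept the last path, which is why the normalization step comes first.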

Query

The query begins after ? and contains additional parameters. It is commonly used for filtering, pagination, tracking, and optional inputs:

?page=2&sort=price&in_stock=true

Query parameters are part of the URL and are sent to the server as part of the request target. They often influence caching and routing logic. Many caches treat different query strings as different cache keys, but some CDNs allow you to ignore certain parameters (for example, ignore utm_* tracking parameters).

Important: the order of query parameters usually carries no meaning for the application, but many systems treat the query as a raw string. That means ?a=1&b=2 and ?b=2&a=1 may be equivalent for your application logic yet count as different keys for caching layers or as different inputs for signature verification schemes.
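One way to make two such queries compare equal is to canonicalize them before using them as a cache key or signature input. The sketch below is one possible approach, not a standard: the helper name canonical_query is hypothetical, and it sorts by key only so that repeated keys keep their relative order.

```python
from urllib.parse import parse_qsl, urlencode


def canonical_query(query: str) -> str:
    """Return the query with parameters sorted by key (stable sort)."""
    pairs = parse_qsl(query, keep_blank_values=True)
    # sorted() is stable, so repeated keys keep their original order.
    return urlencode(sorted(pairs, key=lambda pair: pair[0]))


print(canonical_query("b=2&a=1"))   # a=1&b=2
print(canonical_query("a=1&b=2"))   # a=1&b=2
```

Whether reordering is safe depends on the consumer: a signature scheme that signs the raw string must canonicalize on both the signing and the verifying side.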

Fragment

The fragment begins after # and is used by the client (typically the browser) to refer to a portion of a document or a client-side route state:

#section-3

Key rule: for normal HTTP requests, the fragment is not sent to the server. It is processed client-side. This is why you cannot rely on fragments for server-side routing, authentication, or logging. If you see a fragment in the address bar, your server will not see it in the request line.

Fragments are commonly used for:

  • Scrolling to an element with a matching id in HTML documents.
  • Single-page applications that use hash-based routing (for example, https://example.com/#/settings).
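To see what a client would actually put on the wire, the following Python sketch derives the request target (path plus query) from a full URL and drops the fragment. The function name request_target is an illustrative assumption; real HTTP clients do this internally.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path and query a client would send; the fragment is dropped."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


print(request_target("https://example.com/docs?x=1#install"))  # /docs?x=1
print(request_target("https://example.com/#/settings"))        # /
```

The second call shows why hash-based routes never reach the server: every such URL is a request for / as far as the server is concerned.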

Origins: The Browser’s Security Grouping

An origin is defined as the tuple:

(scheme, host, port)

Path, query, and fragment are not part of the origin.

Examples:

  • https://example.com/account has origin https://example.com:443 (port implied).
  • https://example.com:8443/account has origin https://example.com:8443.
  • http://example.com has origin http://example.com:80.
  • https://api.example.com is a different origin from https://example.com because the host differs.

Why this matters: browsers isolate many capabilities by origin. If you load a page from one origin and try to fetch data from another origin, the browser may restrict access unless the server explicitly allows it. Similarly, cookies and storage are scoped in ways that often align with origin boundaries (with some cookie rules using domain scoping, which is related but not identical).

Same-Origin vs Same-Site (Don’t Confuse Them)

Same-origin means scheme, host, and port all match. Same-site is a related concept used for cookie and request context decisions and is based on registrable domain (for example, app.example.com and api.example.com are often considered same-site but not same-origin). You will frequently see issues where something “works across subdomains” for cookies but still fails for JavaScript access due to same-origin rules.

Step-by-Step: Parse a URL Like a Debugger

When you’re troubleshooting, don’t eyeball a URL. Parse it systematically and write down each component.

Example 1

https://api.example.com:8443/v1/users/42?expand=teams&limit=10#profile
  • Scheme: https
  • Host: api.example.com
  • Port: 8443 (explicit)
  • Path: /v1/users/42
  • Query: expand=teams&limit=10
  • Fragment: profile
  • Origin: https://api.example.com:8443

Debug implications:

  • If a browser page from https://app.example.com calls this API, it is cross-origin (host differs and port differs).
  • The server will receive the path and query, but not the fragment.
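The same breakdown can be done mechanically. As a small sketch, Python's urllib.parse.urlsplit returns each component of Example 1 as a separate attribute:

```python
from urllib.parse import urlsplit

parts = urlsplit(
    "https://api.example.com:8443/v1/users/42?expand=teams&limit=10#profile"
)

print(parts.scheme)    # https
print(parts.hostname)  # api.example.com
print(parts.port)      # 8443
print(parts.path)      # /v1/users/42
print(parts.query)     # expand=teams&limit=10
print(parts.fragment)  # profile

# When the URL has no explicit port, .port is None rather than the default.
print(urlsplit("http://localhost/test?x=1").port)  # None
```

Because .port is None when the port is omitted, code that needs the effective port has to substitute the scheme default itself.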

Example 2 (default port)

http://localhost/test?x=1
  • Scheme: http
  • Host: localhost
  • Port: 80 (implicit default)
  • Path: /test
  • Query: x=1
  • Origin: http://localhost:80

Debug implication: if your dev server is actually listening on port 3000, this URL will not reach it, because the client connects to port 80. You must use http://localhost:3000/test?x=1.

How Each Component Affects Server Routing and Reverse Proxies

Scheme and reverse proxies

Even if your application only listens on plain HTTP behind a reverse proxy, the external URL may be https. Many frameworks need to know the “original scheme” to generate correct absolute URLs, redirects, and cookie attributes. If the app thinks the scheme is http when the user is actually on https, you may see:

  • Redirects to http:// (downgrading security or causing mixed content problems).
  • Cookies missing the Secure attribute when they should have it.

Practically, this is why deployments often forward a header indicating the external scheme (commonly X-Forwarded-Proto) and configure the app to trust it only from known proxies.
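The "trust it only from known proxies" rule can be sketched as follows. Everything named here is a hypothetical illustration: the TRUSTED_PROXIES range, the external_scheme helper, and the plain dict of headers are assumptions, not the API of any particular framework. Most frameworks ship their own setting for this, which is usually the better choice in practice.

```python
import ipaddress

# Hypothetical: the address range your reverse proxies connect from.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def external_scheme(headers: dict, peer_ip: str, connection_scheme: str = "http") -> str:
    """Return the scheme the user saw, honoring X-Forwarded-Proto only from trusted peers."""
    peer = ipaddress.ip_address(peer_ip)
    if any(peer in network for network in TRUSTED_PROXIES):
        # The header may hold a comma-separated list; this sketch takes the first value.
        forwarded = headers.get("X-Forwarded-Proto", "").split(",")[0].strip().lower()
        if forwarded in ("http", "https"):
            return forwarded
    # Untrusted peer or missing header: fall back to the actual connection.
    return connection_scheme


print(external_scheme({"X-Forwarded-Proto": "https"}, "10.1.2.3"))      # https
print(external_scheme({"X-Forwarded-Proto": "https"}, "203.0.113.10"))  # http
```

The second call ignores the header because the peer is not a known proxy; otherwise any client could claim to be on https.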

Host and virtual hosting

On a single IP, a reverse proxy can host multiple domains. The host determines which site configuration is used. If the host header is wrong, you can land on the wrong site or get a default “unknown host” response.

Practical debugging checklist when a request hits the wrong app:

  • Verify the URL host matches the intended domain.
  • Verify the request’s Host header (or :authority) matches.
  • Check reverse proxy routing rules that match on host.

Port and environment separation

Ports are frequently used to separate environments or services on the same host. If you run multiple services on one machine, the port is the primary selector. In containerized setups, you also have to distinguish between container port and published host port; the URL must use the published port.

Path-based routing

Reverse proxies and API gateways often route by path prefix:

  • /api/ goes to an API service
  • /static/ goes to a static file server or CDN origin
  • / goes to the frontend app

Small path differences can break routing. For example, if the proxy expects /api/ but the client calls /API/, a case-sensitive match may fail and route to the wrong backend.
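A toy model of prefix routing makes the case-sensitivity failure easy to reproduce. The route table and upstream names below are hypothetical, and real proxies have their own matching rules; this sketch simply picks the longest matching prefix using a case-sensitive comparison.

```python
# Hypothetical route table: (path prefix, upstream name).
ROUTES = [
    ("/api/", "api-service"),
    ("/static/", "static-files"),
    ("/", "frontend"),
]


def pick_upstream(path: str):
    """Return the upstream for the longest matching prefix (case-sensitive)."""
    best_prefix, best_upstream = "", None
    for prefix, upstream in ROUTES:
        if path.startswith(prefix) and len(prefix) > len(best_prefix):
            best_prefix, best_upstream = prefix, upstream
    return best_upstream


print(pick_upstream("/api/users"))       # api-service
print(pick_upstream("/static/app.css"))  # static-files
print(pick_upstream("/API/users"))       # frontend
```

The last request silently falls through to the catch-all route, which typically shows up as an unexpected HTML page or 404 instead of a JSON response.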

Query Strings in Practice: Encoding, Repetition, and Safety

Percent-encoding basics

URLs are limited to a subset of ASCII characters in their raw form. When you need to include spaces or reserved characters, they must be encoded. For example, a space in a query value is commonly encoded as %20 (or + in some form-encoding contexts):

?q=hello%20world

Reserved characters like & and = have special meaning in query strings, so if they are part of a value, they must be encoded:

?note=fish%26chips
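In Python, the standard library exposes both encoding styles. A short sketch using urllib.parse:

```python
from urllib.parse import quote, quote_plus, unquote

print(quote("hello world"))          # hello%20world
print(quote_plus("hello world"))     # hello+world  (form-encoding style)
print(quote("fish&chips", safe=""))  # fish%26chips
print(unquote("fish%26chips"))       # fish&chips
```

Passing safe="" tells quote to encode every reserved character, which is what you want for a single query value.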

Repeated keys and arrays

Many APIs accept repeated keys:

?tag=red&tag=blue&tag=green

Others use bracket conventions:

?tag[]=red&tag[]=blue

There is no single universal standard for how servers interpret these. When you design an API, document the expected format and test it with your framework’s parser.
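To illustrate how one parser treats the two conventions, here is what Python's urllib.parse does with them. Other frameworks may behave differently, which is exactly why testing with your own parser matters.

```python
from urllib.parse import parse_qs, urlencode

# Repeated keys are collected into a list under the same name.
print(parse_qs("tag=red&tag=blue&tag=green"))
# {'tag': ['red', 'blue', 'green']}

# Brackets get no special treatment here: the key is literally "tag[]".
print(parse_qs("tag[]=red&tag[]=blue"))
# {'tag[]': ['red', 'blue']}

# Building repeated keys from a list requires doseq=True.
print(urlencode({"tag": ["red", "blue"]}, doseq=True))
# tag=red&tag=blue
```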

Query strings and sensitive data

Because query strings are part of the URL, they often end up in logs, browser history, bookmarks, monitoring tools, and referrer headers (depending on referrer policy). Avoid putting secrets (API keys, passwords, one-time tokens) in query parameters. Prefer headers or request bodies for sensitive values.

Fragments in Practice: Client-Side Only, But Still Important

Although fragments are not sent to the server, they can still affect user experience and client-side routing. Two common patterns:

  • Document anchors: https://example.com/docs#install scrolls to the element with id="install".
  • Hash routing: https://example.com/#/settings/profile lets a single HTML page handle multiple “routes” without server involvement.

Debugging tip: if you see a 404 from the server for a single-page app route like /settings/profile, switching to hash routing can avoid server configuration changes, but it changes URL semantics and may affect analytics and SEO. Alternatively, configure the server to serve the SPA entry point for unknown paths.

Common URL Normalization Pitfalls

Default ports and origin comparisons

When comparing origins, normalize default ports. A browser treats https://example.com as the same origin as https://example.com:443. But some application code compares strings and mistakenly treats them as different. Prefer using a URL parser and comparing structured components.

Trailing slashes and redirects

If your server redirects /docs to /docs/, that redirect can change relative URL resolution in the browser. For example, relative links behave differently depending on whether the base path ends with a slash. When you see broken relative asset paths, check whether the page URL ends with / and whether a redirect occurred.
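The effect of the trailing slash on relative resolution can be reproduced with Python's urljoin, which follows the same resolution rules as browsers for simple cases like this one. The file name img.png is just an example.

```python
from urllib.parse import urljoin

# Without a trailing slash, "docs" is treated as a file name and replaced.
print(urljoin("https://example.com/docs", "img.png"))
# https://example.com/img.png

# With a trailing slash, "docs/" is treated as a directory and kept.
print(urljoin("https://example.com/docs/", "img.png"))
# https://example.com/docs/img.png
```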

Case sensitivity

Hosts are effectively case-insensitive, but paths are generally case-sensitive. A link to /Images/logo.png may work on a case-insensitive filesystem in development but fail in production on a case-sensitive filesystem or router.

Internationalized domain names (IDN)

Some domain names contain non-ASCII characters. On the wire and in DNS, they are represented using Punycode (an ASCII encoding with an xn-- prefix). Most modern clients handle this automatically, but logs and security filters may see the Punycode form. If you allow user-supplied URLs, be careful about look-alike characters, and validate and normalize domains before applying allowlists.
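As a quick illustration of the two forms, Python's built-in idna codec converts between them. Note the assumption here: this codec implements the older IDNA 2003 rules, so treat it as a demonstration of the concept and not as a complete validation tool. The host name münchen.example is an example value.

```python
unicode_host = "münchen.example"

# Encode to the ASCII form that appears in DNS queries and many logs.
ascii_host = unicode_host.encode("idna")
print(ascii_host)                 # b'xn--mnchen-3ya.example'

# Decode back to the form users see in the address bar.
print(ascii_host.decode("idna"))  # münchen.example
```

An allowlist that stores one form and compares against the other will never match, so pick one form and convert before comparing.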

Hands-On: Determine Whether Two URLs Are Same-Origin

Use this repeatable process whenever you’re unsure whether browser same-origin rules apply.

Step-by-step checklist

  • Parse both URLs into scheme, host, and port.
  • If a port is missing, substitute the default for the scheme (80 for http, 443 for https).
  • Compare scheme, host, and port. All three must match for same-origin.
  • Ignore path, query, and fragment for the origin decision.

Practice comparisons

  • https://example.com/a vs https://example.com/b: same-origin (path differs only).
  • https://example.com vs http://example.com: different origin (scheme differs).
  • https://example.com vs https://www.example.com: different origin (host differs).
  • https://example.com vs https://example.com:8443: different origin (port differs).
  • https://example.com:443 vs https://example.com: same-origin (default port normalization).
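The checklist translates almost line for line into code. The sketch below uses Python's urlsplit; the helper names origin and same_origin and the DEFAULT_PORTS table are illustrative assumptions, and only http and https are covered.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """Return the (scheme, host, port) tuple, filling in the default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(a: str, b: str) -> bool:
    """Path, query, and fragment are ignored; only the origin tuple is compared."""
    return origin(a) == origin(b)


print(same_origin("https://example.com/a", "https://example.com/b"))     # True
print(same_origin("https://example.com", "http://example.com"))          # False
print(same_origin("https://example.com", "https://www.example.com"))     # False
print(same_origin("https://example.com", "https://example.com:8443"))    # False
print(same_origin("https://example.com:443", "https://example.com"))     # True
```

Comparing parsed tuples instead of raw strings is what makes the last case come out right.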

Hands-On: Build Correct URLs in Code (Avoid String Concatenation Bugs)

Many production bugs come from manually concatenating strings to form URLs (double slashes, missing encoding, incorrect query separators). Prefer a URL builder in your language.

Example: JavaScript (URL and URLSearchParams)

const url = new URL('https://api.example.com/v1/search');
url.searchParams.set('q', 'hello world');
url.searchParams.set('limit', '10');
console.log(url.toString());
// https://api.example.com/v1/search?q=hello+world&limit=10

What this gives you:

  • Correct encoding of spaces and reserved characters.
  • Correct placement of ? and &.
  • A structured way to read and modify components.

Example: Python (urllib.parse)

from urllib.parse import urlparse, urlencode, urlunparse

parsed = urlparse('https://example.com/items')
query = urlencode({'page': 2, 'q': 'fish & chips'})
built = urlunparse((parsed.scheme, parsed.netloc, parsed.path, '', query, ''))
print(built)
# https://example.com/items?page=2&q=fish+%26+chips

Notice how & inside the value becomes %26, preventing it from being misread as a parameter separator.

Hands-On: Understand Relative URLs and Base Paths

Browsers resolve relative links based on the current document URL. This is mostly a client-side concern, but it affects how you structure paths on the server and how redirects behave.

Step-by-step resolution examples

Assume the current page is:

https://example.com/docs/guide/index.html
  • Relative link images/logo.png resolves to https://example.com/docs/guide/images/logo.png
  • Relative link /images/logo.png resolves to https://example.com/images/logo.png (leading slash means “from the origin root”)
  • Relative link ../api resolves to https://example.com/docs/api
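These three resolutions can be checked with Python's urljoin, which applies the standard relative-reference rules. This is a sketch for verifying your expectations, using the same page URL as above.

```python
from urllib.parse import urljoin

base = "https://example.com/docs/guide/index.html"

print(urljoin(base, "images/logo.png"))
# https://example.com/docs/guide/images/logo.png

print(urljoin(base, "/images/logo.png"))
# https://example.com/images/logo.png

print(urljoin(base, "../api"))
# https://example.com/docs/api
```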

Practical debugging: if assets fail to load after you move a page deeper into a path hierarchy, check whether your HTML uses root-relative paths (/assets/app.css) or document-relative paths (assets/app.css).

Quick Reference: What Reaches the Server?

When a browser makes an HTTP request, the server-side application typically receives:

  • Scheme: not directly in the request line, but it is implied by the connection (or forwarded by a proxy).
  • Host: via Host / :authority.
  • Port: implied by the connection and sometimes visible via proxy headers.
  • Path: yes.
  • Query: yes.
  • Fragment: no.

This mental model helps you quickly answer questions like “why doesn’t my server see #token=...?” and “why does my app generate the wrong absolute URL behind a proxy?”

Now answer the exercise about the content:

A browser loads https://example.com/page and runs JavaScript that fetches https://example.com:8443/data. Which statement best describes the relationship between these two URLs in terms of origin?

An origin is defined by scheme, host, and port. Changing the port (for example, from implicit 443 to 8443) creates a different origin, even if the scheme and host stay the same.
