
Express.js Beyond Basics: Middleware, Architecture, and Maintainability



Rate Limiting, Throttling, and Abuse Prevention

Chapter 9

Estimated reading time: 9 minutes


What rate limiting and throttling solve

Rate limiting and throttling are controls that reduce abuse and protect availability by restricting how frequently a client can perform actions. In practice, they help with:

  • Brute-force resistance on login/OTP/reset endpoints
  • Cost control for expensive endpoints (search, exports, AI calls)
  • Fair usage across tenants/users
  • Incident containment during traffic spikes or bot activity

Terminology you will see:

  • Rate limiting: enforce a maximum number of requests per time window (e.g., 100 per 15 minutes).
  • Throttling: slow down or shape traffic (e.g., allow bursts but refill tokens over time).
  • Abuse prevention: broader patterns including progressive enforcement, anomaly detection, and targeted limits on sensitive routes.

Strategy: global limits + per-route limits + identity-based keys

1) Global baseline limit

Apply a conservative global limit to all routes to protect the app from accidental floods and basic scraping. Keep it high enough not to impact normal users, but low enough to cap worst-case load.

2) Per-route limits for sensitive endpoints

Apply stricter limits to endpoints that are common abuse targets:

  • POST /auth/login, POST /auth/otp, POST /auth/reset
  • POST /signup (account creation)
  • GET /search (expensive queries)
  • POST /exports (heavy background jobs)

3) User/IP-based keys (and when to use each)

Choosing the key is the difference between blocking abusers and blocking everyone behind the same NAT.

  • IP-based: good for anonymous endpoints and early-stage protection. Risk: many legitimate users share an IP (corporate NAT, mobile carriers).
  • User-based: best after authentication. Use a stable identifier (user id, API key id, tenant id). This avoids punishing shared IPs.
  • Hybrid: combine user id when present, otherwise IP. Useful for endpoints used both anonymously and by logged-in users.

Implementation with express-rate-limit: configuration knobs

A common approach in Express is express-rate-limit. It supports standard headers, custom key generation, and pluggable stores.

Key knobs you should understand

Option           What it does                               Typical choice
windowMs         Time window for counting requests          1 min, 5 min, 15 min
max              Maximum requests allowed in the window     Global: 300/5 min; login: 5/15 min
standardHeaders  Send RateLimit-* headers                   true
legacyHeaders    Send X-RateLimit-* headers                 false
keyGenerator     How to identify the client                 IP, user id, API key, hybrid
handler          What happens when the limit is exceeded    Throw/forward a consistent error
skip             Conditionally skip limiting                Health checks, internal traffic

Step-by-step: add a global baseline limiter

Install:

npm i express-rate-limit

Create a limiter and mount it early (before most routes):

// middleware/limiters.js
import rateLimit from "express-rate-limit";

export const globalLimiter = rateLimit({
  windowMs: 5 * 60 * 1000, // 5 minutes
  max: 300,                // max requests per client per window
  standardHeaders: true,   // send standard RateLimit-* headers
  legacyHeaders: false,    // omit legacy X-RateLimit-* headers
});

Then wire it into the app:

// app.js
import express from "express";
import { globalLimiter } from "./middleware/limiters.js";

const app = express();

app.use(globalLimiter);
// ...routes after this

This baseline limiter is intentionally not too strict; it mainly caps extreme behavior.

Step-by-step: stricter per-route limiter for auth endpoints

Create a dedicated limiter for login attempts:

import rateLimit from "express-rate-limit";

export const loginLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 5,
  standardHeaders: true,
  legacyHeaders: false,
});

Apply it only to the route(s) that need it:

import { Router } from "express";
import { loginLimiter } from "../middleware/limiters.js";
// Illustrative path; import your actual controller here.
import { loginController } from "../controllers/auth.controller.js";

const router = Router();

router.post("/login", loginLimiter, loginController);

export default router;

Keep the limiter close to the route definition so it is obvious which endpoints are protected and how.

Safe behavior behind proxies and load balancers

In production, Express often sits behind a reverse proxy (Nginx, ALB/ELB, Cloudflare). If you rate limit by IP, you must ensure Express sees the real client IP; otherwise you may rate limit the proxy IP and effectively throttle all users together.

Configure trust proxy correctly

Express uses X-Forwarded-For only when trust proxy is enabled. Configure it to match your deployment:

// If you have exactly one proxy hop (common with a single load balancer)
app.set("trust proxy", 1);

// Or, if you know the proxy IP range, prefer a stricter configuration
// app.set("trust proxy", "loopback, linklocal, uniquelocal");

Important: do not blindly set trust proxy to true unless you understand the security implications. If untrusted clients can spoof X-Forwarded-For, they can evade IP-based limits by sending fake headers.

Choose the right key when behind proxies

When using express-rate-limit, the default key is typically based on req.ip. With correct trust proxy configuration, req.ip will reflect the client IP. If you need a custom strategy, provide keyGenerator:

export const apiLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 120,
  standardHeaders: true,
  legacyHeaders: false,
  keyGenerator: (req) => {
    // Prefer authenticated identity; fallback to IP
    return req.user?.id ? `user:${req.user.id}` : `ip:${req.ip}`;
  },
});

Storage considerations: in-memory vs external store

Rate limiting requires a counter store. Where you store counters determines correctness and scalability.

In-memory store (default)

  • Pros: zero dependencies, fast, simple.
  • Cons: resets on restart, not shared across instances, inconsistent limits in multi-node deployments.

Use in-memory only for local development or single-instance deployments where occasional resets are acceptable.

External store (recommended for production)

  • Pros: shared across instances, survives restarts, consistent enforcement.
  • Cons: extra infrastructure, network latency, must handle store outages.

Common choices: Redis (most common), Memcached, or a managed rate-limiting service. With Redis, you typically use a store adapter compatible with express-rate-limit (for example, rate-limit-redis).

Example wiring (conceptual):

import rateLimit from "express-rate-limit";
import RedisStore from "rate-limit-redis";
import { createClient } from "redis";

const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();

export const globalLimiter = rateLimit({
  windowMs: 5 * 60 * 1000,
  max: 300,
  standardHeaders: true,
  legacyHeaders: false,
  store: new RedisStore({
    sendCommand: (...args) => redisClient.sendCommand(args),
  }),
});

Fail-open vs fail-closed when the store is down

Decide what happens if the external store is unavailable:

  • Fail-open: allow requests through to avoid outages caused by the limiter. Risk: abuse can spike during store downtime.
  • Fail-closed: block requests to protect the system. Risk: self-inflicted outage if the store is flaky.

Many teams choose fail-open for general endpoints but fail-closed (or stricter) for high-risk endpoints like login, depending on threat model.
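As a sketch, fail-open can be a thin wrapper around an existing limiter. This assumes your limiter forwards store failures to next(err) and that limit violations arrive as the TooManyRequestsError used later in this chapter; verify both assumptions against your limiter and store versions before relying on this:

function failOpen(limiter) {
  return (req, res, next) => {
    limiter(req, res, (err) => {
      // Anything that is not a rate-limit violation is treated as a store
      // failure; the exact error shape depends on your store adapter.
      if (err && err.name !== "TooManyRequestsError") {
        console.warn("rate-limit store error; failing open:", err.message);
        return next(); // allow the request through
      }
      next(err);
    });
  };
}

// Usage: app.use(failOpen(globalLimiter));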

Progressive enforcement: soft limits, warnings, and escalation

Hard blocking at the first threshold can create poor UX and make it harder to tune limits. Progressive enforcement lets you observe and guide behavior before denying service.

Pattern: soft limit then hard limit

Use two thresholds:

  • Soft limit: after N requests, still allow but attach warnings (headers) and optionally log.
  • Hard limit: after M requests, return 429 Too Many Requests.

One way is to implement a small wrapper middleware that reads the current rate state and adds warnings. Some stores/limiters expose remaining counts; if not, you can implement a lightweight counter in the same store.

Example concept (pseudo-implementation):

// Pseudo-implementation: counterStore is an assumed interface exposing
// incr(key, windowMs) -> { count, resetAt }; TooManyRequestsError is the
// error class defined later in this chapter.
function softLimit({ windowMs, softMax, hardMax, key }) {
  return async (req, res, next) => {
    const k = key(req);
    const { count, resetAt } = await counterStore.incr(k, windowMs);

    // Between the soft and hard thresholds: allow, but warn the client.
    if (count >= softMax && count < hardMax) {
      res.setHeader("Warning", "199 - Approaching rate limit");
      res.setHeader("RateLimit-Policy", `${hardMax};w=${Math.floor(windowMs / 1000)}`);
      // Optional: log for tuning and detection
    }

    // At or past the hard threshold: forward a typed 429 error.
    if (count >= hardMax) {
      return next(new TooManyRequestsError({ resetAt }));
    }

    next();
  };
}

Soft limits are especially useful for APIs consumed by integrators: they get early signals to back off before they start receiving 429s.

Escalation: temporary bans and risk-based multipliers

For repeated offenders, you can escalate:

  • Temporary ban key (e.g., 10 minutes) after repeated 429s
  • Stricter limits for suspicious signals (new accounts, failed logins, unusual user agents)
  • Tenant-level caps to prevent one tenant from starving others

Keep escalation logic separate from business logic; treat it as policy middleware that can evolve.
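As a minimal in-memory sketch of a temporary ban (single-instance only; banGate, recordOffense, and the thresholds are illustrative names, not library APIs):

const BAN_MS = 10 * 60 * 1000; // 10-minute ban
const offenses = new Map();    // key -> { count, bannedUntil }

// Call this when a client hits a hard limit (e.g., from your 429 handler).
export function recordOffense(key) {
  const entry = offenses.get(key) ?? { count: 0, bannedUntil: 0 };
  entry.count += 1;
  if (entry.count >= 3) entry.bannedUntil = Date.now() + BAN_MS; // escalate
  offenses.set(key, entry);
}

// Mount before the regular limiters to reject banned keys early.
export function banGate(keyFn) {
  return (req, res, next) => {
    const entry = offenses.get(keyFn(req));
    if (entry && entry.bannedUntil > Date.now()) {
      res.setHeader("Retry-After", String(Math.ceil((entry.bannedUntil - Date.now()) / 1000)));
      return res.status(429).json({ error: { code: "TEMP_BANNED", message: "Temporarily banned" } });
    }
    next();
  };
}

In production you would keep these counters in the same external store (for example, Redis keys with TTLs) so bans apply across instances.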

Handling bursts: fixed window vs sliding window vs token bucket

Not all traffic patterns are steady. Some clients legitimately burst (page loads, mobile reconnects). Choose an algorithm that matches your needs.

Fixed window (simple)

  • Behavior: counts requests in a fixed interval (e.g., 0:00–0:59).
  • Issue: boundary problem (a client can send max at the end of one window and max at the start of the next).

Sliding window (smoother)

  • Behavior: counts over the last N seconds/minutes.
  • Benefit: reduces boundary bursts.

Token bucket / leaky bucket (burst-friendly throttling)

  • Behavior: tokens refill at a steady rate; requests consume tokens.
  • Benefit: allows short bursts while enforcing long-term rate.

If you need burst tolerance (e.g., allow 20 immediate requests but average 2/sec), token bucket is often the right mental model. In Express, you can implement this with Redis + Lua or use an upstream gateway that supports it.
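To make the mental model concrete, here is a minimal in-memory token bucket middleware; it is a sketch for a single process, and in multi-node deployments you would move this state into Redis (typically behind a Lua script for atomicity):

// Allow short bursts up to `capacity`, refilling tokens at a steady rate.
function tokenBucket({ capacity, refillPerSecond }) {
  const buckets = new Map(); // key -> { tokens, last }
  return (req, res, next) => {
    const key = req.ip; // or a user/tenant key, as discussed earlier
    const now = Date.now();
    const b = buckets.get(key) ?? { tokens: capacity, last: now };
    // Refill proportionally to elapsed time, capped at capacity.
    b.tokens = Math.min(capacity, b.tokens + ((now - b.last) / 1000) * refillPerSecond);
    b.last = now;
    if (b.tokens < 1) {
      buckets.set(key, b);
      return res.status(429).json({ error: { code: "RATE_LIMITED" } });
    }
    b.tokens -= 1;
    buckets.set(key, b);
    next();
  };
}

// Example: allow a burst of 20 while averaging 2 requests/second:
// app.use("/search", tokenBucket({ capacity: 20, refillPerSecond: 2 }));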

Practical pattern: combine burst and sustained limits

Apply two limiters:

  • Burst limiter: short window, higher max (protects from sudden spikes)
  • Sustained limiter: longer window, lower average (protects from steady abuse)

// Example idea: burst + sustained
router.get(
  "/search",
  burstLimiter,      // e.g., 30 requests / 10 seconds
  sustainedLimiter,  // e.g., 300 requests / 10 minutes
  searchController
);

Consistent responses: integrate with centralized error handling

Even if you already have centralized error handling, rate limit middleware often responds directly. For consistency (same JSON shape, same error codes, same logging), prefer forwarding an error to your error handler.

Use a custom handler that calls next(err)

Configure the limiter to create a typed error and pass it along:

import rateLimit from "express-rate-limit";

class TooManyRequestsError extends Error {
  constructor({ message = "Too many requests", retryAfterSeconds } = {}) {
    super(message);
    this.name = "TooManyRequestsError";
    this.statusCode = 429;
    this.retryAfterSeconds = retryAfterSeconds;
    this.code = "RATE_LIMITED";
  }
}

export const strictLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 5,
  standardHeaders: true,
  legacyHeaders: false,
  handler: (req, res, next, options) => {
    // options.windowMs is available; compute a conservative Retry-After
    const retryAfterSeconds = Math.ceil(options.windowMs / 1000);
    next(new TooManyRequestsError({ retryAfterSeconds }));
  },
});

Then, in your centralized error handler, you can map TooManyRequestsError to a consistent JSON response and set Retry-After if present:

// In your error handler (conceptual)
if (err.statusCode === 429) {
  if (err.retryAfterSeconds) res.setHeader("Retry-After", String(err.retryAfterSeconds));
  return res.status(429).json({
    error: { code: err.code || "RATE_LIMITED", message: err.message }
  });
}

Return useful headers for clients

With standardHeaders: true, clients can read:

  • RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset
  • Optionally Retry-After on 429

This enables well-behaved clients to implement backoff automatically.
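For illustration, a client could honor these headers with a simple retry loop (fetchWithBackoff is a hypothetical helper, not part of any library):

async function fetchWithBackoff(url, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    const res = await fetch(url);
    if (res.status !== 429) return res;
    // Honor Retry-After when present; otherwise back off exponentially.
    const retryAfter = Number(res.headers.get("Retry-After")) || 2 ** i;
    await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
  }
  throw new Error("Still rate limited after retries");
}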

Key design patterns for abuse prevention

Separate limits by category

Use different policies for different traffic types, as in the sketch after this list:

  • Public read endpoints: moderate global IP-based limits
  • Auth endpoints: strict IP-based + possibly device fingerprint; consider escalating on failed attempts
  • API key endpoints: key-based limits per API key and per tenant
  • Admin endpoints: very strict, often allowlisted networks
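One way to keep these policies discoverable is a single module of named limiters; the values below are illustrative, not recommendations:

import rateLimit from "express-rate-limit";

export const limiters = {
  publicRead: rateLimit({ windowMs: 60 * 1000, max: 120 }),
  auth: rateLimit({ windowMs: 15 * 60 * 1000, max: 5 }),
  apiKey: rateLimit({
    windowMs: 60 * 1000,
    max: 600,
    // Key by API key when present; fall back to IP for unauthenticated calls.
    keyGenerator: (req) => `key:${req.get("X-API-Key") ?? req.ip}`,
  }),
};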

Use route-specific keys when needed

Sometimes you want to limit a specific action, not the whole client. Example: password reset should be limited per email + IP to prevent harassment.

export const resetLimiter = rateLimit({
  windowMs: 60 * 60 * 1000,
  max: 3,
  standardHeaders: true,
  legacyHeaders: false,
  keyGenerator: (req) => {
    // Requires express.json() to run before this limiter so req.body exists.
    const email = (req.body?.email || "").toLowerCase().trim();
    return `reset:${email}:ip:${req.ip}`;
  },
});

Be careful not to leak whether an email exists; the limiter key is internal, but your endpoint response should remain uniform.

Skip rules for internal traffic and health checks

Health checks can accidentally consume rate limits. Use skip for known internal paths or trusted sources:

export const globalLimiter = rateLimit({
  windowMs: 5 * 60 * 1000,
  max: 300,
  standardHeaders: true,
  legacyHeaders: false,
  skip: (req) => req.path === "/health" || req.path === "/metrics",
});

Observability: log limit events with context

For tuning and incident response, log:

  • key (or a hashed form), route, method
  • client IP and forwarded IP chain (if applicable)
  • user id / api key id (if authenticated)
  • limit policy name (global/login/search)

This helps you answer: “Are we blocking real users?” and “Which endpoints are being targeted?” without changing application logic.
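As a sketch, a limiter handler can emit these fields before forwarding the error (logger stands in for a structured logger such as pino; TooManyRequestsError is the class defined earlier):

import rateLimit from "express-rate-limit";

export const loginLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 5,
  standardHeaders: true,
  legacyHeaders: false,
  handler: (req, res, next, options) => {
    logger.warn({
      event: "rate_limited",
      policy: "login",        // limit policy name
      method: req.method,
      route: req.originalUrl,
      ip: req.ip,             // respects trust proxy configuration
      userId: req.user?.id,   // present only when authenticated
    });
    next(new TooManyRequestsError({ retryAfterSeconds: Math.ceil(options.windowMs / 1000) }));
  },
});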

Now answer the exercise about the content:

When an Express app is deployed behind a reverse proxy and you use IP-based rate limiting, what is the main reason to configure trust proxy correctly?

Behind a proxy, IP-based limiting relies on req.ip. With correct trust proxy settings, Express can use forwarded IPs safely, preventing the limiter from treating the proxy as the client and grouping all users under one IP.

Next chapter

Logging, Monitoring Signals, and Request Correlation
