What rate limiting and throttling solve
Rate limiting and throttling are controls that reduce abuse and protect availability by restricting how frequently a client can perform actions. In practice, they help with:
- Brute-force resistance on login/OTP/reset endpoints
- Cost control for expensive endpoints (search, exports, AI calls)
- Fair usage across tenants/users
- Incident containment during traffic spikes or bot activity
Terminology you will see:
- Rate limiting: enforce a maximum number of requests per time window (e.g., 100 per 15 minutes).
- Throttling: slow down or shape traffic (e.g., allow bursts but refill tokens over time).
- Abuse prevention: broader patterns including progressive enforcement, anomaly detection, and targeted limits on sensitive routes.
Strategy: global limits + per-route limits + identity-based keys
1) Global baseline limit
Apply a conservative global limit to all routes to protect the app from accidental floods and basic scraping. Keep it high enough not to impact normal users, but low enough to cap worst-case load.
2) Per-route limits for sensitive endpoints
Apply stricter limits to endpoints that are common abuse targets:
- POST /auth/login, POST /auth/otp, POST /auth/reset
- POST /signup (account creation)
- GET /search (expensive queries)
- POST /exports (heavy background jobs)
3) User/IP-based keys (and when to use each)
Choosing the key is the difference between blocking abusers and blocking everyone behind the same NAT.
- IP-based: good for anonymous endpoints and early-stage protection. Risk: many legitimate users share an IP (corporate NAT, mobile carriers).
- User-based: best after authentication. Use a stable identifier (user id, API key id, tenant id). This avoids punishing shared IPs.
- Hybrid: combine user id when present, otherwise IP. Useful for endpoints used both anonymously and by logged-in users; a concrete keyGenerator for this appears in the proxy section below.
Implementation with express-rate-limit: configuration knobs
A common approach in Express is express-rate-limit. It supports standard headers, custom key generation, and pluggable stores.
Key knobs you should understand
| Option | What it does | Typical choice |
|---|---|---|
| windowMs | Time window for counting requests | 1 min, 5 min, 15 min |
| max | Maximum requests allowed in the window | Global: 300/5 min; Login: 5/15 min |
| standardHeaders | Send RateLimit-* headers | true |
| legacyHeaders | Send X-RateLimit-* headers | false |
| keyGenerator | How to identify the client | IP, user id, API key, hybrid |
| handler | What happens when the limit is exceeded | Throw/forward a consistent error |
| skip | Conditionally skip limiting | Health checks, internal traffic |
Step-by-step: add a global baseline limiter
Install:
```bash
npm i express-rate-limit
```

Create a limiter and mount it early (before most routes):

```js
// middleware/limiters.js
import rateLimit from "express-rate-limit";

export const globalLimiter = rateLimit({
  windowMs: 5 * 60 * 1000, // 5 minutes
  max: 300,                // 300 requests per key per window
  standardHeaders: true,
  legacyHeaders: false,
});
```

```js
// app.js
import express from "express";
import { globalLimiter } from "./middleware/limiters.js";

const app = express();
app.use(globalLimiter);
// ...routes after this
```

This baseline limiter is intentionally not too strict; it mainly caps extreme behavior.
Step-by-step: stricter per-route limiter for auth endpoints
Create a dedicated limiter for login attempts:
```js
// middleware/limiters.js
import rateLimit from "express-rate-limit";

export const loginLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 5,                   // 5 attempts per key per window
  standardHeaders: true,
  legacyHeaders: false,
});
```

Apply it only to the route(s) that need it:

```js
// routes/auth.js
import { Router } from "express";
import { loginLimiter } from "../middleware/limiters.js";
import { loginController } from "../controllers/auth.js"; // wherever your controller lives

const router = Router();
router.post("/login", loginLimiter, loginController);

export default router;
```

Keep the limiter close to the route definition so it is obvious which endpoints are protected and how.
Safe behavior behind proxies and load balancers
In production, Express often sits behind a reverse proxy (Nginx, ALB/ELB, Cloudflare). If you rate limit by IP, you must ensure Express sees the real client IP; otherwise you may rate limit the proxy IP and effectively throttle all users together.
Configure trust proxy correctly
Express uses X-Forwarded-For only when trust proxy is enabled. Configure it to match your deployment:
```js
// If you have exactly one proxy hop (common with a single load balancer)
app.set("trust proxy", 1);

// Or, if you know the proxy IP range, prefer a stricter configuration
// app.set("trust proxy", "loopback, linklocal, uniquelocal");
```

Important: do not blindly set trust proxy to true unless you understand the security implications. If untrusted clients can spoof X-Forwarded-For, they can evade IP-based limits by sending fake headers.
Choose the right key when behind proxies
When using express-rate-limit, the default key is typically based on req.ip. With correct trust proxy configuration, req.ip will reflect the client IP. If you need a custom strategy, provide keyGenerator:
```js
export const apiLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 120,
  standardHeaders: true,
  legacyHeaders: false,
  keyGenerator: (req) => {
    // Prefer authenticated identity; fall back to IP
    return req.user?.id ? `user:${req.user.id}` : `ip:${req.ip}`;
  },
});
```

Storage considerations: in-memory vs external store
Rate limiting requires a counter store. Where you store counters determines correctness and scalability.
In-memory store (default)
- Pros: zero dependencies, fast, simple.
- Cons: resets on restart, not shared across instances, inconsistent limits in multi-node deployments.
Use in-memory only for local development or single-instance deployments where occasional resets are acceptable.
External store (recommended for production)
- Pros: shared across instances, survives restarts, consistent enforcement.
- Cons: extra infrastructure, network latency, must handle store outages.
Common choices: Redis (most common), Memcached, or a managed rate-limiting service. With Redis, you typically use a store adapter compatible with express-rate-limit (for example, rate-limit-redis).
Example wiring (conceptual):
```js
import rateLimit from "express-rate-limit";
import RedisStore from "rate-limit-redis";
import { createClient } from "redis";

const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();

export const globalLimiter = rateLimit({
  windowMs: 5 * 60 * 1000,
  max: 300,
  standardHeaders: true,
  legacyHeaders: false,
  store: new RedisStore({
    sendCommand: (...args) => redisClient.sendCommand(args),
  }),
});
```

Fail-open vs fail-closed when the store is down
Decide what happens if the external store is unavailable:
- Fail-open: allow requests through to avoid outages caused by the limiter. Risk: abuse can spike during store downtime.
- Fail-closed: block requests to protect the system. Risk: self-inflicted outage if the store is flaky.
Many teams choose fail-open for general endpoints but fail-closed (or stricter) for high-risk endpoints like login, depending on threat model.
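One way to make the choice explicit is a small wrapper around the store that catches errors and applies your policy. Below is a minimal fail-open sketch against the store interface express-rate-limit expects (init/increment/decrement/resetKey); FailOpenStore is a name invented here, and newer library versions may also offer a built-in option for this, so check the docs for the version you use.

```js
// Sketch: fail-open wrapper around any express-rate-limit store.
// FailOpenStore is a hypothetical helper, not part of the library.
class FailOpenStore {
  constructor(inner) {
    this.inner = inner;
  }
  init(options) {
    this.windowMs = options.windowMs;
    this.inner.init?.(options);
  }
  async increment(key) {
    try {
      return await this.inner.increment(key);
    } catch (err) {
      // Store unreachable: report zero hits so the request is allowed.
      console.error("rate-limit store error; failing open", err);
      return { totalHits: 0, resetTime: new Date(Date.now() + this.windowMs) };
    }
  }
  async decrement(key) {
    try { await this.inner.decrement(key); } catch { /* best effort */ }
  }
  async resetKey(key) {
    try { await this.inner.resetKey(key); } catch { /* best effort */ }
  }
}

// Usage: store: new FailOpenStore(new RedisStore({ ... }))
```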
Progressive enforcement: soft limits, warnings, and escalation
Hard blocking at the first threshold can create poor UX and make limits harder to tune. Progressive enforcement lets you observe and guide behavior before denying service.
Pattern: soft limit then hard limit
Use two thresholds:
- Soft limit: after N requests, still allow but attach warnings (headers) and optionally log.
- Hard limit: after M requests, return 429 Too Many Requests.
One way is to implement a small wrapper middleware that reads the current rate state and adds warnings. Some stores/limiters expose remaining counts; if not, you can implement a lightweight counter in the same store.
Example concept (a runnable sketch: the in-memory counterStore stands in for a shared store, and TooManyRequestsError is defined in the error-handling section below):

```js
const counters = new Map(); // key -> { count, resetAt }

// Minimal fixed-window counter; swap in Redis or similar for production.
const counterStore = {
  async incr(key, windowMs) {
    const now = Date.now();
    const entry = counters.get(key);
    if (entry && entry.resetAt > now) { entry.count += 1; return entry; }
    const fresh = { count: 1, resetAt: now + windowMs };
    counters.set(key, fresh);
    return fresh;
  },
};

function softLimit({ windowMs, softMax, hardMax, key }) {
  return async (req, res, next) => {
    const k = key(req);
    const { count, resetAt } = await counterStore.incr(k, windowMs);
    if (count >= softMax && count < hardMax) {
      res.setHeader("Warning", "199 - Approaching rate limit");
      res.setHeader("RateLimit-Policy", `${hardMax};w=${Math.floor(windowMs / 1000)}`);
      // Optional: log for tuning and detection
    }
    if (count >= hardMax) {
      return next(new TooManyRequestsError({ resetAt }));
    }
    next();
  };
}
```

Soft limits are especially useful for APIs consumed by integrators: they get early signals to back off before they start receiving 429s.
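Usage might look like this (the mount point and thresholds are illustrative):

```js
app.use(
  "/api",
  softLimit({
    windowMs: 60 * 1000,
    softMax: 80,  // start warning here
    hardMax: 100, // start rejecting here
    key: (req) => req.user?.id ?? req.ip,
  })
);
```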
Escalation: temporary bans and risk-based multipliers
For repeated offenders, you can escalate:
- Temporary ban key (e.g., 10 minutes) after repeated 429s
- Stricter limits for suspicious signals (new accounts, failed logins, unusual user agents)
- Tenant-level caps to prevent one tenant from starving others
Keep escalation logic separate from business logic; treat it as policy middleware that can evolve.
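As a sketch of the first idea, you can keep a ban map next to your counters; everything below (banCheck, recordOffense, the thresholds) is illustrative, with in-memory Maps standing in for a shared store:

```js
// In-memory for illustration; production should use the shared store (e.g., Redis).
const bans = new Map();     // key -> ban expiry (ms since epoch)
const offenses = new Map(); // key -> count of recent 429s

export function banCheck(keyFn) {
  return (req, res, next) => {
    const until = bans.get(keyFn(req));
    if (until && until > Date.now()) {
      res.setHeader("Retry-After", String(Math.ceil((until - Date.now()) / 1000)));
      // Or forward a typed error to your centralized handler for consistency
      return res.status(429).json({
        error: { code: "TEMPORARILY_BLOCKED", message: "Too many requests" },
      });
    }
    next();
  };
}

// Call this from your 429 handling path when the same key keeps hitting limits.
export function recordOffense(key, { threshold = 3, banMs = 10 * 60 * 1000 } = {}) {
  const count = (offenses.get(key) ?? 0) + 1;
  offenses.set(key, count);
  if (count >= threshold) {
    bans.set(key, Date.now() + banMs);
    offenses.delete(key);
  }
}
```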
Handling bursts: fixed window vs sliding window vs token bucket
Not all traffic patterns are steady. Some clients legitimately burst (page loads, mobile reconnects). Choose an algorithm that matches your needs.
Fixed window (simple)
- Behavior: counts requests in a fixed interval (e.g., 0:00–0:59).
- Issue: boundary problem (a client can send max at the end of one window and max at the start of the next).
Sliding window (smoother)
- Behavior: counts over the last N seconds/minutes.
- Benefit: reduces boundary bursts.
Token bucket / leaky bucket (burst-friendly throttling)
- Behavior: tokens refill at a steady rate; requests consume tokens.
- Benefit: allows short bursts while enforcing long-term rate.
If you need burst tolerance (e.g., allow 20 immediate requests but average 2/sec), token bucket is often the right mental model. In Express, you can implement this with Redis + Lua or use an upstream gateway that supports it.
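As a mental-model sketch, here is a minimal in-memory token bucket middleware (single instance only; tokenBucket and its defaults are illustrative, and a production version would keep bucket state in Redis, typically behind a Lua script for atomicity):

```js
// Allow `capacity` immediate requests; refill `refillPerSec` tokens per second.
export function tokenBucket({ capacity = 20, refillPerSec = 2, key = (req) => req.ip } = {}) {
  const buckets = new Map(); // key -> { tokens, last }
  return (req, res, next) => {
    const k = key(req);
    const now = Date.now();
    const b = buckets.get(k) ?? { tokens: capacity, last: now };
    // Refill proportionally to elapsed time, capped at capacity.
    b.tokens = Math.min(capacity, b.tokens + ((now - b.last) / 1000) * refillPerSec);
    b.last = now;
    if (b.tokens < 1) {
      buckets.set(k, b);
      res.setHeader("Retry-After", String(Math.ceil((1 - b.tokens) / refillPerSec)));
      return res.status(429).json({ error: { code: "RATE_LIMITED", message: "Too many requests" } });
    }
    b.tokens -= 1;
    buckets.set(k, b);
    next();
  };
}
```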
Practical pattern: combine burst and sustained limits
Apply two limiters, as sketched below:
- Burst limiter: short window, higher max (protects from sudden spikes)
- Sustained limiter: longer window, lower average (protects from steady abuse)
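For illustration, the two limiters referenced in the example below might be defined like this (the numbers are starting points, not recommendations):

```js
import rateLimit from "express-rate-limit";

export const burstLimiter = rateLimit({
  windowMs: 10 * 1000, // 10 seconds
  max: 30,
  standardHeaders: true,
  legacyHeaders: false,
});

export const sustainedLimiter = rateLimit({
  windowMs: 10 * 60 * 1000, // 10 minutes
  max: 300,
  standardHeaders: true,
  legacyHeaders: false,
});
```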
```js
// Example idea: burst + sustained
router.get(
  "/search",
  burstLimiter,     // e.g., 30 requests / 10 seconds
  sustainedLimiter, // e.g., 300 requests / 10 minutes
  searchController
);
```

Consistent responses: integrate with centralized error handling
Even if you already have centralized error handling, rate limit middleware often responds directly. For consistency (same JSON shape, same error codes, same logging), prefer forwarding an error to your error handler.
Use a custom handler that calls next(err)
Configure the limiter to create a typed error and pass it along:
```js
import rateLimit from "express-rate-limit";

class TooManyRequestsError extends Error {
  constructor({ message = "Too many requests", retryAfterSeconds } = {}) {
    super(message);
    this.name = "TooManyRequestsError";
    this.statusCode = 429;
    this.retryAfterSeconds = retryAfterSeconds;
    this.code = "RATE_LIMITED";
  }
}

export const strictLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 5,
  standardHeaders: true,
  legacyHeaders: false,
  handler: (req, res, next, options) => {
    // options.windowMs is available; compute a conservative Retry-After
    const retryAfterSeconds = Math.ceil(options.windowMs / 1000);
    next(new TooManyRequestsError({ retryAfterSeconds }));
  },
});
```

Then, in your centralized error handler, you can map TooManyRequestsError to a consistent JSON response and set Retry-After if present:
```js
// In your error handler (conceptual)
if (err.statusCode === 429) {
  if (err.retryAfterSeconds) res.setHeader("Retry-After", String(err.retryAfterSeconds));
  return res.status(429).json({
    error: { code: err.code || "RATE_LIMITED", message: err.message },
  });
}
```

Return useful headers for clients
With standardHeaders: true, clients can read:
- RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset
- Optionally Retry-After on 429
This enables well-behaved clients to implement backoff automatically.
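For example, a client could honor these headers with a small retry helper (a sketch assuming Node 18+ global fetch; fetchWithBackoff is a name invented here):

```js
// Retry on 429, preferring the server's Retry-After hint over exponential backoff.
async function fetchWithBackoff(url, options = {}, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, options);
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const retryAfter = Number(res.headers.get("Retry-After"));
    const delayMs =
      Number.isFinite(retryAfter) && retryAfter > 0
        ? retryAfter * 1000
        : 2 ** attempt * 1000;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```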
Key design patterns for abuse prevention
Separate limits by category
Use different policies for different traffic types (a sketch follows this list):
- Public read endpoints: moderate global IP-based limits
- Auth endpoints: strict IP-based + possibly device fingerprint; consider escalating on failed attempts
- API key endpoints: key-based limits per API key and per tenant
- Admin endpoints: very strict, often allowlisted networks
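One way to keep these policies legible is to define them as named limiters in one module; the shape, header name, and numbers below are illustrative:

```js
import rateLimit from "express-rate-limit";

const policy = (windowMs, max, extra = {}) =>
  rateLimit({ windowMs, max, standardHeaders: true, legacyHeaders: false, ...extra });

export const limiters = {
  publicRead: policy(60 * 1000, 120), // moderate, IP-based by default
  auth: policy(15 * 60 * 1000, 5),    // strict; escalate on failed attempts separately
  apiKey: policy(60 * 1000, 600, {
    // Per API key; assumes keys arrive in an X-Api-Key header
    keyGenerator: (req) => `key:${req.get("X-Api-Key") ?? req.ip}`,
  }),
  admin: policy(60 * 1000, 30),       // pair with network allowlisting
};
```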
Use route-specific keys when needed
Sometimes you want to limit a specific action, not the whole client. Example: password reset should be limited per email + IP to prevent harassment.
```js
export const resetLimiter = rateLimit({
  windowMs: 60 * 60 * 1000, // 1 hour
  max: 3,
  standardHeaders: true,
  legacyHeaders: false,
  keyGenerator: (req) => {
    // Requires express.json() to run before this limiter so req.body is parsed
    const email = (req.body?.email || "").toLowerCase().trim();
    return `reset:${email}:ip:${req.ip}`;
  },
});
```

Be careful not to leak whether an email exists; the limiter key is internal, but your endpoint response should remain uniform.
Skip rules for internal traffic and health checks
Health checks can accidentally consume rate limits. Use skip for known internal paths or trusted sources:
```js
export const globalLimiter = rateLimit({
  windowMs: 5 * 60 * 1000,
  max: 300,
  standardHeaders: true,
  legacyHeaders: false,
  skip: (req) => req.path === "/health" || req.path === "/metrics",
});
```

Observability: log limit events with context
For tuning and incident response, log:
- key (or a hashed form), route, method
- client IP and forwarded IP chain (if applicable)
- user id / api key id (if authenticated)
- limit policy name (global/login/search)
This helps you answer: “Are we blocking real users?” and “Which endpoints are being targeted?” without changing application logic.
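A sketch of what that can look like, reusing the TooManyRequestsError class from the error-handling section and assuming a structured `logger` (pino, winston, or similar) already exists in your app:

```js
import crypto from "node:crypto";
import rateLimit from "express-rate-limit";

// Hash keys before logging so raw IPs/emails do not end up in logs.
const hashKey = (key) =>
  crypto.createHash("sha256").update(key).digest("hex").slice(0, 16);

export const loginLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 5,
  standardHeaders: true,
  legacyHeaders: false,
  handler: (req, res, next, options) => {
    // `logger` is your structured logger (assumed to exist)
    logger.warn({
      event: "rate_limit_exceeded",
      policy: "login",                          // limit policy name
      key: hashKey(req.ip),                     // hashed form of the key
      route: req.originalUrl,
      method: req.method,
      forwardedFor: req.get("X-Forwarded-For"), // forwarded IP chain
      userId: req.user?.id,                     // if authenticated
    });
    // TooManyRequestsError from the error-handling section above
    next(new TooManyRequestsError({
      retryAfterSeconds: Math.ceil(options.windowMs / 1000),
    }));
  },
});
```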