Utilix knowledge base
What Is API Rate Limiting?
Published May 11, 2026
API rate limiting is a server-side control that caps how many requests a client can make within a defined time window. If you exceed the cap, the server rejects additional requests — usually with an HTTP 429 Too Many Requests response — until the window resets.
Rate limits protect services from overload and abuse, and they allow providers to offer tiered pricing where higher-volume plans cost more.
Common rate limit formats
Rate limits are usually expressed as a request count per time window:
| Format | Example | Equivalent RPS |
|---|---|---|
| Per second | 10 req/s | 10.00 |
| Per minute | 100 req/min | 1.67 |
| Per hour | 1,000 req/hr | 0.28 |
| Per day | 10,000 req/day | 0.12 |
| Per month | 1,000,000 req/month | 0.39 (30-day month) |
Use the API Rate Limit Calculator to convert any format into all the others instantly.
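The conversions in the table are simple division; a minimal sketch (the window lengths, including the 30-day month, are illustrative assumptions):

```javascript
// Convert a "count per window" limit into requests per second.
const WINDOW_SECONDS = {
  second: 1,
  minute: 60,
  hour: 3600,
  day: 86400,
  month: 30 * 86400, // approximating a month as 30 days
};

function toRps(count, window) {
  return count / WINDOW_SECONDS[window];
}

console.log(toRps(100, "minute").toFixed(2)); // prints "1.67"
console.log(toRps(10000, "day").toFixed(2));  // prints "0.12"
```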
How rate limiting works
Most APIs use one of three window strategies:
Fixed window. The counter resets at a clock boundary (e.g. the top of every minute). Simple to implement but can allow a burst of 2× the limit at the boundary — 100 requests at 11:59:59 and 100 more at 12:00:00.
Sliding window. The server tracks request timestamps and only counts requests within the trailing period. More precise, no boundary burst, but slightly more expensive to implement.
Token bucket / leaky bucket. Requests consume tokens from a bucket that refills at a steady rate. Allows short bursts (up to the bucket size) while enforcing a long-term average. This is the most flexible model and is used by APIs like Stripe and GitHub.
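To make the token bucket concrete, here is a minimal server-side sketch; the capacity and refill rate are illustrative values, not taken from any particular API:

```javascript
// Token bucket: `capacity` bounds bursts, `refillPerSec` enforces the long-term average.
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
    this.last = Date.now();
  }

  // Returns true if the request may proceed, false if it should be rejected (429).
  tryRemove() {
    const now = Date.now();
    // Refill in proportion to the time elapsed since the last check.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.last) / 1000) * this.refillPerSec
    );
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A bucket with capacity 10 and a refill rate of 1/s allows a burst of 10 back-to-back requests, then settles to one request per second on average.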
What happens when you hit the limit
When a request is rejected, the server returns HTTP 429 and usually includes headers telling you how long to wait:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1715433600
Retry-After is in seconds; X-RateLimit-Reset is a Unix timestamp. Check the API docs — header names vary by provider.
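Reading those headers in client code might look like this; a sketch using the standard fetch API, assuming the common `X-RateLimit-*` naming shown above:

```javascript
// Extract rate-limit information from a fetch Response.
function rateLimitInfo(res) {
  const retryAfter = res.headers.get("Retry-After");
  const reset = res.headers.get("X-RateLimit-Reset");
  return {
    limit: Number(res.headers.get("X-RateLimit-Limit")),
    remaining: Number(res.headers.get("X-RateLimit-Remaining")),
    // Retry-After gives seconds to wait; otherwise fall back to the reset timestamp.
    waitMs: retryAfter !== null
      ? Number(retryAfter) * 1000
      : Math.max(0, Number(reset) * 1000 - Date.now()),
  };
}
```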
Staying under the limit
Simple delay
If your RPS limit is r, insert a minimum delay between requests:
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)); // minimal sleep helper
const delay_ms = 1000 / r; // r = your requests-per-second limit
await sleep(delay_ms);
This works for steady sequential workloads. It does not handle bursts.
Token bucket (client-side)
Maintain a counter of available tokens. Each request consumes one token; tokens refill at the rate limit. If the bucket is empty, queue the request until tokens are available. Libraries like p-ratelimit (Node.js) implement this out of the box.
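A simplified sketch of the queuing idea, which spaces requests evenly rather than maintaining an actual bucket (libraries such as p-ratelimit implement the full version):

```javascript
// Client-side limiter: callers await acquire() before sending each request.
class RateLimiter {
  constructor(ratePerSec) {
    this.interval = 1000 / ratePerSec; // minimum gap between requests, in ms
    this.nextSlot = Date.now();        // next time a request may be sent
  }

  // Resolves when the caller may send; queued callers are spaced `interval` apart.
  acquire() {
    const now = Date.now();
    this.nextSlot = Math.max(this.nextSlot, now);
    const wait = this.nextSlot - now;
    this.nextSlot += this.interval;
    return new Promise((resolve) => setTimeout(resolve, wait));
  }
}
```

With `new RateLimiter(10)`, ten concurrent callers all resolve within the first second, one every 100 ms, regardless of how quickly they called `acquire()`.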
Exponential back-off on 429
When you receive a 429, wait and retry with increasing delay:
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(url, retries = 5, backoff = 500) {
  for (let i = 0; i < retries; i++) {
    const res = await fetch(url);
    if (res.status !== 429) return res;
    // Wait 500 ms, 1 s, 2 s, 4 s, 8 s before successive retries.
    await sleep(backoff * 2 ** i);
  }
  throw new Error("Rate limit exceeded after retries");
}
Planning your quota
Before writing code, convert your rate limit to the periods that matter for your use case.
Example: a free plan offers 100 req/min. Is that enough for a chatbot serving 500 users per day?
- 100 req/min × 60 min × 12 peak hours = 72,000 requests/day
- 500 users × ~5 requests per session = 2,500 requests/day
- Conclusion: the free plan has 29× headroom at this volume.
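The same headroom check expressed as a few lines of arithmetic (the traffic figures are the example's assumptions, not measured data):

```javascript
// Capacity: what the plan allows during peak hours.
const limitPerMin = 100;
const peakHours = 12;
const capacityPerDay = limitPerMin * 60 * peakHours; // 72,000 requests/day

// Demand: what the chatbot actually needs.
const users = 500;
const requestsPerSession = 5;
const demandPerDay = users * requestsPerSession;     // 2,500 requests/day

const headroom = capacityPerDay / demandPerDay;      // 28.8
console.log(Math.round(headroom));                   // prints 29
```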
Use the API Rate Limit Calculator to run these numbers for your own plan and usage pattern.
Rate limits vs throttling
| Concept | Behaviour |
|---|---|
| Rate limiting | Requests over the cap are rejected (HTTP 429) |
| Throttling | Requests over the cap are slowed down — responses come, just more slowly |
Both are server-enforced; the quota math is identical. The difference is only in how the server handles excess traffic.
Useful headers to log
Even on successful requests, rate-limit headers tell you how much headroom you have:
- X-RateLimit-Remaining — requests left in the current window
- X-RateLimit-Reset — when the window resets (Unix timestamp or seconds)
- Retry-After — seconds to wait before retrying (on 429)
Log these alongside your normal request metadata so you can spot when you're approaching the limit before you hit it.
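One way to log that headroom on every response; a sketch assuming the `X-RateLimit-*` header convention, with an illustrative 10% warning threshold:

```javascript
// Log remaining quota; warn when it drops below `threshold` of the limit.
// Returns the logged line (or null if the headers are absent).
function logRateLimit(res, threshold = 0.1) {
  const limitHeader = res.headers.get("X-RateLimit-Limit");
  const remainingHeader = res.headers.get("X-RateLimit-Remaining");
  if (limitHeader === null || remainingHeader === null) return null;
  const limit = Number(limitHeader);
  const remaining = Number(remainingHeader);
  let line = `rate-limit ${remaining}/${limit}`;
  if (remaining < limit * threshold) {
    line += " (approaching limit)";
    console.warn(line);
  } else {
    console.log(line);
  }
  return line;
}
```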