Utilix knowledge base
What Is API Rate Limiting?
Published May 11, 2026
API rate limiting is a server-side control that caps how many requests a client can make within a defined time window. If you exceed the cap, the server rejects additional requests — usually with an HTTP 429 Too Many Requests response — until the window resets.
Rate limits protect services from overload and abuse, and they allow providers to offer tiered pricing where higher-volume plans cost more.
Common rate limit formats
Rate limits are usually expressed as a request count per time window:
| Format | Example | Equivalent RPS |
|---|---|---|
| Per second | 10 req/s | 10.00 |
| Per minute | 100 req/min | 1.67 |
| Per hour | 1,000 req/hr | 0.28 |
| Per day | 10,000 req/day | 0.12 |
| Per month | 1,000,000 req/month | 0.39 (30-day month) |
Use the API Rate Limit Calculator to convert any format into all the others instantly.
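The conversions in the table are simple division; a minimal sketch (the window lengths, including the 30-day month, are illustrative assumptions):

```javascript
// Convert a "count per window" limit into requests per second.
const WINDOW_SECONDS = {
  second: 1,
  minute: 60,
  hour: 3600,
  day: 86400,
  month: 30 * 86400, // approximating a month as 30 days
};

function toRps(count, window) {
  return count / WINDOW_SECONDS[window];
}

console.log(toRps(100, "minute").toFixed(2)); // prints "1.67"
console.log(toRps(10000, "day").toFixed(2));  // prints "0.12"
```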
How rate limiting works
Most APIs use one of three window strategies:
Fixed window. The counter resets at a clock boundary (e.g. the top of every minute). Simple to implement but can allow a burst of 2× the limit at the boundary — 100 requests at 11:59:59 and 100 more at 12:00:00.
Sliding window. The server tracks request timestamps and only counts requests within the trailing period. More precise, no boundary burst, but slightly more expensive to implement.
Token bucket / leaky bucket. Requests consume tokens from a bucket that refills at a steady rate. Allows short bursts (up to the bucket size) while enforcing a long-term average. This is the most flexible model and is used by APIs like Stripe and GitHub.
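To make the token bucket concrete, here is a minimal server-side sketch; the capacity and refill rate are illustrative values, not taken from any particular API:

```javascript
// Token bucket: `capacity` bounds bursts, `refillPerSec` enforces the long-term average.
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
    this.last = Date.now();
  }

  // Returns true if the request may proceed, false if it should be rejected (429).
  tryRemove() {
    const now = Date.now();
    // Refill in proportion to the time elapsed since the last check.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.last) / 1000) * this.refillPerSec
    );
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A bucket with capacity 10 and a refill rate of 1/s allows a burst of 10 back-to-back requests, then settles to one request per second on average.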
What happens when you hit the limit
When a request is rejected, the server returns HTTP 429 and usually includes headers telling you how long to wait:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1715433600
Retry-After is in seconds; X-RateLimit-Reset is a Unix timestamp. Check the API docs — header names vary by provider.
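Reading those headers in client code might look like this; a sketch using the standard fetch API, assuming the common `X-RateLimit-*` naming shown above:

```javascript
// Extract rate-limit information from a fetch Response.
function rateLimitInfo(res) {
  const retryAfter = res.headers.get("Retry-After");
  const reset = res.headers.get("X-RateLimit-Reset");
  return {
    limit: Number(res.headers.get("X-RateLimit-Limit")),
    remaining: Number(res.headers.get("X-RateLimit-Remaining")),
    // Retry-After gives seconds to wait; otherwise fall back to the reset timestamp.
    waitMs: retryAfter !== null
      ? Number(retryAfter) * 1000
      : Math.max(0, Number(reset) * 1000 - Date.now()),
  };
}
```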
Staying under the limit
Simple delay
If your RPS limit is r, insert a minimum delay between requests:
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)); // minimal sleep helper
const delay_ms = 1000 / r; // r = your requests-per-second limit
await sleep(delay_ms);
This works for steady sequential workloads. It does not handle bursts.
Token bucket (client-side)
Maintain a counter of available tokens. Each request consumes one token; tokens refill at the rate limit. If the bucket is empty, queue the request until tokens are available. Libraries like p-ratelimit (Node.js) implement this out of the box.
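A simplified sketch of the queuing idea, which spaces requests evenly rather than maintaining an actual bucket (libraries such as p-ratelimit implement the full version):

```javascript
// Client-side limiter: callers await acquire() before sending each request.
class RateLimiter {
  constructor(ratePerSec) {
    this.interval = 1000 / ratePerSec; // minimum gap between requests, in ms
    this.nextSlot = Date.now();        // next time a request may be sent
  }

  // Resolves when the caller may send; queued callers are spaced `interval` apart.
  acquire() {
    const now = Date.now();
    this.nextSlot = Math.max(this.nextSlot, now);
    const wait = this.nextSlot - now;
    this.nextSlot += this.interval;
    return new Promise((resolve) => setTimeout(resolve, wait));
  }
}
```

With `new RateLimiter(10)`, ten concurrent callers all resolve within the first second, one every 100 ms, regardless of how quickly they called `acquire()`.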
Exponential back-off on 429
When you receive a 429, wait and retry with increasing delay:
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(url, retries = 5, backoff = 500) {
  for (let i = 0; i < retries; i++) {
    const res = await fetch(url);
    if (res.status !== 429) return res;
    // Wait 500 ms, 1 s, 2 s, 4 s, 8 s before successive retries.
    await sleep(backoff * 2 ** i);
  }
  throw new Error("Rate limit exceeded after retries");
}
Planning your quota
Before writing code, convert your rate limit to the periods that matter for your use case.
Example: a free plan offers 100 req/min. Is that enough for a chatbot serving 500 users per day?
- 100 req/min × 60 min × 12 peak hours = 72,000 requests/day
- 500 users × ~5 requests per session = 2,500 requests/day
- Conclusion: the free plan has 29× headroom at this volume.
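The same headroom check expressed as a few lines of arithmetic (the traffic figures are the example's assumptions, not measured data):

```javascript
// Capacity: what the plan allows during peak hours.
const limitPerMin = 100;
const peakHours = 12;
const capacityPerDay = limitPerMin * 60 * peakHours; // 72,000 requests/day

// Demand: what the chatbot actually needs.
const users = 500;
const requestsPerSession = 5;
const demandPerDay = users * requestsPerSession;     // 2,500 requests/day

const headroom = capacityPerDay / demandPerDay;      // 28.8
console.log(Math.round(headroom));                   // prints 29
```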
Use the API Rate Limit Calculator to run these numbers for your own plan and usage pattern.
Rate limits vs throttling
| Concept | Behaviour |
|---|---|
| Rate limiting | Requests over the cap are rejected (HTTP 429) |
| Throttling | Requests over the cap are slowed down — responses come, just more slowly |
Both are server-enforced; the quota math is identical. The difference is only in how the server handles excess traffic.
Useful headers to log
Even on successful requests, rate-limit headers tell you how much headroom you have:
- X-RateLimit-Remaining — requests left in the current window
- X-RateLimit-Reset — when the window resets (Unix timestamp or seconds)
- Retry-After — seconds to wait before retrying (on 429)
Log these alongside your normal request metadata so you can spot when you're approaching the limit before you hit it.
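One way to log that headroom on every response; a sketch assuming the `X-RateLimit-*` header convention, with an illustrative 10% warning threshold:

```javascript
// Log remaining quota; warn when it drops below `threshold` of the limit.
// Returns the logged line (or null if the headers are absent).
function logRateLimit(res, threshold = 0.1) {
  const limitHeader = res.headers.get("X-RateLimit-Limit");
  const remainingHeader = res.headers.get("X-RateLimit-Remaining");
  if (limitHeader === null || remainingHeader === null) return null;
  const limit = Number(limitHeader);
  const remaining = Number(remainingHeader);
  let line = `rate-limit ${remaining}/${limit}`;
  if (remaining < limit * threshold) {
    line += " (approaching limit)";
    console.warn(line);
  } else {
    console.log(line);
  }
  return line;
}
```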