When working with APIs, there are multiple reasons why a request may not return results even when the data exists: network failures, timeouts, incomplete results, rate limits, server overload, or throttling. If your response to these failures is to retry immediately, you risk hammering the API even harder and worsening the problem.
A better solution is to retry intelligently – this is where exponential backoff with jitter comes in.
What is exponential backoff with jitter?
Exponential backoff means you wait progressively longer between retries, which gives the server more breathing room. On its own, however, it can lead to the ‘thundering herd’ problem. Imagine 1000 clients that all encounter server issues at the same time, and each attempts to reconnect at exactly the same intervals. Every retry wave then arrives at once, causing further instability.
A solution to this is to add jitter, which is where randomness is added to the intervals.
Here’s a simple implementation (in TypeScript):
const BASE_RETRY_DELAY_MS = 2000;

async function waitWithBackoff( attempt: number ) {
	const delay = BASE_RETRY_DELAY_MS * Math.pow( 2, attempt - 1 ) + Math.random() * 1000;
	await new Promise( res => setTimeout( res, delay ) );
}
In this case, Math.random() * 1000 is the jitter.
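To see it in use, you can wrap any asynchronous call in a retry helper. Here’s a sketch – fetchWithRetry is a hypothetical name, and waitWithBackoff is repeated from above so the example is self-contained:

```typescript
const BASE_RETRY_DELAY_MS = 2000;

// Repeated from above so this example stands on its own.
async function waitWithBackoff( attempt: number ) {
	const delay = BASE_RETRY_DELAY_MS * Math.pow( 2, attempt - 1 ) + Math.random() * 1000;
	await new Promise( res => setTimeout( res, delay ) );
}

// Hypothetical helper: retry an async operation with jittered backoff.
async function fetchWithRetry<T>( fn: () => Promise<T>, maxAttempts = 5 ): Promise<T> {
	let lastError: unknown;
	for ( let attempt = 1; attempt <= maxAttempts; attempt++ ) {
		try {
			return await fn();
		} catch ( err ) {
			lastError = err;
			// Don't sleep after the final failed attempt.
			if ( attempt < maxAttempts ) await waitWithBackoff( attempt );
		}
	}
	throw lastError;
}
```

Any transient failure inside fn is absorbed and retried with growing, randomised delays; only after maxAttempts consecutive failures does the error surface to the caller.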
Retries in Context: Downloading Large API Datasets
To see this in context, below is some code that downloads data from an API that can return thousands of entries, but it may fail partway through or return empty batches.
We can combine multiple approaches here:
- Exponential backoff with jitter – to handle transient errors.
- Adaptive batch sizes – to shrink batch size after repeated server errors.
Adaptive Batch Sizes
Some API failures happen because the payload is too large. When we get repeated 500 errors from the server, we can reduce the request size:

const MIN_PAGE_SIZE = 200;

if ( err.status === 500 && batchSize > MIN_PAGE_SIZE ) {
	batchSize = Math.max( Math.floor( batchSize / 2 ), MIN_PAGE_SIZE );
}
If the server is under load, smaller requests have a better chance of succeeding. Bear in mind though that if the batches are too small, each request incurs more overhead and may even hit timeouts. The key is finding a balance between request size and reliability.
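The halving logic can also be pulled into a small, easily testable helper – nextBatchSize is a hypothetical name, and MIN_PAGE_SIZE is repeated from above:

```typescript
const MIN_PAGE_SIZE = 200;

// Hypothetical helper: halve the batch size after a server error,
// but never drop below the minimum page size.
function nextBatchSize( current: number ): number {
	return Math.max( Math.floor( current / 2 ), MIN_PAGE_SIZE );
}
```

Starting from a batch size of 5000, repeated server errors would step the size down through 2500, 1250, 625, and 312 before settling at the 200 floor.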
Putting It Together: A Resilient Loop
Here’s an abstracted control flow as an example:
for ( let attempt = 1; attempt <= MAX_DOWNLOAD_RETRIES; attempt++ ) {
	try {
		do {
			// Fetch one batch of logs
			// Retry batches with jittered backoff on failure
		} while ( scrollId ); // while there’s more data to fetch
		break; // success – stop retrying
	} catch {
		// Retry the entire download attempt with backoff
		await waitWithBackoff( attempt );
	}
}
What this loop does:
- Fetches multiple batches of logs
- Retries individual batches on failure
- Uses jittered backoff between retries
- Shrinks the batch size if needed
- If a full download still fails, retries the loop a few times
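Here’s one way those pieces might fit together as a runnable sketch. The fetchBatch signature, the Batch shape, and downloadAll are all assumptions for illustration, and for brevity this version restarts the whole download on failure rather than retrying individual batches:

```typescript
const MAX_DOWNLOAD_RETRIES = 3;
const MIN_PAGE_SIZE = 200;
const BASE_RETRY_DELAY_MS = 2000;

type Batch = { entries: string[]; scrollId: string | null };
type FetchBatch = ( scrollId: string | null, batchSize: number ) => Promise<Batch>;

async function waitWithBackoff( attempt: number ) {
	const delay = BASE_RETRY_DELAY_MS * Math.pow( 2, attempt - 1 ) + Math.random() * 1000;
	await new Promise( res => setTimeout( res, delay ) );
}

// Hypothetical function: download every entry, shrinking the batch size on
// server errors and backing off with jitter between attempts.
async function downloadAll( fetchBatch: FetchBatch, initialBatchSize = 1000 ): Promise<string[]> {
	let batchSize = initialBatchSize;
	for ( let attempt = 1; attempt <= MAX_DOWNLOAD_RETRIES; attempt++ ) {
		const entries: string[] = [];
		let scrollId: string | null = null;
		try {
			do {
				const batch = await fetchBatch( scrollId, batchSize );
				entries.push( ...batch.entries );
				scrollId = batch.scrollId;
			} while ( scrollId ); // while there’s more data to fetch
			return entries; // success – all batches fetched
		} catch ( err: any ) {
			// Structural failure? Shrink the payload before trying again.
			if ( err?.status === 500 && batchSize > MIN_PAGE_SIZE ) {
				batchSize = Math.max( Math.floor( batchSize / 2 ), MIN_PAGE_SIZE );
			}
			if ( attempt < MAX_DOWNLOAD_RETRIES ) await waitWithBackoff( attempt );
		}
	}
	throw new Error( 'Download failed after all retries' );
}
```

Passing fetchBatch in as a parameter keeps the retry logic independent of any particular API client, which also makes the loop straightforward to exercise with a fake in tests.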
Why this works
This approach works because it:
- Respects the API (doesn’t spam requests).
- Handles both transient and structural failures (network vs payload).
- Keeps the user informed (progress updates).
- Recovers gracefully without manual intervention.
Final thoughts
Combine jittered exponential backoff with adaptive strategies such as adjusting page size, and you’ll be able to handle server flakiness far more gracefully. That means a more reliable service, and that is always the aim.