Building Resilient API Downloads with Jitter Backoff and Smart Retries

When working with APIs, there are many reasons why a request may fail to return results even when the data exists: network failures, timeouts, incomplete results, rate limits, server overload or throttling.

If your solution to these issues is to retry immediately, you risk hammering the API even harder and worsening the problem.

A better solution is to retry intelligently – this is where exponential backoff with jitter comes in.

What is exponential backoff with jitter?

Exponential backoff means you wait progressively longer between retries, giving the server more breathing room. On its own, however, this can lead to the ‘thundering herd’ problem: imagine 1000 clients that all encounter server issues at the same time and then all attempt to reconnect at exactly the same intervals. Those synchronised retries can cause further instability.

A solution to this is to add jitter, which is where randomness is added to the intervals.

Here’s a simple implementation (in JavaScript):

const BASE_RETRY_DELAY_MS = 2000;

async function waitWithBackoff( attempt ) {
	// Double the delay each attempt, then add up to 1s of random jitter.
	const delay = BASE_RETRY_DELAY_MS * Math.pow( 2, attempt - 1 ) + Math.random() * 1000;
	await new Promise( res => setTimeout( res, delay ) );
}

In this case, Math.random() * 1000 is the jitter.
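To see how the delays grow, it helps to pull the calculation out into a pure function (a small refactor of the snippet above, not something from the original post). For attempt n, the delay is 2000 × 2ⁿ⁻¹ ms plus up to one second of jitter:

```javascript
const BASE_RETRY_DELAY_MS = 2000;
const JITTER_MS = 1000;

// Pure helper: the jittered delay (in ms) for a given attempt number.
function backoffDelay( attempt ) {
	return BASE_RETRY_DELAY_MS * Math.pow( 2, attempt - 1 ) + Math.random() * JITTER_MS;
}
```

So attempt 1 waits somewhere in [2000, 3000) ms, attempt 2 in [4000, 5000) ms, attempt 3 in [8000, 9000) ms, and so on. The spread from the jitter is what stops a fleet of clients from retrying in lockstep.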

Retries in Context: Downloading Large API Datasets

To see this in context, below is some code that downloads data from an API that can return thousands of entries, but it may fail partway through or return empty batches.

We can combine multiple approaches here:

  1. Exponential backoff with jitter – to handle transient errors.
  2. Adaptive batch sizes – to shrink batch size after repeated server errors.

Adaptive Batch Sizes

Some API failures happen because the payload is too large. When we get repeated 500 errors from the server, we can reduce the request size:

const MIN_BATCH_SIZE = 200;

if ( err.status === 500 && batchSize > MIN_BATCH_SIZE ) {
	batchSize = Math.max( Math.floor( batchSize / 2 ), MIN_BATCH_SIZE );
}

If the server is under load, smaller requests have a better chance of succeeding. Bear in mind, though, that if the batches are too small, each request incurs more overhead and may even hit timeouts. The key is finding a balance between request size and reliability.
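Wrapping this in a small pure function makes the halving-with-a-floor behaviour easy to test in isolation (the function name here is my own, not from the original snippet):

```javascript
const MIN_BATCH_SIZE = 200;

// Hypothetical helper: shrink the batch size after a 500 response,
// halving it each time but never going below the minimum.
function nextBatchSize( batchSize, status ) {
	if ( status === 500 && batchSize > MIN_BATCH_SIZE ) {
		return Math.max( Math.floor( batchSize / 2 ), MIN_BATCH_SIZE );
	}
	return batchSize;
}
```

Starting at 1000, repeated 500s step the size down 1000 → 500 → 250 → 200, then hold at the floor; any other status leaves the size unchanged.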

Putting It Together: A Resilient Loop

Here’s an abstracted control flow as an example:

for ( let attempt = 1; attempt <= MAX_DOWNLOAD_RETRIES; attempt++ ) {
	try {
		do {
			// Fetch one batch of logs
			// Retry batches with jittered backoff on failure
		} while ( scrollId ); // while there’s more data to fetch
		break; // full download succeeded – stop retrying
	} catch {
		// Retry the entire download attempt with backoff
		await waitWithBackoff( attempt );
	}
}

What this loop does:

  • Fetches multiple batches of logs
  • Retries individual batches on failure
  • Uses jittered backoff between retries
  • Shrinks the batch size if needed
  • If a full download still fails, retries the loop a few times
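Those bullet points can be sketched concretely. Everything below is illustrative rather than a definitive implementation: `fetchBatch( scrollId, batchSize )` is an assumed API that returns `{ entries, scrollId }` (with a null `scrollId` once the data is exhausted), and the `wait` parameter exists only so the backoff can be stubbed out when testing:

```javascript
const MAX_DOWNLOAD_RETRIES = 3;
const MIN_BATCH_SIZE = 200;
const BASE_RETRY_DELAY_MS = 2000;

async function waitWithBackoff( attempt ) {
	const delay = BASE_RETRY_DELAY_MS * Math.pow( 2, attempt - 1 ) + Math.random() * 1000;
	await new Promise( res => setTimeout( res, delay ) );
}

// Hypothetical end-to-end sketch combining retries, jittered backoff
// and adaptive batch sizes.
async function downloadAll( fetchBatch, wait = waitWithBackoff ) {
	let batchSize = 1000;
	for ( let attempt = 1; attempt <= MAX_DOWNLOAD_RETRIES; attempt++ ) {
		const results = [];
		let scrollId = null;
		try {
			do {
				// Fetch one batch and accumulate its entries.
				const batch = await fetchBatch( scrollId, batchSize );
				results.push( ...batch.entries );
				scrollId = batch.scrollId;
			} while ( scrollId ); // while there's more data to fetch
			return results; // full download succeeded
		} catch ( err ) {
			// Shrink the batch on server errors, then back off and retry.
			if ( err.status === 500 && batchSize > MIN_BATCH_SIZE ) {
				batchSize = Math.max( Math.floor( batchSize / 2 ), MIN_BATCH_SIZE );
			}
			if ( attempt === MAX_DOWNLOAD_RETRIES ) throw err;
			await wait( attempt );
		}
	}
}
```

Note that the download restarts from scratch on each outer attempt; if the API supports resuming from a scroll position, carrying `scrollId` and `results` across attempts would avoid refetching earlier batches.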

Why this works

This approach works because it:

  • Respects the API (doesn’t spam requests).
  • Handles both transient and structural failures (network vs payload).
  • Keeps the user informed (progress updates).
  • Recovers gracefully without manual intervention.

Final thoughts


Combine jittered exponential backoff with adaptive strategies such as adjusting batch size, and you’ll be able to handle server flakiness more gracefully. That means a more reliable service, and that is always the aim.
