Implementing auto-retry policies in web scraping is crucial for maintaining data accuracy and reliability, since a 100% success rate is rarely achievable: site downtime or interrupted connections can cause occasional failures.

However, with ZenRows®, you are not charged for these failed attempts. For instance, if you make 130 requests and 15 of them fail and need to be retried, you are charged only for the 115 successful ones.

Below, we offer detailed solutions in Python and JavaScript to help you implement these retry policies. If you require support for other languages or frameworks, please feel free to contact us.

The ZenRows® Python SDK and JavaScript SDK offer built-in support for retries. Simply pass a retries parameter to the constructor to enable this feature.
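
For illustration, here is a minimal sketch using the Python SDK (pip install zenrows); the target URL is a placeholder and the retries value is just an example:

from zenrows import ZenRowsClient

# retry failed requests up to 2 times before giving up
client = ZenRowsClient("YOUR_ZENROWS_API_KEY", retries=2)

response = client.get("https://www.example.com")  # replace with your target URL
print(response.text)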

Python with Requests

To begin, ensure you have Python 3 installed on your system. Some systems come with it pre-installed. Once Python is ready, install the necessary libraries by running the following command:

pip install requests

We will use the Retry class from urllib3 and the HTTPAdapter from requests. urllib3 is installed as a dependency of requests, so you don’t need to install either separately.

Instead of making direct get calls, we create a requests session. This approach allows us to set up and reuse the session configuration, including retry settings, for all subsequent requests. This setup is efficient and ensures consistent retry behavior across all requests.

You can adjust the Retry parameters to suit your needs. Here are some key parameters:

  • total sets the maximum number of retries allowed.
  • backoff_factor defines the delay between retries using an exponential backoff strategy: delay = backoff_factor × 2^(number of total retries − 1). For example, a backoff factor of 1 results in delays of 1, 2, and 4 seconds across three retries (see the short sketch after this list).
  • status_forcelist is a list of HTTP status codes that will force a retry.
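
To make the backoff schedule concrete, here is a short sketch that evaluates the formula above for a backoff factor of 1 and three retries:

backoff_factor = 1
for retry_number in range(1, 4):
	delay = backoff_factor * (2 ** (retry_number - 1))
	print(f"retry {retry_number}: wait {delay}s")  # prints 1s, 2s, 4s
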
scraper.py
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

apikey = "YOUR_ZENROWS_API_KEY"
urls = [
	""  # ... your URLs here
]
zenrows_api_base = "https://api.zenrows.com/v1/"

requests_session = requests.Session()
retries = Retry(
	total=3,
	backoff_factor=1,
	status_forcelist=[429, 500, 502, 503, 504]
)
requests_session.mount("https://", HTTPAdapter(max_retries=retries))

for url in urls:
	try:
		response = requests_session.get(zenrows_api_base, params={
			"apikey": apikey,
			"url": url,
		})

		print(response.text)  # process response
	except Exception as e:
		print(e)  # will print "Max retries exceeded"

In this example, we use a list of URLs and process them sequentially. For improved performance, you can process them concurrently with a cap on the maximum number of simultaneous requests, as sketched below. This approach is particularly useful when dealing with large numbers of URLs or when response times are slow.
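
As an illustration, the sketch below reuses requests_session, zenrows_api_base, apikey, and urls from scraper.py and fans requests out with Python’s built-in ThreadPoolExecutor; the max_workers value of 5 is an arbitrary example, so tune it to your plan’s concurrency limit:

from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url):
	# same call as before; the retry settings on the session still apply
	response = requests_session.get(zenrows_api_base, params={
		"apikey": apikey,
		"url": url,
	})
	return response.text

# max_workers caps how many requests run at the same time
with ThreadPoolExecutor(max_workers=5) as executor:
	futures = [executor.submit(fetch, url) for url in urls]
	for future in as_completed(futures):
		try:
			print(future.result())  # process response
		except Exception as e:
			print(e)  # raised after retries are exhausted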

JavaScript with axios-retry

To get started, ensure you have Node.js and npm installed on your system (nvm is a convenient way to manage them). Once set up, install the necessary libraries by running:

npm install axios axios-retry

Instead of making direct calls to the ZenRows API using axios, we’ll use the axios-retry library to manage retries. Applied as below, the configuration automatically retries all axios calls; alternatively, you can attach it to a dedicated axios instance with retry capabilities.

You can adjust the axiosRetry parameters to fit your specific requirements. The configuration provided below should work well for most scenarios:

  • retries sets the maximum number of retries allowed.
  • retryDelay applies an exponential delay between attempts, with an additional 0–20% random jitter so that repeated requests don’t hit the server at regular intervals.
  • retryCondition determines whether a failed request is eligible for a retry. By default, axios-retry retries network errors and 5xx responses to idempotent requests. In our example, we’ve added a check so that 429 (Too Many Requests) responses are also retried.
const axios = require("axios");
const axiosRetry = require("axios-retry");

const apikey = "YOUR_ZENROWS_API_KEY";
const zenrowsApiBase = "https://api.zenrows.com/v1/";
const urls = [
	// ... your URLs here
];

axiosRetry(axios, {
	retries: 3,
	retryDelay: axiosRetry.exponentialDelay,
	retryCondition: (error) => {
		if (error.response && error.response.status === 429) {
			return true; // example for custom retry condition
		}

		// fallback to default condition
		return axiosRetry.isNetworkOrIdempotentRequestError(error);
	},
});

(async () => {
	for (const url of urls) {
		try {
			const response = await axios.get(zenrowsApiBase, {
				params: { apikey, url },
			});
			console.log(response.data); // process response
		} catch (error) {
			console.log(error);
		}
	}
})();

The example processes the list of URLs sequentially for simplicity. However, you can also execute them concurrently, handling multiple requests at once.

This approach is particularly useful when dealing with a large number of URLs, improving efficiency and reducing the overall time required. For more information on running requests concurrently, refer to our guide on using concurrency with the ZenRows SDK for JavaScript.