Retry Failed Requests
Implementing auto-retry policies in web scraping is crucial for maintaining data accuracy and reliability, especially since achieving a 100% success rate is difficult. Factors such as site downtimes or interrupted connections can cause occasional failures.
However, with ZenRows®, you are not charged for these failed attempts. For instance, if you make 130 requests and 15 need to be retried, you will only be charged for the 115 successful ones, providing a cost-effective solution.
Below, we offer detailed solutions in Python and JavaScript to help you implement these retry policies. If you require support for other languages or frameworks, please feel free to contact us.
If you are using one of our SDKs, you only need to set the `retries` parameter in the constructor to enable this feature and enhance your scraping efficiency.

Python with Requests
To begin, ensure you have Python 3 installed on your system. Some systems come with it pre-installed. Once Python is ready, install the necessary libraries by running the following command:
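Assuming pip is available on your system, the installation typically looks like this (urllib3 is pulled in automatically as a dependency of requests):

```shell
pip install requests
```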
We will use the `Retry` class from `urllib3` and the `HTTPAdapter` from `requests`. Both are available once `requests` is installed (`urllib3` is one of its dependencies), so you don't need to install them separately.
Instead of making direct `get` calls, we create a `requests` session. This approach allows us to set up the session configuration, including retry settings, once and reuse it for all subsequent requests, ensuring consistent retry behavior.
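A minimal sketch of this setup (the retry values shown are illustrative, not required settings):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 3 times with exponential backoff, forcing retries
# on common transient status codes.
retry = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
)

session = requests.Session()
# Mount the adapter so every request made through this session
# inherits the retry configuration.
session.mount("https://", HTTPAdapter(max_retries=retry))
session.mount("http://", HTTPAdapter(max_retries=retry))
```

Any `session.get(...)` call now retries automatically according to this policy.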
You can adjust the `Retry` parameters to suit your needs. Here are some key parameters:

- `total` sets the maximum number of retries allowed.
- `backoff_factor` defines the delay between retries using an exponential backoff strategy. The formula is `backoff_factor * (2 ** (retry_number - 1))`; for example, a backoff factor of 1 results in delays of 1, 2, and 4 seconds for three retries.
- `status_forcelist` is a list of HTTP status codes that will force a retry.
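The backoff schedule can be sketched with a small helper (hypothetical, for illustration only; urllib3 computes this internally):

```python
def backoff_delay(backoff_factor: float, retry_number: int) -> float:
    # Exponential backoff: factor * 2^(retry_number - 1)
    return backoff_factor * (2 ** (retry_number - 1))

# With a backoff factor of 1, three retries wait 1, 2, and 4 seconds.
delays = [backoff_delay(1, n) for n in (1, 2, 3)]
```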
In this example, we use a list of URLs and process them sequentially. For improved performance, you can process them concurrently, capping the maximum number of simultaneous requests. This approach is particularly useful when dealing with large numbers of URLs or when response times are slow.
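As a sketch, a bounded thread pool caps how many requests run at once (`fetch` and the URL list are placeholders; in practice `fetch` would call the retry-enabled session shown earlier):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    # Placeholder for session.get(url) on the retry-enabled session.
    return f"fetched {url}"

urls = [f"https://example.com/page/{i}" for i in range(5)]

# max_workers limits the number of concurrent requests.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(fetch, urls))
```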
JavaScript with axios-retry
To get started, ensure you have Node.js (or nvm) and npm installed on your system. Many systems come with these pre-installed. Once set up, install the necessary libraries by running:
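With npm available, the installation typically looks like:

```shell
npm install axios axios-retry
```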
Instead of making direct calls to the ZenRows API with `axios`, we'll use the `axios-retry` library to manage retries. This configuration will automatically retry requests for all `axios` calls, and you can also create a dedicated `axios` client with retry capabilities.
You can adjust the `axiosRetry` parameters to fit your specific requirements. The configuration provided below should work well for most scenarios:

- `retries` sets the number of allowed retries.
- `retryDelay` applies an exponential delay between attempts, with an additional 0-20% random margin so that repeated requests don't hit the server at regular intervals.
- `retryCondition` determines whether an error is eligible for a retry. By default, only 5xx errors are retried; in our example, we've added a check for 429 (Too Many Requests) so those errors are retried as well.
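The delay strategy can be sketched in plain JavaScript (a simplified illustration of the idea, not the library's actual code):

```javascript
// Exponential delay with a 0-20% random jitter, in milliseconds.
function exponentialDelay(retryNumber) {
  const delay = 2 ** retryNumber * 100;       // 200ms, 400ms, 800ms, ...
  const jitter = delay * 0.2 * Math.random(); // random 0-20% margin
  return delay + jitter;
}
```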
The example processes a list of URLs sequentially for simplicity. However, you can also execute them concurrently, allowing multiple requests to be handled simultaneously.
This approach can be particularly useful when dealing with a large number of URLs, improving efficiency and reducing the overall time required. For more information on running requests concurrently, refer to our guide on using concurrency with ZenRows SDK for JavaScript.
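As a sketch, concurrency can be capped by processing the URLs in batches (`mapWithLimit` is a hypothetical helper; the SDK guide covers a more complete approach):

```javascript
// Run `fn` over `items`, with at most `limit` tasks in flight per batch.
async function mapWithLimit(items, limit, fn) {
  const results = [];
  for (let i = 0; i < items.length; i += limit) {
    const batch = items.slice(i, i + limit);
    results.push(...(await Promise.all(batch.map(fn))));
  }
  return results;
}
```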