How to Integrate Scrapy with ZenRows
Scrapy is a powerful web scraping library, but anti-scraping measures can make scraping with it challenging. Integrating ZenRows with Scrapy can help you overcome these obstacles.
In this tutorial, you’ll learn how to get your ZenRows proxy and integrate it with Scrapy using two methods: via Meta Parameter and Custom Middleware.
Use ZenRows’ Proxies with Scrapy to Avoid Blocks
ZenRows offers premium proxies in 190+ countries that auto-rotate the IP address for you, as well as the User-Agent header with the Scraper API. Integrate them into Scrapy to appear as a different user on every request, which significantly reduces your chances of getting blocked.
ZenRows provides two options for integrating proxies with Scrapy:
- Residential Proxies: With Residential Proxies, you can directly access our dedicated proxy network, billed by bandwidth usage. This option is ideal if you need flexible, on-demand proxy access.
- Scraper API with ZenRows Middleware: Our Scraper API is optimized for high-demand scraping scenarios and billed per request based on the chosen parameters. Using ZenRows Middleware for Scrapy allows you to seamlessly connect your Scrapy project to the Scraper API, automatically routing requests through the Premium Proxy and handling API-specific configurations.
Let’s assume you have set up your Scrapy environment with the initial script below.
Follow the steps below to integrate ZenRows proxies into this scraper!
Integrate the ZenRows Middleware into Scrapy!
The ZenRows Middleware for Scrapy allows seamless integration of the ZenRows Scraper API into Scrapy projects. This middleware helps you manage proxy settings, enable advanced features like JavaScript rendering, and apply custom headers and cookies.
Installation
First, install the scrapy-zenrows package (for example, with pip install scrapy-zenrows), which provides the necessary middleware for integrating ZenRows with Scrapy.
Usage
To use the ZenRows Scraper API with Scrapy, sign in to ZenRows to obtain your API key. The API key allows you to access the Premium Proxy, JavaScript rendering, and other advanced scraping features.
Setting Up Global Middleware
To enable ZenRows as the default proxy across all Scrapy requests, add ZenRows Middleware to your project’s settings.py file. This setup configures your Scrapy spiders to use the ZenRows API for every request automatically.
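A minimal sketch of that settings.py fragment might look as follows. The middleware path and the ZENROWS_API_KEY setting name are given as recalled from the scrapy-zenrows package documentation, so check them against the version you installed. The priority value 543 and the placeholder key are assumptions.

```python
# settings.py (fragment)

# Register the ZenRows downloader middleware. The priority 543 is an
# assumed value; pick one that fits your existing middleware ordering.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_zenrows.ZenRowsMiddleware": 543,
}

# Placeholder: replace with the API key from your ZenRows account.
ZENROWS_API_KEY = "<YOUR_ZENROWS_API_KEY>"
```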
Enabling Premium Proxy and JavaScript Rendering
ZenRows offers Premium Proxy and JavaScript rendering features, which are essential for handling websites that require complex interactions or are protected by anti-bot systems. To enable these features for all requests, configure them in settings.py:
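As a sketch, the two global switches could be set as shown here. The setting names are given as recalled from the scrapy-zenrows package documentation and should be verified against your installed version.

```python
# settings.py (fragment)

# Route every request through the Premium Proxy.
USE_ZENROWS_PREMIUM_PROXY = True

# Render JavaScript for every request before the response is returned.
USE_ZENROWS_JS_RENDER = True
```

Because the Scraper API is billed per request based on the chosen parameters, enabling both features globally is likely to cost more per request than enabling them only where needed.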
Customizing ZenRows Middleware for Specific Requests
In scenarios where you don’t need Premium Proxy or JavaScript rendering for every request (e.g., for only certain pages or spiders), you can override global settings and apply these features only to specific requests. This is done using the ZenRowsRequest class, which provides a flexible way to configure ZenRows on a per-request basis.
In this example, ZenRowsRequest is configured with js_render and premium_proxy set to true, ensuring that only this specific request uses JavaScript rendering and Premium Proxy.
Using Additional Request Parameters
The ZenRowsRequest class supports several other parameters, allowing you to customize each request to meet specific requirements. Here are some useful parameters:
- proxy_country: Specifies the country for the proxy, useful for geo-targeting.
- js_instructions: Allows custom JavaScript actions on the page, such as waiting for elements to load.
- autoparse: Automatically extracts data from supported websites.
- outputs: Extracts specific content types like tables, images, or links.
- css_extractor: Allows CSS-based content extraction.
Here’s an example of using these advanced parameters:
Customizing Headers with ZenRows
Certain websites require specific headers (such as Referer or Origin) for successful scraping. ZenRows Middleware allows you to set custom headers on a per-request basis. When using custom headers, set the custom_headers parameter to "true" so that ZenRows includes your headers while managing essential browser headers on its end.
Here’s an example of setting a custom Referer header:
For cookies, add them to the cookies dictionary in the request’s meta parameter. Just as with custom headers, custom_headers must be set to "true" for ZenRows to allow custom cookies. This is particularly useful for handling sessions or accessing region-specific content.
Pricing
ZenRows operates on a pay-per-success model on the Scraper API (that means you only pay for requests that produce the desired result); on the Residential Proxies, it’s based on bandwidth use.
To optimize your scraper’s success rate, fully replace Scrapy with ZenRows. Different pages on the same site may have various levels of protection, but using the parameters recommended above will get you covered.
ZenRows offers a range of plans, starting at just $69 monthly. For more detailed information, please refer to our pricing page.