Use ZenRows’ Proxies with Scrapy to Avoid Blocks
ZenRows offers premium proxies in 190+ countries that auto-rotate the IP address for you, as well as the User-Agent header with the Universal Scraper API. Integrate them into Scrapy to appear as a different user on every request and drastically reduce your chances of getting blocked.
ZenRows provides two options for integrating proxies with Scrapy:
- Residential Proxies: With Residential Proxies, you can directly access our dedicated proxy network, billed by bandwidth usage. This option is ideal if you need flexible, on-demand proxy access.
- Universal Scraper API with ZenRows Middleware: Our Universal Scraper API is optimized for high-demand scraping scenarios and billed per request based on the chosen parameters. Using the ZenRows Middleware for Scrapy allows you to seamlessly connect your Scrapy project to the Universal Scraper API, automatically routing requests through the Premium Proxy and handling API-specific configurations.
In this tutorial, we’ll focus on using the Universal Scraper API with the ZenRows Middleware, the recommended setup for seamless Scrapy integration.
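As a quick preview of the end result, here’s a hedged sketch of a spider that routes its request through ZenRows; the spider name is a placeholder, and it assumes the scrapy-zenrows package and the settings.py configuration covered in the sections below:

```python
# scraper.py -- preview of the finished integration (sketch)
# Assumes scrapy-zenrows is installed and settings.py registers the ZenRows
# middleware with your API key, as shown in the following sections.
import scrapy
from scrapy_zenrows import ZenRowsRequest


class ProxyCheckSpider(scrapy.Spider):
    name = "proxy_check"  # placeholder spider name

    def start_requests(self):
        # Route this request through the ZenRows Universal Scraper API.
        yield ZenRowsRequest(
            url="https://httpbin.io/ip",  # echoes the exit IP of the request
            params={"premium_proxy": "true"},  # use a rotating residential IP
        )

    def parse(self, response):
        # Should log a ZenRows proxy IP rather than your own machine's address.
        self.logger.info(response.text)
```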
Integrate the ZenRows Middleware into Scrapy!
The ZenRows Middleware for Scrapy allows seamless integration of the ZenRows Universal Scraper API into Scrapy projects. This middleware helps you manage proxy settings, enable advanced features like JavaScript rendering, and apply custom headers and cookies.
Installation
First, install the scrapy-zenrows package, which provides the necessary middleware for integrating ZenRows with Scrapy.
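Assuming the package is published on PyPI under the same name, the installation is a single command:

```bash
pip install scrapy-zenrows
```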
Usage
To use the ZenRows Universal Scraper API with Scrapy, sign in to ZenRows to obtain your API key. The API key gives you access to the Premium Proxy, JavaScript rendering, and other advanced scraping features.
Setting Up Global Middleware
To enable ZenRows as the default proxy across all Scrapy requests, add the ZenRows Middleware to your project’s settings.py file. This setup configures your Scrapy spiders to use the ZenRows API for every request automatically.
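A minimal sketch of that configuration follows; the middleware path and priority value are based on the scrapy-zenrows package defaults and are worth double-checking against its documentation:

```python
# settings.py -- register the ZenRows middleware globally (sketch)
DOWNLOADER_MIDDLEWARES = {
    # Route every Scrapy request through the ZenRows Universal Scraper API.
    "scrapy_zenrows.ZenRowsMiddleware": 543,
}

# API key from your ZenRows dashboard.
ZENROWS_API_KEY = "<YOUR_ZENROWS_API_KEY>"
```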
Enabling Premium Proxy and JavaScript Rendering
ZenRows offers Premium Proxy and JavaScript rendering features, which are essential for handling websites that require complex interactions or are protected by anti-bot systems. To enable these features for all requests, configure them in settings.py:
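A sketch of the two global flags is shown below; the setting names are assumptions based on the scrapy-zenrows package and should be verified there:

```python
# settings.py -- enable both features for all requests (sketch)
USE_ZENROWS_PREMIUM_PROXY = True  # route requests through residential IPs
USE_ZENROWS_JS_RENDER = True      # render JavaScript before returning the HTML
```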
By default, both features are disabled to keep requests lean and cost-effective.
Customizing ZenRows Middleware for Specific Requests
In scenarios where you don’t need Premium Proxy or JavaScript rendering for every request (e.g., for only certain pages or spiders), you can override the global settings and apply these features only to specific requests. This is done using the ZenRowsRequest class, which provides a flexible way to configure ZenRows on a per-request basis.
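For instance, the hedged sketch below enables Premium Proxy and JavaScript rendering for a single request while other requests keep the plain global middleware behavior; the URLs are placeholders:

```python
# scraper.py -- per-request overrides with ZenRowsRequest (sketch)
import scrapy
from scrapy_zenrows import ZenRowsRequest


class PerRequestSpider(scrapy.Spider):
    name = "per_request"  # placeholder spider name

    def start_requests(self):
        # Only this request adds Premium Proxy and JS rendering on top of
        # the global middleware configuration.
        yield ZenRowsRequest(
            url="https://www.example.com/protected-page",  # placeholder URL
            params={
                "premium_proxy": "true",  # residential IP for this request only
                "js_render": "true",      # render JavaScript for this request only
            },
        )
        # Other requests still go through ZenRows, but without the extra features.
        yield scrapy.Request(url="https://www.example.com/plain-page")

    def parse(self, response):
        self.logger.info("Fetched %s (%d bytes)", response.url, len(response.body))
```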
Using Additional Request Parameters
The ZenRowsRequest class supports several other parameters, allowing you to customize each request to meet specific requirements. Here are some useful parameters:
- proxy_country: Specifies the country for the proxy, useful for geo-targeting.
- js_instructions: Allows custom JavaScript actions on the page, such as waiting for elements to load.
- autoparse: Automatically extracts data from supported websites.
- outputs: Extracts specific content types like tables, images, or links.
- css_extractor: Allows CSS-based content extraction.
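The hedged sketch below combines a few of these parameters; the parameter names come from the Universal Scraper API, while the target URL, country code, selectors, and wait instruction are illustrative assumptions:

```python
# scraper.py -- combining additional Universal Scraper API parameters (sketch)
import scrapy
from scrapy_zenrows import ZenRowsRequest


class ParamsSpider(scrapy.Spider):
    name = "params_demo"  # placeholder spider name

    def start_requests(self):
        yield ZenRowsRequest(
            url="https://www.example.com/products",  # placeholder URL
            params={
                "premium_proxy": "true",
                "proxy_country": "us",  # geo-target US residential IPs
                "js_render": "true",
                # Wait for a (hypothetical) product grid before capturing the page.
                "js_instructions": '[{"wait_for": ".product-grid"}]',
                # Return only the content matched by these CSS selectors.
                "css_extractor": '{"titles": ".product-title", "prices": ".price"}',
            },
        )

    def parse(self, response):
        # With css_extractor, the response body is the extracted JSON.
        self.logger.info(response.text)
```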
Refer to the ZenRows Universal Scraper API documentation for a complete list of supported parameters.
Customizing Headers with ZenRows
Certain websites require specific headers (such as Referer or Origin) for successful scraping. ZenRows Middleware allows you to set custom headers on a per-request basis. When using custom headers, set the custom_headers parameter to "true" so that ZenRows includes your headers while managing essential browser headers on its end.
Here’s an example of setting a custom Referer header:
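The hedged sketch below shows the pattern; the header value and URL are placeholders:

```python
# scraper.py -- sending a custom Referer header (sketch)
import scrapy
from scrapy_zenrows import ZenRowsRequest


class CustomHeadersSpider(scrapy.Spider):
    name = "custom_headers"  # placeholder spider name

    def start_requests(self):
        yield ZenRowsRequest(
            url="https://www.example.com/article",  # placeholder URL
            # Ask ZenRows to forward your headers alongside its managed browser headers.
            params={"custom_headers": "true"},
            headers={
                # Make the visit look like a click-through from a search engine.
                "Referer": "https://www.google.com/",
            },
        )

    def parse(self, response):
        self.logger.info("Status: %s", response.status)
```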
To send custom cookies, add them to the cookies dictionary in the request’s meta parameter. Just as with custom headers, custom_headers must be set to "true" for ZenRows to allow custom cookies. This is particularly useful for handling sessions or accessing region-specific content.
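A hedged sketch following that description is shown below; the cookie values and URL are placeholders, and passing cookies through the request’s meta mirrors the wording above rather than a verified signature, so check the scrapy-zenrows documentation for the exact mechanism:

```python
# scraper.py -- sending custom cookies (sketch)
import scrapy
from scrapy_zenrows import ZenRowsRequest


class CustomCookiesSpider(scrapy.Spider):
    name = "custom_cookies"  # placeholder spider name

    def start_requests(self):
        yield ZenRowsRequest(
            url="https://www.example.com/account",  # placeholder URL
            params={"custom_headers": "true"},  # also required for custom cookies
            # Cookies passed via the request meta, as described above
            # (assumed mechanism -- verify against the scrapy-zenrows docs).
            meta={"cookies": {"session_id": "<YOUR_SESSION_ID>", "region": "us"}},
        )

    def parse(self, response):
        self.logger.info("Status: %s", response.status)
```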
Cookies are often required to maintain user sessions or comply with location-based content restrictions. For more information on cookies and headers, see the ZenRows headers feature documentation.
Pricing
ZenRows operates on a pay-per-success model for the Universal Scraper API (that means you only pay for requests that return the desired result); Residential Proxies are billed by bandwidth use. To maximize your scraper’s success rate, route all of your Scrapy requests through ZenRows. Different pages on the same site may have different levels of protection, but the parameters recommended above will keep you covered. ZenRows offers a range of plans, starting at just $69 monthly. For more detailed information, please refer to our pricing page.
Troubleshooting Guide
Even with ZenRows handling most scraping challenges, you might encounter issues. Here’s how to diagnose and resolve common problems:
Anti-Bot Detection Issues
Problem: Content doesn’t match what you see in the browser
Solutions:
- Enable JavaScript rendering: some sites load content dynamically
- Check if Premium Proxies are needed: some sites may block datacenter IPs
- Use custom headers to appear more like a real browser: add a valid Referer like Google or Bing
Problem: Getting redirected to CAPTCHA or security pages
Solutions:
- Use full browser emulation with JS rendering and Premium Proxies (see the sketch below)
- Try different geographic locations with the proxy_country parameter
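Combining both suggestions in a single request could look like the hedged sketch below; the URL and country code are placeholders:

```python
# scraper.py -- full browser emulation plus geo-targeting (sketch)
import scrapy
from scrapy_zenrows import ZenRowsRequest


class HardenedSpider(scrapy.Spider):
    name = "hardened"  # placeholder spider name

    def start_requests(self):
        yield ZenRowsRequest(
            url="https://www.example.com/checkout",  # placeholder URL
            params={
                "js_render": "true",      # full browser rendering
                "premium_proxy": "true",  # residential exit IPs
                "proxy_country": "gb",    # try another location if one keeps getting challenged
            },
        )

    def parse(self, response):
        self.logger.info("Status: %s", response.status)
```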
Frequently Asked Questions (FAQs)
Why do I need a proxy for Scrapy?
Scrapy is widely recognized by websites’ anti-bot systems, which can block your requests. Using residential proxies from ZenRows allows you to rotate IP addresses and appear as a legitimate user, helping to bypass these restrictions and reduce the chances of being blocked.
Do you have any code examples?
Yes! You can find code examples demonstrating how to use the scrapy_zenrows middleware here!
How do I know if my proxy is working?
You can test the proxy connection by running the script provided in the tutorial and checking the output from httpbin.io/ip. If the proxy is working, the response will display a different IP address than your local machine’s.
What should I do if my requests are blocked?
Many websites employ advanced anti-bot measures, such as CAPTCHAs and Web Application Firewalls (WAFs), to prevent automated scraping. Simply using proxies may not be enough to bypass these protections.
Instead of relying solely on proxies, consider using ZenRows’ Universal Scraper API, which provides:
- JavaScript Rendering and Interaction Simulation: Optimized with anti-bot bypass capabilities.
- Comprehensive Anti-Bot Toolkit: ZenRows offers advanced tools to overcome complex anti-scraping solutions.