Scrapy is a powerful web scraping library, but anti-scraping measures can make it challenging. A ZenRows Scrapy integration can overcome these obstacles.

In this tutorial, you’ll learn how to get your ZenRows proxy and integrate it with Scrapy using two methods: via Meta Parameter and Custom Middleware.

Use ZenRows’ Proxies with Scrapy to Avoid Blocks

ZenRows offers premium proxies in 190+ countries that auto-rotate the IP address for you; with the Scraper API, the User-Agent header is rotated as well. Integrate them into Scrapy to appear as a different user on every request and drastically reduce your chances of getting blocked.

You have two ways to get a proxy with ZenRows. One is Residential Proxies, where you connect through our proxy network and are charged by bandwidth. The other is the Scraper API's Premium Proxy, our residential proxy offering for the API, where you are charged per request, depending on the parameters you choose.

For this tutorial, we'll focus on the Scraper API's Premium Proxies, the recommended ZenRows proxy for Scrapy.

Sign up for ZenRows, and after logging in, you'll be redirected to the Request Builder page:

Paste your target URL (https://httpbin.io/ip), and check the Premium Proxies option. Then, on the right, select cURL and activate the Proxy connection mode to auto-generate your proxy URL.

Now, copy the proxy endpoint (the first URL between quotation marks). It should look like this:

http://YOUR_ZENROWS_API_KEY:premium_proxy=true@api.zenrows.com:8001
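The endpoint follows the standard http://user:password@host:port proxy URL format: your API key goes in the username position, and the API parameters (such as premium_proxy=true) act as the password. As a minimal sketch of how the endpoint is composed (the build_proxy_url helper is a hypothetical name, not part of any ZenRows SDK):

```python
def build_proxy_url(api_key: str, params: str = "premium_proxy=true") -> str:
    """Compose a ZenRows-style proxy endpoint: API key as user, params as password."""
    return f"http://{api_key}:{params}@api.zenrows.com:8001"


proxy = build_proxy_url("YOUR_ZENROWS_API_KEY")
print(proxy)
# → http://YOUR_ZENROWS_API_KEY:premium_proxy=true@api.zenrows.com:8001
```

Swapping the params string changes how each request is billed and handled, which is why the Request Builder generates the URL for you.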

The target site of this tutorial section will be httpbin.io/ip, an endpoint that returns the origin IP of the incoming request. You’ll use it to verify that ZenRows is working.

Let's assume you have set up the Scrapy environment with the initial script below.

scraper.py
import scrapy

class ScraperSpider(scrapy.Spider):
    name = "scraper"
    allowed_domains = ["httpbin.io"]
    start_urls = ["https://httpbin.io/ip"]

    def parse(self, response):
        pass
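For reference, httpbin.io/ip returns a small JSON body, so the origin IP can be extracted with the standard json module. A quick illustration (the sample response body below is made up):

```python
import json

# Hypothetical response body from https://httpbin.io/ip
sample_body = '{"origin": "203.0.113.7"}'

data = json.loads(sample_body)
print(data["origin"])  # → 203.0.113.7
```

If the IP printed by your spider differs from your own, the proxy is doing its job.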

Follow the steps below to integrate ZenRows proxies into this scraper!

Integrate your ZenRows proxy into Scrapy!

You can configure a proxy in Scrapy in two ways: by adding a meta parameter, which is the easiest and recommended way, or by creating a custom middleware. Let's explore both approaches.

1. Add a Meta parameter

This method involves passing your proxy credentials as a meta parameter in the scrapy.Request() method.

Once you set up your Scraper API proxy, pass it into your Scrapy request using the following syntax.

scraper.py
import scrapy


class ScraperSpider(scrapy.Spider):
    name = "scraper"
    allowed_domains = ["httpbin.io"]
    start_urls = ["https://httpbin.io/ip"]

    def start_requests(self):
        # Route every request through the ZenRows proxy via the meta parameter
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                meta={"proxy": "http://YOUR_ZENROWS_API_KEY:premium_proxy=true@api.zenrows.com:8001"},
            )

    def parse(self, response):
        # Print the origin IP returned by httpbin.io/ip to verify the proxy works
        print(response.text)

2. Create a Custom Middleware

The Scrapy middleware is an intermediary layer that intercepts requests. Once you specify a middleware, every request will be automatically routed through it.

Enable your custom middleware by registering it in the settings.py file of your Scrapy project (the proxy URL itself is defined inside the middleware class below). The module path here assumes the class lives in a scraper.py file inside a project package named yourproject; adjust it to match your layout. The built-in HttpProxyMiddleware, registered after it, applies the proxy that the custom middleware sets on each request.

settings.py
DOWNLOADER_MIDDLEWARES = {
    'yourproject.spiders.scraper.CustomProxyMiddleware': 350,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 400,
}

Then, create the CustomProxyMiddleware class in your scraper.py file:

scraper.py
class CustomProxyMiddleware:
    def __init__(self):
        self.proxy = 'http://YOUR_ZENROWS_API_KEY:premium_proxy=true@api.zenrows.com:8001'

    def process_request(self, request, spider):
        if 'proxy' not in request.meta:
            request.meta['proxy'] = self.proxy

    def get_proxy(self):
        return self.proxy
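To see what process_request does, here is a minimal sketch that exercises the same logic with a stand-in request object (DummyRequest is an assumption for illustration, not a Scrapy class; Scrapy itself isn't needed to run it):

```python
class DummyRequest:
    """Stand-in for scrapy.Request: just a URL and a meta dict."""

    def __init__(self, url):
        self.url = url
        self.meta = {}


class CustomProxyMiddleware:
    def __init__(self):
        self.proxy = "http://YOUR_ZENROWS_API_KEY:premium_proxy=true@api.zenrows.com:8001"

    def process_request(self, request, spider):
        # Only set the proxy if the request doesn't already carry one
        if "proxy" not in request.meta:
            request.meta["proxy"] = self.proxy


middleware = CustomProxyMiddleware()

request = DummyRequest("https://httpbin.io/ip")
middleware.process_request(request, spider=None)
print(request.meta["proxy"])  # the ZenRows endpoint was injected

# A request that already has a proxy is left untouched
other = DummyRequest("https://httpbin.io/ip")
other.meta["proxy"] = "http://another.proxy:8080"
middleware.process_request(other, spider=None)
print(other.meta["proxy"])  # still http://another.proxy:8080
```

The `if 'proxy' not in request.meta` guard is what lets individual requests override the default proxy via their own meta parameter.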

You can also enable the middleware at the spider level using custom_settings, in which case your code would look like this:

scraper.py
import scrapy


class ScraperSpider(scrapy.Spider):
    name = "scraper"
    allowed_domains = ["httpbin.io"]
    start_urls = ["https://httpbin.io/ip"]
    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {
            # Adjust the module path to match your project layout
            'yourproject.spiders.scraper.CustomProxyMiddleware': 350,
            'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 400,
        },
    }

    def parse(self, response):
        pass

Pricing

ZenRows operates on a pay-per-success model for the Scraper API, meaning you only pay for requests that produce the desired result. Residential Proxies, by contrast, are billed by bandwidth use.

To optimize your scraper's success rate, you can fully replace Scrapy's networking with ZenRows. Different pages on the same site may have varying levels of protection, but the parameters recommended above will have you covered.

ZenRows offers a range of plans, starting at just $69 monthly. For more detailed information, please refer to our pricing page.

Frequently Asked Questions (FAQs)