How to Integrate Scrapy with ZenRows
Scrapy is a powerful web scraping library, but anti-scraping measures can block its requests and make data extraction challenging. Integrating ZenRows with Scrapy helps you overcome these obstacles.
In this tutorial, you’ll learn how to get your ZenRows proxy and integrate it with Scrapy using two methods: via Meta Parameter and Custom Middleware.
Use ZenRows’ Proxies with Scrapy to Avoid Blocks
ZenRows offers premium proxies in 190+ countries that auto-rotate the IP address for you, as well as the User-Agent header with the Scraper API. Integrate them into Scrapy to appear as a different user on every request, which greatly reduces your chances of getting blocked.
You have two ways to get a proxy with ZenRows. One is Residential Proxies, where you connect to our proxy directly and are charged by bandwidth. The other is the Scraper API’s Premium Proxy, which is our residential proxy for the API and is charged per request, depending on the parameters you choose.
After logging in, you’ll get redirected to the Request Builder page.
Paste your target URL (https://httpbin.io/ip), and check the Premium Proxies option. Then, on the right, select cURL and activate the Proxy connection mode to auto-generate your proxy URL.
Now, copy the proxy endpoint (the first URL between quotation marks in the generated cURL command). Here’s what it should look like:
http://YOUR_ZENROWS_API_KEY:premium_proxy=true@api.zenrows.com:8001
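Rather than pasting the key into every script, you may prefer to build the proxy URL from an environment variable. The sketch below assumes a hypothetical variable name, ZENROWS_API_KEY, and a hypothetical helper, build_proxy_url. The host, port, and premium_proxy=true option are taken from the endpoint above.

```python
import os


def build_proxy_url(api_key: str, options: str = "premium_proxy=true") -> str:
    """Return a proxy URL in the format shown above.

    build_proxy_url is an illustrative helper, not part of ZenRows or Scrapy.
    """
    if not api_key:
        raise ValueError("A ZenRows API key is required")
    # The API key fills the username slot and the options fill the password slot.
    return f"http://{api_key}:{options}@api.zenrows.com:8001"


# ZENROWS_API_KEY is an assumed environment variable name; the placeholder
# is used when the variable is not set.
PROXY_URL = build_proxy_url(os.environ.get("ZENROWS_API_KEY", "YOUR_ZENROWS_API_KEY"))
```

You could then pass PROXY_URL wherever the examples in this tutorial use the literal proxy string.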
The target site of this tutorial section will be httpbin.io/ip, an endpoint that returns the origin IP of the incoming request. You’ll use it to verify that ZenRows is working.
Let’s assume you have set up your Scrapy project with the initial spider below.
import scrapy


class ScraperSpider(scrapy.Spider):
    name = "scraper"
    allowed_domains = ["httpbin.io"]
    start_urls = ["https://httpbin.io/ip"]

    def parse(self, response):
        pass
Follow the steps below to integrate ZenRows proxies into this scraper!
Integrate your ZenRows proxy into Scrapy!
You can configure a proxy in Scrapy by adding a meta parameter, which is the easiest and recommended way, or by creating a custom middleware. Let’s explore the two approaches.
1. Add a Meta parameter
This method involves passing your proxy URL, which includes your credentials, as a meta parameter in scrapy.Request().
Once you have your Scraper API proxy URL, pass it into your Scrapy request using the following syntax.
import scrapy


class ScraperSpider(scrapy.Spider):
    name = "scraper"
    allowed_domains = ["httpbin.io"]
    start_urls = ["https://httpbin.io/ip"]

    def start_requests(self):
        # Route every start URL through the ZenRows proxy
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                meta={"proxy": "http://YOUR_ZENROWS_API_KEY:premium_proxy=true@api.zenrows.com:8001"},
            )

    def parse(self, response):
        pass
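To confirm that requests really go through the proxy, the parse callback can read the IP address that httpbin.io/ip reports. The snippet below is a minimal sketch that assumes the endpoint returns a JSON body with an origin field; extract_origin is a hypothetical helper name.

```python
import json


def extract_origin(body: str) -> str:
    """Return the IP address reported in an httpbin-style JSON body."""
    data = json.loads(body)
    # Assumption: the IP address is stored under the "origin" key.
    return data["origin"]


# Inside the spider, the callback could then look like this:
#
#     def parse(self, response):
#         yield {"origin": extract_origin(response.text)}

print(extract_origin('{"origin": "203.0.113.7"}'))  # prints 203.0.113.7
```

If the proxy is applied, the reported address should differ from your own IP.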
2. Create a Custom Middleware
A Scrapy downloader middleware is an intermediary layer that intercepts requests before they are sent. Once you enable a middleware, every request is automatically routed through it.
Start by opening the settings.py file in your Scrapy project directory and adding the proxy settings.
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 350,
}
# Custom setting: Scrapy's built-in proxy middleware does not read HTTP_PROXY by itself
HTTP_PROXY = 'http://YOUR_ZENROWS_API_KEY:premium_proxy=true@api.zenrows.com:8001'
Create the CustomProxyMiddleware class in your scraper.py file:
class CustomProxyMiddleware(object):
    def __init__(self):
        self.proxy = 'http://YOUR_ZENROWS_API_KEY:premium_proxy=true@api.zenrows.com:8001'

    def process_request(self, request, spider):
        if 'proxy' not in request.meta:
            request.meta['proxy'] = self.proxy

    def get_proxy(self):
        return self.proxy
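Scrapy only calls a custom middleware that is registered in DOWNLOADER_MIDDLEWARES. As a sketch of how that could look, the variant below reads the proxy URL from the HTTP_PROXY setting instead of hard-coding it. It assumes HTTP_PROXY is a custom setting you define yourself; SettingsProxyMiddleware is an illustrative name and myproject.middlewares is a hypothetical module path.

```python
class SettingsProxyMiddleware:
    """Downloader middleware that applies a proxy read from the project settings."""

    def __init__(self, proxy):
        self.proxy = proxy

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the middleware and exposes the settings.
        return cls(proxy=crawler.settings.get("HTTP_PROXY"))

    def process_request(self, request, spider):
        # Keep any proxy that was already set on the request itself.
        if self.proxy and "proxy" not in request.meta:
            request.meta["proxy"] = self.proxy
        return None  # let Scrapy continue processing the request


# settings.py (the module path is hypothetical; adjust it to your project):
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.SettingsProxyMiddleware": 350,
# }
```

A priority of 350 places this middleware ahead of the built-in HttpProxyMiddleware (750 by default), which then handles the credentials embedded in the proxy URL.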
You can also add middleware at the spider level using custom settings, and your code would look like this:
import scrapy


class ScraperSpider(scrapy.Spider):
    name = "scraper"
    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 350,
        },
    }

    def parse(self, response):
        pass
Pricing
ZenRows operates on a pay-per-success model on the Scraper API, which means you only pay for requests that produce the desired result. Residential Proxies are billed by bandwidth use.
To optimize your scraper’s success rate, fully replace Scrapy with ZenRows. Different pages on the same site may have various levels of protection, but using the parameters recommended above will get you covered.
ZenRows offers a range of plans, starting at just $69 monthly. For more detailed information, please refer to our pricing page.