ZenRows also rotates the User-Agent header with the Universal Scraper API. Integrate it into Scrapy to appear as a different user on every request, significantly reducing your chances of getting blocked.
ZenRows provides two options for integrating proxies with Scrapy:
First, install the scrapy-zenrows package, which provides the necessary middleware for integrating ZenRows with Scrapy.

The simplest option is to enable the middleware for the whole project in your settings.py file. This setup configures your Scrapy spiders to use the ZenRows API for every request automatically. Add the following to settings.py:
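Here is a minimal sketch of that configuration. The setting names (ZENROWS_API_KEY, USE_ZENROWS_FOR_ALL_REQUESTS), the middleware path, and its priority are assumptions based on the scrapy-zenrows package, so verify them against its README:

```python
# settings.py -- project-wide ZenRows setup (names assumed from the scrapy-zenrows package)
# Install the package first: pip install scrapy-zenrows

DOWNLOADER_MIDDLEWARES = {
    # Register the ZenRows downloader middleware (assumed import path and priority)
    "scrapy_zenrows.ZenRowsMiddleware": 543,
}

# Your API key from the ZenRows dashboard
ZENROWS_API_KEY = "<YOUR_ZENROWS_API_KEY>"

# Route every outgoing request through the ZenRows Universal Scraper API (assumed flag)
USE_ZENROWS_FOR_ALL_REQUESTS = True
```

With this in place, every request your spiders yield goes through the ZenRows API without any changes to the spider code.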
The second option is the ZenRowsRequest class, which provides a flexible way to configure ZenRows on a per-request basis.
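A rough usage sketch follows, assuming ZenRowsRequest behaves like a regular Scrapy Request and takes ZenRows options through a params dictionary (both assumptions to confirm against the package's documentation):

```python
import scrapy
from scrapy_zenrows import ZenRowsRequest  # assumed import path


class IpSpider(scrapy.Spider):
    name = "ip"

    def start_requests(self):
        # Per-request ZenRows options go in the params dictionary (assumed),
        # e.g. JS rendering and premium (residential) proxies.
        yield ZenRowsRequest(
            url="https://httpbin.io/ip",
            params={
                "js_render": "true",
                "premium_proxy": "true",
            },
            callback=self.parse,
        )

    def parse(self, response):
        # httpbin.io/ip echoes the IP the request came from, so seeing a proxy
        # IP here confirms the integration is working.
        yield {"ip_response": response.text}
```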
ZenRowsRequest supports several other parameters, allowing you to customize each request to meet specific requirements. Here are some useful ones (a short sketch follows the list):
- proxy_country: Specifies the country for the proxy, useful for geo-targeting.
- js_instructions: Allows custom JavaScript actions on the page, such as waiting for elements to load.
- autoparse: Automatically extracts data from supported websites.
- outputs: Extracts specific content types like tables, images, or links.
- css_extractor: Allows CSS-based content extraction.
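For instance, here is a hedged sketch combining two of the parameters above. The URL and selector are placeholders, and the js_instructions payload format and params dictionary are assumptions to check against the ZenRows documentation:

```python
import scrapy
from scrapy_zenrows import ZenRowsRequest  # assumed import path


class GeoSpider(scrapy.Spider):
    name = "geo"

    def start_requests(self):
        # Route this request through a US proxy and wait for an element
        # before the HTML is returned (parameter names as listed above).
        yield ZenRowsRequest(
            url="https://www.example.com",
            params={
                "proxy_country": "us",
                "js_render": "true",  # js_instructions generally requires JS rendering
                "js_instructions": '[{"wait_for": ".content"}]',
            },
            callback=self.parse,
        )

    def parse(self, response):
        yield {"html_length": len(response.text)}
```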
Some websites require specific request headers (such as Referer or Origin) for successful scraping. The ZenRows middleware allows you to set custom headers on a per-request basis. When using custom headers, set the custom_headers parameter to "true" so that ZenRows includes your headers while managing essential browser headers on its end.
Here’s an example of setting a custom Referer header:
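(A sketch, assuming the custom header is passed through Scrapy's standard headers argument while custom_headers is enabled in params; confirm both against the scrapy-zenrows docs.)

```python
import scrapy
from scrapy_zenrows import ZenRowsRequest  # assumed import path


class RefererSpider(scrapy.Spider):
    name = "referer"

    def start_requests(self):
        yield ZenRowsRequest(
            url="https://www.example.com",
            # Enable custom header forwarding so ZenRows keeps our Referer
            # while still managing the other browser headers.
            params={"custom_headers": "true"},
            headers={"Referer": "https://www.google.com/"},
            callback=self.parse,
        )

    def parse(self, response):
        yield {"status": response.status}
```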
To send custom cookies, add them to the cookies dictionary in the request's meta parameter. Just as with custom headers, custom_headers must be set to "true" for ZenRows to allow custom cookies. This is particularly useful for handling sessions or accessing region-specific content.
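A minimal sketch of the idea; whether the cookies belong in Scrapy's standard cookies argument (used here) or in the request's meta should be confirmed against the scrapy-zenrows docs:

```python
import scrapy
from scrapy_zenrows import ZenRowsRequest  # assumed import path


class SessionSpider(scrapy.Spider):
    name = "session"

    def start_requests(self):
        yield ZenRowsRequest(
            url="https://www.example.com",
            params={"custom_headers": "true"},  # required for cookies to be forwarded
            # Cookies supplied as a dictionary on the request; Scrapy's standard
            # cookies argument is an assumption -- the middleware may expect them
            # in meta instead.
            cookies={"session_id": "abc123"},
            callback=self.parse,
        )

    def parse(self, response):
        yield {"status": response.status}
```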
Why do I need a proxy for Scrapy?

Websites detect and block repeated requests coming from the same IP address. Routing Scrapy traffic through a proxy, ideally a rotating one, makes your requests appear to come from different users and lowers the risk of bans and rate limits.
Do you have any code examples?

Yes, you can find code examples for the scrapy_zenrows middleware here!

How do I know if my proxy is working?

Send a request to httpbin.io/ip. If the proxy is working, the response will display a different IP address than your local machine's.

What should I do if my requests are blocked?