Introduction to the Scraping API
The ZenRows® Scraping API is a versatile tool designed to simplify and enhance the process of extracting data from websites. Whether you’re dealing with static or dynamic content, our API provides a range of features to meet your scraping needs efficiently.
Key Features
- JavaScript Render: Render JavaScript on web pages using a headless browser to scrape dynamic content that traditional methods might miss.
- Wait times and selectors: Specify wait times and CSS selectors to ensure elements are fully loaded before scraping.
- Resource blocking: Block unnecessary resources from loading to speed up scraping.
- JSON response formatting: Receive responses in JSON format, including XHR or Fetch requests.
- CSS-based data extraction: Extract data using CSS selectors with our CSS extractor feature.
- Auto-parsing: Automatically parse and extract relevant data from the scraped HTML.
- Screenshot capabilities: Capture screenshots of the target page, including full-page and specific element screenshots.
- Various outputs: Specify which data types to extract from the scraped HTML.
- JavaScript Instructions: Interact with web pages dynamically.
- Premium Proxies: Leverage a vast network of over 55 million residential IPs across 185+ countries, with 99.9% uptime for uninterrupted scraping.
- Geolocation selection: Choose the geographical location of the IP address used for the request (for Premium Proxies).
- Custom Headers: Add custom HTTP headers to your requests for more control.
- Session management: Use a session ID to maintain the same IP address across multiple requests for up to 10 minutes.
- Cloudflare bypass: The API is designed to overcome Cloudflare protection, making it easier to scrape protected websites.
- Language agnostic: While Python examples are provided, the API works with any programming language that can make HTTP requests.
The Scraping API is particularly useful for extracting data from dynamic websites and handling complex scenarios like JavaScript content and web element interactions, without the need to manage proxies or handle dynamic user interactions manually.
Parameter Overview
Customize your scraping requests using the following parameters:
PARAMETER | TYPE | DEFAULT | DESCRIPTION |
---|---|---|---|
apikey (required) | string | | Your unique API key for authentication |
url (required) | string | | The URL of the page you want to scrape |
js_render | boolean | false | Enable JavaScript rendering with a headless browser |
js_instructions | string | | Allows you to interact with the web page dynamically |
custom_headers | boolean | false | Include custom headers in your request |
premium_proxy | boolean | false | Use premium proxies to make the request harder to detect |
proxy_country | string | | Geolocation of the IP used to make the request. Only for Premium Proxies |
session_id | integer | | Assign a session ID to maintain the same IP for multiple requests for up to 10 minutes |
original_status | boolean | false | Return the original HTTP status code from the target page |
allowed_status_codes | string | | Return the content of the target page even if it fails with a status code on the provided list |
wait_for | string | | Wait for a given CSS Selector to load in the DOM before returning the content |
wait | integer | 0 | Wait a fixed amount of time before returning the content |
block_resources | string | | Block specific resources from loading |
json_response | boolean | false | Obtain the response in JSON format, including data from XHR or Fetch requests |
css_extractor | string (JSON) | | Define CSS Selectors to extract data from the HTML |
autoparse | boolean | false | Use our auto parser algorithm to automatically extract data |
response_type | string | | Get the content parsed as Markdown, Plaintext or PDF instead of HTML |
screenshot | boolean | false | Return an above-the-fold screenshot of the target page |
screenshot_fullpage | boolean | false | Return a full-page screenshot of the target page |
screenshot_selector | string | | Return a screenshot of a specific CSS Selector of the target page |
screenshot_format | string | | Choose between png and jpeg formats, with PNG being the default |
screenshot_quality | integer | | Applicable when using JPEG, this parameter allows you to set the quality from 1 to 100 |
outputs | string | | Lets you specify which data types to extract from the scraped HTML |
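To illustrate how several of these parameters might be combined, here is a minimal Python sketch that serializes options into a query string. The endpoint `https://api.zenrows.com/v1/`, the helper `build_params`, the lowercase `true`/`false` serialization of booleans, and the sample values (`.content`, `us`, the `css_extractor` mapping) are assumptions for illustration, not confirmed by this page. The sketch only builds the request URL and does not send it.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint; confirm it against the official ZenRows documentation.
API_ENDPOINT = "https://api.zenrows.com/v1/"


def build_params(apikey, url, **options):
    """Hypothetical helper: serialize option values as query-string text."""
    params = {"apikey": apikey, "url": url}
    for name, value in options.items():
        if isinstance(value, bool):
            # Assumption: booleans are sent as lowercase "true"/"false".
            params[name] = "true" if value else "false"
        elif isinstance(value, dict):
            # css_extractor is documented as a JSON string.
            params[name] = json.dumps(value, separators=(",", ":"))
        else:
            params[name] = str(value)
    return params


params = build_params(
    "YOUR_ZENROWS_API_KEY",
    "https://www.example.com/",
    js_render=True,
    wait_for=".content",           # hypothetical CSS selector
    premium_proxy=True,
    proxy_country="us",            # assumed country-code format
    css_extractor={"title": "h1"}, # hypothetical extraction rule
)

# urlencode percent-encodes every value, including the target URL.
request_url = API_ENDPOINT + "?" + urlencode(params)
print(request_url)
```

Only the options you pass are included, so anything omitted falls back to the defaults listed in the table.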
Getting started
The ZenRows API simplifies web scraping and requires just two essential components:
- API key – Your unique key to authenticate requests.
- The encoded URL – The target URL you want to scrape, properly encoded.
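As a rough sketch of how these two components fit together, the Python snippet below builds a request URL from an API key and a target URL using only the standard library. The endpoint `https://api.zenrows.com/v1/` and the function names `build_request_url` and `scrape` are assumptions for illustration; check the official documentation for the exact endpoint.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; confirm it against the official ZenRows documentation.
API_ENDPOINT = "https://api.zenrows.com/v1/"


def build_request_url(apikey, target_url):
    """Combine the API key and the target URL into one request URL.

    urlencode percent-encodes the target URL, so it is safe to pass it raw.
    """
    return API_ENDPOINT + "?" + urlencode({"apikey": apikey, "url": target_url})


def scrape(apikey, target_url, timeout=60):
    """Hypothetical helper: send the request and return the body as text."""
    with urlopen(build_request_url(apikey, target_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Building the URL needs no network access, so it is safe to inspect first.
print(build_request_url("YOUR_ZENROWS_API_KEY", "https://www.example.com/"))
```

Calling `scrape("YOUR_ZENROWS_API_KEY", "https://www.example.com/")` with a valid key would then perform the request.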
Connection modes
ZenRows supports three connection modes:
- API Mode: Ideal for quick and easy scraping with all the processing handled by ZenRows. Just pass your API key and URL, and get the data back.
- Proxy Mode: If you prefer more control, connect ZenRows as a proxy to your existing scraping infrastructure. This mode allows you to leverage ZenRows’ proxy network while maintaining flexibility in your scraping logic.
- SDK: For added convenience, ZenRows provides SDKs for Python and Node.js, making it easier for newcomers to get started.
Response Headers
ZenRows simplifies working with scraping responses by adding custom headers to the returned data. Headers from the target website, including cookies, are prefixed with Zr- for easy identification.
Additionally, ZenRows appends a Zr-Final-Url header showing the final visited URL after any redirects, so you always know the last page accessed.
Here’s an example of the headers returned by ZenRows:
Zr-Content-Encoding: gzip
Zr-Content-Type: text/html
Zr-Cookies: _pxhd=Bq7P4CRaW1B...
Zr-Final-Url: https://www.example.com/
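One way to work with these headers in Python is to collect everything carrying the Zr- prefix and strip the prefix. The sketch below assumes the response headers are available as a plain dictionary; the helper name `zr_headers` and the sample `Content-Length` entry are hypothetical.

```python
ZR_PREFIX = "zr-"


def zr_headers(response_headers):
    """Return the Zr- prefixed headers with the prefix removed.

    Header names are matched case-insensitively, since HTTP header
    names are not case-sensitive.
    """
    result = {}
    for name, value in response_headers.items():
        if name.lower().startswith(ZR_PREFIX):
            result[name[len(ZR_PREFIX):]] = value
    return result


# Sample data modeled on the example headers above.
sample = {
    "Content-Length": "1234",  # hypothetical header without the prefix
    "Zr-Content-Encoding": "gzip",
    "Zr-Content-Type": "text/html",
    "Zr-Final-Url": "https://www.example.com/",
}

headers = zr_headers(sample)
final_url = headers.get("Final-Url")
print(headers)
print(final_url)
```

Note that Zr-Final-Url is added by ZenRows rather than forwarded from the target site, so you may want to read it separately from the others.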
Zr-Final-Url represents the final URL after all redirects. Some anti-bot services, such as Cloudflare, may redirect you after a successful request and add parameters to the URL.
API Key
To access the full functionality of ZenRows’ Scraping API, you’ll need a valid API Key. This unique key acts as your personal identifier, ensuring all requests are securely authorized and linked to your account.
Once you have your API Key, you’re ready to start scraping data from any website. If you haven’t created your key yet, you can easily create your API Key and begin exploring all the capabilities ZenRows has to offer.
Your API Key is essential for every request you make, so keep it secure and avoid sharing it with others. If needed, you can regenerate your key or create multiple keys for different projects.
URL
The URL is the web page you wish to scrape. To ensure proper functionality, the URL needs to be URL-encoded, especially if it contains special characters like spaces, question marks, or ampersands. Encoding transforms these characters into a format that browsers and servers can correctly interpret.
Many HTTP clients and ZenRows’ SDKs handle this encoding automatically, but it’s important to be aware of it when working with custom implementations. HTTP clients such as requests in Python or fetch in Node.js often handle this for you.
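For custom implementations, a minimal sketch of manual encoding with Python's standard library looks like this. The target URL is a hypothetical example chosen because it contains a space, a question mark, and an ampersand.

```python
from urllib.parse import quote, unquote

# Hypothetical target URL containing characters that must be encoded.
target_url = "https://www.example.com/search?q=web scraping&page=1"

# safe="" also encodes "/", ":", "?", "&" and "=", so the whole URL
# can travel as a single query-string value.
encoded_url = quote(target_url, safe="")
print(encoded_url)
# https%3A%2F%2Fwww.example.com%2Fsearch%3Fq%3Dweb%20scraping%26page%3D1

# Decoding restores the original URL.
print(unquote(encoded_url) == target_url)
```

If your HTTP client already encodes query parameters, pass the raw URL instead; encoding it twice would produce a different, incorrect target.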