Can I Get Cookies from the Responses?
Can I Log In/Register and Access Content Behind a Login?
- Send POST requests.
- Fill in and submit a form using JavaScript Instructions.
Can I Maintain Sessions/IPs Between Requests?
Can I Run the API/Proxy in Multiple Threads to Improve Speed?
Can I Send/Submit Forms?
CSS Selectors Do Not Work or 'Parser is Not Valid'
Common Issues with CSS Selectors
One of the most common issues users encounter when working with CSS Selectors in web scraping is improper encoding. CSS Selectors need to be correctly encoded to be recognized and processed by the API. You can use ZenRows’ Playground or an online tool to properly encode your CSS Selectors before sending them in a request.
Example of Using a CSS Selector
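As a sketch, encoding a selector with Python’s standard library looks like this (the css_extractor parameter name follows ZenRows’ Universal Scraper API; YOUR_API_KEY and the target URL are placeholders):

```python
from urllib.parse import urlencode

# The css_extractor value is a JSON object mapping output names to selectors.
extractor = '{"test": ".my-class"}'

# urlencode percent-encodes the selector so the API can parse it reliably.
query = urlencode({
    "apikey": "YOUR_API_KEY",
    "url": "https://www.example.com",
    "css_extractor": extractor,
})
print(f"https://api.zenrows.com/v1/?{query}")
```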
Let’s say you want to extract content from the .my-class CSS selector and store it in a property named test. You would encode the selector and include it in your request.
Troubleshooting CSS Selector Issues
If you’re still getting empty responses or the parser reports an error:
- Check the Raw HTML: Request the plain HTML to see if the content served by the website differs from what you see in your browser. Some websites serve different content based on the user’s location, device, or other factors.
- Verify the Selector: Ensure the selector you’re using is correct by testing it in your browser’s Developer Tools (e.g., using Chrome’s Console with document.querySelectorAll(".my-class")).
- Review the Documentation: Refer to the ZenRows documentation for detailed information on using CSS Selectors with the API.
See Also
For comprehensive examples of working with complex layouts and advanced selector techniques, check out our Advanced CSS Selector Examples guide.
Does session_id Remember Session Data?
session_id won’t store any request data, such as session cookies. You will get those back as usual and decide which ones to send on the next request.
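In other words, re-sending cookies is up to you. A minimal sketch of turning Set-Cookie response values into a Cookie header for the next request (the helper name and flow are illustrative, not part of the ZenRows API):

```python
def cookie_header(set_cookie_values):
    """Build a Cookie request header from Set-Cookie response values."""
    # Keep only the name=value pair, dropping attributes like Path or HttpOnly.
    pairs = [value.split(";", 1)[0].strip() for value in set_cookie_values]
    return "; ".join(pairs)

# Values captured from a previous response:
header = cookie_header([
    "sessionid=abc123; Path=/; HttpOnly",
    "theme=dark; Path=/",
])
print(header)  # sessionid=abc123; theme=dark
# Send it on the next request, e.g. headers={"Cookie": header}
```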
How do I Export Data to CSV using the Universal Scraper API?
From JSON using Python
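A minimal sketch of the conversion (the input structure and field names, including latLong, are illustrative; pandas is required):

```python
import pandas as pd

# Illustrative autoparse-style output with a nested latLong attribute.
data = [
    {"name": "Store A", "latLong": {"latitude": 1.23, "longitude": 4.56}},
    {"name": "Store B", "latLong": {"latitude": 7.89, "longitude": 0.12}},
]

# Flatten only one inner level of nesting.
df = pd.json_normalize(data, max_level=1)

# Remove the latLong prefix from the flattened column names.
df.columns = [col.replace("latLong.", "") for col in df.columns]

df.to_csv("output.csv", index=False)
print(df.columns.tolist())  # ['name', 'latitude', 'longitude']
```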
If you’ve obtained JSON output from ZenRows with the autoparse feature enabled, you can use Python to convert this data into a CSV file. The Pandas library will help us flatten nested JSON attributes and save the data as a CSV file. You can pass arguments to the json_normalize function to control how many nested levels to flatten and to rename fields; for instance, to flatten only one inner level and strip the latLong prefix from the latitude and longitude fields.
From HTML using Python
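The extraction can be sketched as follows (the HTML snippet and class names are assumptions modeled on the Scraping Course demo shop; bs4 and pandas are required):

```python
import pandas as pd
from bs4 import BeautifulSoup

# Stand-in for the HTML returned by the API.
html = """
<ul class="products">
  <li class="product">
    <a class="product-link" href="/product/widget">
      <span class="product-name">Widget</span>
      <span class="product-price">$9.99</span>
    </a>
  </li>
  <li class="product">
    <a class="product-link" href="/product/gadget">
      <span class="product-name">Gadget</span>
      <span class="product-price">$19.99</span>
    </a>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# One dictionary per product with its essential details.
products = [
    {
        "name": card.select_one(".product-name").get_text(strip=True),
        "price": card.select_one(".product-price").get_text(strip=True),
        "link": card.select_one(".product-link")["href"],
    }
    for card in soup.select(".product")
]

# Convert the list of dictionaries into a DataFrame and save as CSV.
pd.DataFrame(products).to_csv("products.csv", index=False)
```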
When dealing with HTML output without the autoparse feature, you can use BeautifulSoup to parse the HTML and extract data. We’ll use the example of an eCommerce site from Scraping Course: create a dictionary for each product with its essential details, then use Pandas to convert the list of dictionaries into a DataFrame and save it as a CSV file.
From JSON using JavaScript
For JavaScript and Node.js, you can use the json2csv library to handle the JSON to CSV conversion. After getting the data, parse it with a flatten transformer; as the name implies, it flattens the nested structures inside the JSON. Then save the file using writeFileSync.
From HTML using JavaScript
For extracting data from HTML without autoparse, you can use the cheerio library to parse the HTML and extract the relevant information. We’ll use the Scraping Course eCommerce example again: get the plain HTML result and load it into cheerio, which allows us to query elements as we would in the browser or with jQuery. Return an object with the essential data for each product in the list, then parse that list into CSV using json2csv (no flatten transform is needed this time), and lastly store the result. These last two steps are the same as in the autoparse case.
Extract Data from Lists, Tables, and Grids
Scraping from Lists
We will use the Wikipedia page on Web scraping for testing. A section at the bottom, “See also”, contains links in a list. We can get the content by using the CSS selector for the list items: {"items": ".div-col > ul li"}. That will get the text, but what about the links? To access attributes, we need a non-standard syntax in the selector: @href. It won’t work with the previous selector, since the last item is the li element, which does not have an href attribute, so we must target the link element instead: {"links": ".div-col > ul a @href"}. Note that in some languages CSS selectors must be encoded to avoid problems with URLs.
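The difference between the two selectors can be reproduced locally with BeautifulSoup (toy markup mirroring the “See also” list, not the live page):

```python
from bs4 import BeautifulSoup

# Toy markup mirroring the structure of the "See also" section.
html = """
<div class="div-col"><ul>
  <li><a href="/wiki/Data_scraping">Data scraping</a></li>
  <li><a href="/wiki/Web_crawler">Web crawler</a></li>
</ul></div>
"""
soup = BeautifulSoup(html, "html.parser")

# ".div-col > ul li" yields the text of each list item...
items = [li.get_text(strip=True) for li in soup.select(".div-col > ul li")]

# ...but for the href attribute we must target the a elements instead,
# because li has no href of its own.
links = [a["href"] for a in soup.select(".div-col > ul a")]

print(items)  # ['Data scraping', 'Web crawler']
print(links)  # ['/wiki/Data_scraping', '/wiki/Web_crawler']
```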
Scraping from Tables
Assuming regular tables (no empty cells, no rows with fewer items, and so on), we can extract table data with CSS selectors. We’ll use a list of countries: the first table on the page, the one with the class wikitable. To extract the rank, which is the first column, we can use "table.wikitable tr > :first-child". It will return an array with 243 items: 2 header lines and 241 ranks. For the country name, the second column, we use something similar but add an a to avoid capturing the flags: "table.wikitable tr > :nth-child(2) a". In this case, the array will have one item less, since the second heading has no link. That might be a problem if we want to match items by array index.
Scraping from Product Grids
As with the tables, non-regular grids might cause problems. We’ll scrape the price, product name, and link from an online store. By manually searching the page’s content, we arrive at cards with the class .product, which contain all the data we want. It is essential to avoid duplicates, so we have to use precise selectors. For example, ".product-item .product-link @href" for the links: we added the .product-link class because it is unique to the product cards. The same goes for the name and price, which also have unique classes.
All in all, the final selector combines the link, name, and price fields. Many HTTP clients encode query parameters automatically, as requests.get does with its params. Remember to encode the URL and the CSS extractor yourself in scenarios where that is not available.
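To see that automatic encoding in action without sending anything, you can inspect a prepared request (the css_extractor parameter name and the name/price classes are assumptions; requests is required):

```python
import requests

# Combined extractor; the name and price class names are assumed.
extractor = (
    '{"links": ".product-item .product-link @href",'
    ' "names": ".product-item .product-name",'
    ' "prices": ".product-item .product-price"}'
)

# Preparing the request shows the final URL without sending it.
request = requests.Request(
    "GET",
    "https://api.zenrows.com/v1/",
    params={
        "apikey": "YOUR_API_KEY",
        "url": "https://www.scrapingcourse.com/ecommerce/",
        "css_extractor": extractor,
    },
).prepare()
print(request.url)  # braces, quotes, and spaces arrive percent-encoded
```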
How Can I Set Specific Headers?
How Do I Send POST Requests with JSON Data?
By default, POST request data is sent as application/x-www-form-urlencoded. To send JSON data, you need to add the Content-Type: application/json header manually, though some software/tools may do this automatically. Before trying it on your target site, we recommend using a testing site like httpbin.io to verify that the parameters are sent correctly. Ensure that the parameters arrive and the format is correct; if in doubt, switch between both modes to confirm that the changes are applied correctly. For more info on POST requests, see How do I send POST requests?.
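With Python’s requests, the difference between the two modes is easy to inspect without hitting the network (httpbin.io as in the text; nothing here is actually sent):

```python
import requests

# json= serializes the body and sets Content-Type: application/json.
json_request = requests.Request(
    "POST", "https://httpbin.io/post", json={"key": "value"}
).prepare()

# data= sends a form-encoded body instead.
form_request = requests.Request(
    "POST", "https://httpbin.io/post", data={"key": "value"}
).prepare()

print(json_request.headers["Content-Type"])  # application/json
print(form_request.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_request.body)  # key=value
```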
How do I Send POST Requests?
By default, POST data is sent as application/x-www-form-urlencoded, but many sites expect JSON content, requiring the Content-Type: application/json header.
How to encode URLs?
Suppose your target URL contains query parameters of its own:

https://www.scrapingcourse.com/ecommerce/?course=web-scraping&section=advanced

If you were to send this URL directly as part of your API request without encoding, and you also include the premium_proxy parameter, the API would treat the course and section parameters as part of the API’s query string rather than the target URL. This could lead to errors or unintended behavior.

To avoid such issues, encode your target URL before including it in the API request. URL encoding replaces special characters (like &, ? and =) with a format that can be safely transmitted over the internet.

Most HTTP clients, such as axios (JavaScript) and requests (Python), automatically encode URLs for you. However, if you are manually constructing requests or using a client that doesn’t handle encoding, you can use programming-language functions or online tools to encode your URLs. For quick manual encoding, an online tool works, but remember that this method is not scalable for automated processes.
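In Python, urllib.parse.quote does the job; a sketch using the example URL above:

```python
from urllib.parse import quote

target = "https://www.scrapingcourse.com/ecommerce/?course=web-scraping&section=advanced"

# safe="" also encodes "/" and ":", so the whole URL becomes a single value.
encoded = quote(target, safe="")
print(encoded)
```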
Using Premium Proxies + JS Render and still blocked
What are Residential IPs?
Understanding Proxy Types: Data Center vs. Residential IPs
When it comes to web scraping proxies, there are two main types of IPs you can use: data center and residential.
- Data Center IPs: These are IP addresses provided by cloud service providers or hosting companies. They are typically fast and reliable, but because they are easily recognizable as belonging to data centers, they are more likely to be blocked by websites that have anti-scraping measures in place.
- Residential IPs: These IP addresses are assigned by Internet Service Providers (ISPs) to real residential users. Since they appear as regular users browsing the web, they are much harder to detect and block. This makes residential IPs particularly valuable when scraping sites with strong anti-bot protections, like Google or other heavily guarded domains.
How ZenRows Uses Residential IPs
By default, ZenRows uses data center connections for your requests. However, if you’re facing blocks or need to scrape highly protected websites, you can opt for residential IPs by setting the premium_proxy parameter to true. This will route your request through a residential IP, significantly increasing your chances of success. It’s important to note that using residential IPs comes with an additional cost due to the higher value and lower detection rate of these proxies.
Example of a Request with Residential IPs
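A sketch of such a request with Python’s requests (the request is prepared but not sent; the API key and target are placeholders):

```python
import requests

request = requests.Request(
    "GET",
    "https://api.zenrows.com/v1/",
    params={
        "apikey": "YOUR_API_KEY",
        "url": "https://www.example.com",
        # Route the request through a residential IP.
        "premium_proxy": "true",
    },
).prepare()

# With a valid key: response = requests.Session().send(request)
print(request.url)
```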
Troubleshooting Blocks
If you continue to experience blocks even with residential IPs, feel free to contact us, and we’ll work with you to find a solution.
What is Autoparse?
Simplifying Data Extraction with Autoparse
ZenRows offers a powerful feature called Autoparse, designed to simplify the process of extracting structured data from websites. This feature leverages custom parsers, allowing you to easily retrieve data in a structured JSON format rather than raw HTML.
How It Works
By default, when you call the ZenRows API, the response is plain HTML. However, when you activate the autoparse parameter, the API will automatically parse the content of supported websites and return the data as a JSON object. This makes it much easier to work with the data, especially when dealing with complex websites that would otherwise require extensive parsing logic.
Example of a Request with Autoparse
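A sketch of such a call (only the URL is built here; the API key and target are placeholders):

```python
from urllib.parse import urlencode

params = {
    "apikey": "YOUR_API_KEY",
    "url": "https://www.example.com",
    "autoparse": "true",  # return parsed JSON instead of plain HTML
}
api_url = f"https://api.zenrows.com/v1/?{urlencode(params)}"
print(api_url)
```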
Limitations and Troubleshooting
- Supported Domains: The Autoparse feature is in an experimental phase and doesn’t work on all domains. You can view some of the supported domains on the ZenRows Scraper page. If the website you’re trying to scrape isn’t supported, the response will be empty, incomplete, or an error.
- Fallback to HTML: If you find that Autoparse doesn’t return the desired results, simply remove the autoparse parameter and try the request again. This will return the plain HTML response, allowing you to manually parse the data as needed.
What Are the Benefits of JavaScript Rendering?
JavaScript rendering lets you capture content that is only generated in the browser, such as data loaded dynamically after the initial page load. You can also use the wait_for parameter to delay scraping until a specific element is present on the page, ensuring you capture the content you need.
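Combining both parameters can be sketched like this (prepared but not sent; the selector .price-loaded is an illustrative placeholder):

```python
import requests

request = requests.Request(
    "GET",
    "https://api.zenrows.com/v1/",
    params={
        "apikey": "YOUR_API_KEY",
        "url": "https://www.example.com",
        "js_render": "true",          # render the page in a headless browser
        "wait_for": ".price-loaded",  # wait until this element appears
    },
).prepare()
print(request.url)
```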
Why Some Headers are Managed by ZenRows?
Example of Sending Custom Headers
Here’s an example using cURL to send custom headers that are permitted along with your ZenRows request:
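A sketch of such a call (the custom_headers parameter follows the ZenRows docs; the API key and target are placeholders, and the command is printed rather than executed here):

```shell
APIKEY="YOUR_API_KEY"                        # placeholder
TARGET="https%3A%2F%2Fhttpbin.io%2Fheaders"  # percent-encoded target URL

# custom_headers=true tells the API to forward your own headers.
CMD="curl \"https://api.zenrows.com/v1/?apikey=${APIKEY}&url=${TARGET}&custom_headers=true\" -H \"Referer: https://www.google.com\""

# Print the command; with a real key, run it directly.
echo "$CMD"
```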