ZenRows® provides a seamless way to extract data directly in the API call using CSS Selectors.

This feature, activated by the css_extractor parameter, returns a JSON object instead of plain HTML, which can significantly simplify your data extraction process if you prefer not to handle HTML parsing manually. If you already have a parsing solution or prefer custom parsing, ZenRows still lets you retrieve plain HTML and process it with libraries like BeautifulSoup for Python or Cheerio for JavaScript.

Using CSS Selectors

Suppose you want to scrape the title of the ScrapingCourse eCommerce page. The title is contained in an h1 tag. To extract it, send the css_extractor parameter with the value {"title": "h1"}. Ensure the parameter is properly encoded.

curl "https://api.zenrows.com/v1/?apikey=YOUR_ZENROWS_API_KEY&url=https%3A%2F%2Fwww.scrapingcourse.com%2Fecommerce%2F&css_extractor=%257B%2522title%2522%253A%2520%2522h1%2522%257D"
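If you build the request in code rather than by hand, you can serialize the extractor with json.dumps and let the standard library handle the percent-encoding. A minimal sketch (the API key is a placeholder; HTTP clients such as requests apply this encoding automatically when you pass a params dict, so match whatever encoding your client produces):

```python
# A minimal sketch: serialize the extractor as JSON and let urlencode
# percent-encode it (YOUR_ZENROWS_API_KEY is a placeholder).
import json
from urllib.parse import urlencode

params = {
    "apikey": "YOUR_ZENROWS_API_KEY",
    "url": "https://www.scrapingcourse.com/ecommerce/",
    "css_extractor": json.dumps({"title": "h1"}),
}

# Build the encoded query string for the API call
query = urlencode(params)
print(query)
```

With requests, you can skip urlencode entirely and pass the same dict as `params=`; the library encodes it for you.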

To take it a step further, let’s also extract the product names using the selector .product-name.

curl "https://api.zenrows.com/v1/?apikey=YOUR_ZENROWS_API_KEY&url=https%3A%2F%2Fwww.scrapingcourse.com%2Fecommerce%2F&css_extractor=%257B%2522title%2522%253A%2522h1%2522%252C%2522products%2522%253A%2522.product-name%2522%257D"

This will return:

{
	"title": "E-commerce Products",
	"products": [
		"Product 1",
		"Product 2",
		"Product 3"
		// ...
	]
}

You might need a list of product links to continue scraping individual product details. Filter these links to only include those containing /product/ in the href. To get an attribute instead of the text content, append @ and the attribute name to the selector: "links": "a[href*='/product/'] @href".

curl "https://api.zenrows.com/v1/?apikey=YOUR_ZENROWS_API_KEY&url=https%3A%2F%2Fwww.scrapingcourse.com%2Fecommerce%2F&css_extractor=%257B%2522title%2522%253A%2522h1%2522%252C%2522products%2522%253A%2522.product-name%2522%252C%2522links%2522%253A%2522a%255Bhref*%253D%27%252Fproduct%252F%27%255D%2540href%2522%257D"

This will return:

{
	"title": "E-commerce Products",
	"products": [
		"Product 1",
		"Product 2",
		"Product 3"
		// ...
	],
	"links": [
		"/product/1",
		"/product/2",
		"/product/3"
		// ...
	]
}
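Since the extracted links are relative paths, resolve them against the page URL before requesting each product in a follow-up call. A small sketch using only the standard library (the sample links mirror the response above):

```python
# Resolve the relative product links from the "links" field against the
# page URL so each one can be fetched through the API in a follow-up call.
from urllib.parse import urljoin

page_url = "https://www.scrapingcourse.com/ecommerce/"
links = ["/product/1", "/product/2", "/product/3"]  # sample values from above

product_urls = [urljoin(page_url, link) for link in links]
print(product_urls)
# → ['https://www.scrapingcourse.com/product/1', ...]
```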

Before going full scale, test your selectors and preview the encoded parameters in our Builder. The Builder also generates code in several languages, making it easier to integrate into your applications.

For more details, check the CSS Selectors documentation.

Using External Libraries

If you prefer to use your favorite HTML parsing library, you can still retrieve plain HTML from ZenRows and process it with tools like BeautifulSoup or Cheerio.

Python with BeautifulSoup

scraper.py
# pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

zenrows_api_base = "https://api.zenrows.com/v1/?apikey=YOUR_ZENROWS_API_KEY"
url = "https://www.scrapingcourse.com/ecommerce/"

response = requests.get(zenrows_api_base, params={'url': url})
soup = BeautifulSoup(response.text, "html.parser")

title = soup.find("h1").text
products = [product.text for product in soup.select(".product-name")]
links = [link.get("href") for link in soup.select("a[href*='/product/']")]

result = {
	"title": title,
	"products": products,
	"links": links,
}
print(result)

JavaScript with Cheerio

scraper.js
// npm i axios cheerio
const axios = require("axios");
const cheerio = require("cheerio");

const zenrows_api_base = "https://api.zenrows.com/v1/?apikey=YOUR_ZENROWS_API_KEY";
const url = "https://www.scrapingcourse.com/ecommerce/";

axios
	.get(zenrows_api_base, { params: { url } })
	.then((response) => {
		const $ = cheerio.load(response.data);

		const title = $("h1").text();
		const products = $(".product-name")
			.map((_, a) => $(a).text())
			.toArray();
		const links = $("a[href*='/product/']")
			.map((_, a) => $(a).attr("href"))
			.toArray();

		console.log({ title, products, links });
	})
	.catch((error) => console.error(error));

With these examples, you can leverage ZenRows for both direct JSON extraction and traditional HTML parsing, improving the efficiency and flexibility of your web scraping projects.