How to Extract Data
ZenRows® can extract data directly in the API call using CSS Selectors. This feature, activated by the css_extractor parameter, returns a JSON object instead of plain HTML, so you do not have to parse the HTML yourself. If you already have your own parsing pipeline or prefer custom parsing, you can still retrieve plain HTML from ZenRows and process it with libraries like BeautifulSoup for Python or Cheerio for JavaScript.
Using CSS Selectors
Suppose you want to scrape the title of the ScrapingCourse eCommerce page. The title is contained in an h1 tag. To extract it, send the css_extractor parameter with the value {"title": "h1"}. Ensure the parameter is properly encoded.
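The encoding step can be sketched in Python as follows. This is a minimal sketch, not an official client: the endpoint https://api.zenrows.com/v1/, the apikey and url parameter names, the target URL, and the helper name build_request_url are assumptions made for illustration. Only the css_extractor parameter and its {"title": "h1"} value come from the text above.

```python
import json
from urllib.parse import urlencode

# Assumed API endpoint; check your ZenRows dashboard for the real one.
API_ENDPOINT = "https://api.zenrows.com/v1/"


def build_request_url(api_key: str, target_url: str, extractor: dict) -> str:
    """Return a request URL whose css_extractor value is JSON, then URL-encoded."""
    params = {
        "apikey": api_key,        # assumed parameter name
        "url": target_url,        # assumed parameter name
        "css_extractor": json.dumps(extractor),
    }
    # urlencode percent-encodes braces, quotes, and colons in the JSON string.
    return f"{API_ENDPOINT}?{urlencode(params)}"


request_url = build_request_url(
    "YOUR_ZENROWS_API_KEY",
    "https://www.scrapingcourse.com/ecommerce/",  # assumed target page URL
    {"title": "h1"},
)
print(request_url)
```

Building the query string with urlencode, rather than concatenating strings by hand, keeps the JSON value intact no matter which selectors it contains.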
To take it a step further, let’s also extract the product names using the selector .product-name.
This will return:
You might need a list of product links to continue scraping new product details. Filter these links to only include those whose href contains /product/. To get an attribute instead of the text content, append @ and the attribute name to the selector: "links": "a[href*='/product/'] @href".
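Putting the three selectors together, a hedged Python sketch might look like the one below. The field names title, products, and links are illustrative choices, and read_fields is a hypothetical helper. It also assumes that a field may come back either as a single string or as a list of strings depending on how many elements matched, so it normalizes both shapes to lists.

```python
import json
from urllib.parse import urlencode

# Field names are illustrative; the selectors come from the text above.
extractor = {
    "title": "h1",
    "products": ".product-name",
    "links": "a[href*='/product/'] @href",  # "@href" requests the attribute
}

# Percent-encodes the JSON, including quotes, brackets, and "@".
encoded_param = urlencode({"css_extractor": json.dumps(extractor)})


def read_fields(response_text: str) -> dict:
    """Parse the JSON body and return every extractor field as a list.

    Assumption: a field may be a single string or a list of strings,
    and may be missing when nothing matched.
    """
    payload = json.loads(response_text)
    fields = {}
    for name in extractor:
        value = payload.get(name)
        if value is None:
            fields[name] = []
        elif isinstance(value, list):
            fields[name] = value
        else:
            fields[name] = [value]
    return fields
```

Normalizing every field to a list lets the calling code loop over results without checking the type of each value first.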
This will return:
Before going full scale, test your selectors and get the encoded results in our Builder. The Builder also outputs code in several languages, making it easier to integrate into your applications.
For more details, check the CSS Selectors documentation.
Using External Libraries
If you prefer to use your favorite HTML parsing library, you can still retrieve plain HTML from ZenRows and process it with tools like BeautifulSoup and Cheerio.
Python with BeautifulSoup
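A minimal sketch of this approach, assuming the beautifulsoup4 package is installed. The endpoint, the apikey and url parameter names, and the helper names fetch_html and parse_page are assumptions made for illustration. The selectors mirror the ones used in the CSS Selectors section.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Assumed API endpoint; check your ZenRows dashboard for the real one.
API_ENDPOINT = "https://api.zenrows.com/v1/"


def fetch_html(api_key: str, target_url: str) -> str:
    """Fetch plain HTML through ZenRows (no css_extractor parameter)."""
    query = urlencode({"apikey": api_key, "url": target_url})  # assumed names
    with urlopen(f"{API_ENDPOINT}?{query}", timeout=60) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse_page(html: str) -> dict:
    """Extract the title, product names, and product links from HTML."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.select_one("h1")
    return {
        "title": heading.get_text(strip=True) if heading else None,
        "products": [el.get_text(strip=True) for el in soup.select(".product-name")],
        # The selector only matches anchors that have an href attribute.
        "links": [a["href"] for a in soup.select("a[href*='/product/']")],
    }
```

To use it, pass the result of fetch_html, called with your API key and the page URL, to parse_page. Keeping the fetch and the parse in separate functions makes the parsing logic easy to test against saved HTML without making a network call.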
JavaScript with Cheerio
With these examples and tips, you can use ZenRows for both direct JSON extraction and traditional HTML parsing, whichever fits your web scraping project better.