CSS Selectors

You can use CSS Selectors for data extraction. The table below lists examples of how to use them.

You only need to add &css_extractor={"links":"a @href"} to the request to use this feature.

Here are some examples:

| Extraction rules | Sample HTML | Value | JSON output |
|---|---|---|---|
| {"divs":"div"} | <div>text0</div> | text | {"divs": "text0"} |
| {"divs":"div"} | <div>text1</div><div>text2</div> | text | {"divs": ["text1", "text2"]} |
| {"links":"a @href"} | <a href="#register">Register</a> | href attribute | {"links": "#register"} |
| {"hidden":"input[type=hidden] @value"} | <input type="hidden" name="_token" value="f23g23g.b9u1bg91g.zv97" /> | value attribute | {"hidden": "f23g23g.b9u1bg91g.zv97"} |
| {"class":"button.submit @data-v"} | <button class="submit" data-v="register-user">click</button> | data-v attribute with submit class | {"class": "register-user"} |
| {"emails":"a[href^='mailto:'] @href"} | <a href="mailto:test1@domain.com">email 1</a><a href="mailto:test2@domain.com">email 2</a> | href attribute for links starting with mailto: | {"emails": ["test1@domain.com", "test2@domain.com"]} |

If you are interested in learning more, you can find a complete reference of CSS Selectors here.
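As a minimal sketch of how the extraction rules might be attached to a request from Python: the rules are serialized to JSON and then percent-encoded so that the braces, quotes, and spaces survive inside a query string. The endpoint https://api.zenrows.com/v1/, the apikey and url parameter names, and the target page are assumptions for illustration; they are not stated in this section.

```python
import json
from urllib.parse import urlencode

# Assumptions: the endpoint and the apikey/url parameter names are not given
# in this section, and the key below is a placeholder.
API_ENDPOINT = "https://api.zenrows.com/v1/"
API_KEY = "YOUR_ZENROWS_API_KEY"


def build_css_extractor_params(target_url: str, rules: dict) -> dict:
    """Return query parameters with the extraction rules serialized as JSON."""
    return {
        "apikey": API_KEY,
        "url": target_url,
        # Compact separators keep the query string short.
        "css_extractor": json.dumps(rules, separators=(",", ":")),
    }


params = build_css_extractor_params(
    "https://www.example.com",  # hypothetical target page
    {"links": "a @href", "hidden": "input[type=hidden] @value"},
)
# urlencode percent-encodes the braces, quotes, and spaces in the JSON.
request_url = API_ENDPOINT + "?" + urlencode(params)
print(request_url)
```

Building the rules as a dictionary and letting json.dumps produce the string avoids hand-written quoting mistakes in the css_extractor value.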

Auto Parsing

By default, the ZenRows API returns the HTML of the URL. Enabling Autoparse applies our extraction algorithms to the page and returns the parsed data as JSON automatically.

Add &autoparse=true to the request for this feature.
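A hedged sketch of an Autoparse request in Python follows. It only shows where autoparse=true goes and that the body is decoded as JSON; the endpoint, the apikey and url parameter names, and the product URL are assumptions not taken from this section, and the shape of the returned JSON depends on the page.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://api.zenrows.com/v1/"  # assumed endpoint
API_KEY = "YOUR_ZENROWS_API_KEY"  # placeholder key


def build_autoparse_url(target_url: str) -> str:
    """Return a request URL with autoparse enabled."""
    query = urlencode({"apikey": API_KEY, "url": target_url, "autoparse": "true"})
    return f"{API_ENDPOINT}?{query}"


def fetch_autoparsed(target_url: str):
    """Send the request and decode the JSON body (needs a valid key and network access)."""
    with urlopen(build_autoparse_url(target_url), timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical target page; only the URL is built here, nothing is sent.
autoparse_url = build_autoparse_url("https://www.example.com/product/1")
print(autoparse_url)
```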

Markdown Response

Add &markdown_response=true to the request parameters and the ZenRows API will return the content in Markdown format. Markdown is simpler and more readable than HTML, which makes the output easier to work with if you are more comfortable with Markdown.

This feature is not compatible with CSS Selectors, Auto Parsing, or Plain Text Response, so you cannot use those options together with the Markdown response.

Add markdown_response=true to the request.

Let's say the HTML content of an Amazon product page includes a product title, a description, and a list of features. In HTML, it might look something like this:
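For illustration, a request with the Markdown response enabled could be assembled like this in Python. This is a sketch under assumptions: the endpoint, the apikey and url parameter names, and the product URL are placeholders that do not come from this section.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://api.zenrows.com/v1/"  # assumed endpoint
API_KEY = "YOUR_ZENROWS_API_KEY"  # placeholder key

markdown_params = {
    "apikey": API_KEY,
    "url": "https://www.example.com/product/1",  # hypothetical target page
    "markdown_response": "true",
}
markdown_url = f"{API_ENDPOINT}?{urlencode(markdown_params)}"


def fetch_markdown(request_url: str) -> str:
    """Return the response body as text (needs a valid key and network access)."""
    with urlopen(request_url, timeout=60) as response:
        return response.read().decode("utf-8")


print(markdown_url)
```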

<h1>Product Title</h1>
<p>This is a great product that does many things.</p>
<ul>
    <li>Feature 1</li>
    <li>Feature 2</li>
    <li>Feature 3</li>
</ul>

When you enable the Markdown response feature, the ZenRows Scraping API converts this HTML content into Markdown like this:

# Product Title

This is a great product that does many things.

- Feature 1
- Feature 2
- Feature 3

Plain Text Response

The plaintext_response feature is an output option that returns the scraped content as plain text instead of HTML or Markdown.

This feature can be helpful when you want a clean, unformatted version of the content without any HTML tags or Markdown formatting. It simplifies the content extraction process and makes processing or analyzing the text easier.

You can't use the Plain Text Response in conjunction with CSS Selectors, Auto Parsing, or Markdown Response.

Add plaintext_response=true to the request.

Let's say the HTML content of an Amazon product page includes a product title, a description, and a list of features. In HTML, it might look something like this:
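Because the Markdown and Plain Text responses cannot be combined with each other or with CSS Selectors and Auto Parsing, a client can reject conflicting parameters before sending anything. The sketch below is one possible client-side check, not part of the API; the helper names, the endpoint-independent apikey and url parameters, and the target page are assumptions, and the check covers only the restrictions stated above.

```python
API_KEY = "YOUR_ZENROWS_API_KEY"  # placeholder key

# Options that select the output format, as named in this document.
OUTPUT_OPTIONS = ("css_extractor", "autoparse", "markdown_response", "plaintext_response")
# Options documented as incompatible with every other output option.
EXCLUSIVE_OPTIONS = ("markdown_response", "plaintext_response")


def is_enabled(value) -> bool:
    """Treat missing, empty, and false-like values as disabled."""
    return value not in (None, "", False, "false")


def check_output_options(params: dict) -> None:
    """Raise ValueError if an exclusive output option is combined with another one."""
    enabled = [name for name in OUTPUT_OPTIONS if is_enabled(params.get(name))]
    for name in EXCLUSIVE_OPTIONS:
        if name in enabled and len(enabled) > 1:
            others = ", ".join(n for n in enabled if n != name)
            raise ValueError(f"{name} cannot be combined with: {others}")


def build_plaintext_params(target_url: str, **extra) -> dict:
    """Return parameters for a plain text response, validated against conflicts."""
    params = {"apikey": API_KEY, "url": target_url, "plaintext_response": "true", **extra}
    check_output_options(params)
    return params


plaintext_params = build_plaintext_params("https://www.example.com")  # hypothetical page
print(plaintext_params)
```

Calling build_plaintext_params with autoparse="true" or markdown_response="true" raises ValueError instead of producing a request that mixes incompatible options.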

<h1>Product Title</h1>
<p>This is a great product that does many things.</p>
<ul>
    <li>Feature 1</li>
    <li>Feature 2</li>
    <li>Feature 3</li>
</ul>

When you enable the plaintext_response feature, the ZenRows Scraping API converts this HTML content into plain text like this:

Product Title

This is a great product that does many things.

Feature 1
Feature 2
Feature 3

Page Screenshot

Capture an above-the-fold screenshot of the target page by adding screenshot=true to the request. By default, the image will be in PNG format.

Additional Options

  • screenshot_fullpage=true takes a full-page screenshot.
  • screenshot_selector=<CSS Selector> takes a screenshot of the element matched by the given CSS Selector.

screenshot_selector and screenshot_fullpage are mutually exclusive: a request captures either a single element or the full page, not both. Additionally, JavaScript rendering (js_render=true) is required.
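The screenshot rules above can be sketched as a small parameter builder in Python. It always sets js_render=true, and it refuses a request that asks for both a full-page and a selector screenshot. The function name, the apikey and url parameter names, and the target page are hypothetical; the two extra options are treated here as additions to screenshot=true, which is an interpretation of the text above.

```python
from typing import Optional

API_KEY = "YOUR_ZENROWS_API_KEY"  # placeholder key


def build_screenshot_params(
    target_url: str, fullpage: bool = False, selector: Optional[str] = None
) -> dict:
    """Return screenshot parameters, enforcing the documented constraints."""
    if fullpage and selector:
        raise ValueError("screenshot_fullpage and screenshot_selector are mutually exclusive")
    params = {
        "apikey": API_KEY,
        "url": target_url,
        "js_render": "true",  # required for screenshots
        "screenshot": "true",
    }
    if fullpage:
        params["screenshot_fullpage"] = "true"
    if selector:
        params["screenshot_selector"] = selector
    return params


# Hypothetical target page and selector.
fullpage_params = build_screenshot_params("https://www.example.com", fullpage=True)
element_params = build_screenshot_params("https://www.example.com", selector="#main")
print(fullpage_params)
print(element_params)
```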

These screenshot features can be combined with other options like wait, wait_for, or js_instructions to ensure that the page or elements are fully loaded before capturing the image. When using json_response, the result will include a JSON object with the screenshot data encoded in base64, allowing for easy integration into your workflows.
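When json_response is used, the base64 screenshot data has to be decoded before it can be saved as an image. The sketch below shows that step in Python with a stand-in payload, so it runs without calling the API. The field layout {"screenshot": {"data": ...}} is an assumption for illustration and should be checked against a real response.

```python
import base64
import json
import tempfile
from pathlib import Path


def decode_screenshot(payload: dict) -> bytes:
    """Decode the base64 screenshot from a parsed JSON response.

    Assumed layout: {"screenshot": {"data": "<base64 string>"}}.
    """
    return base64.b64decode(payload["screenshot"]["data"])


# Stand-in response body: the 8-byte PNG signature, base64-encoded.
png_signature = b"\x89PNG\r\n\x1a\n"
sample_body = json.dumps(
    {"screenshot": {"data": base64.b64encode(png_signature).decode("ascii")}}
)

image_bytes = decode_screenshot(json.loads(sample_body))
output_path = Path(tempfile.gettempdir()) / "zenrows_screenshot_demo.png"
output_path.write_bytes(image_bytes)
print(len(image_bytes))
```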

Image Format and Quality

In addition to the basic screenshot functionality, ZenRows offers customization options to optimize the output. These options are particularly useful for reducing file size, especially for full-page screenshots, where an image larger than 10 MB can cause errors.

  • screenshot_format: Choose between png and jpeg formats, with PNG being the default. PNG is great for high-quality images and transparency, while JPEG offers efficient compression.
  • screenshot_quality: Applicable when using JPEG, this parameter sets the quality from 1 to 100. It is useful for balancing image clarity and file size, especially when storage or bandwidth is limited.
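A possible client-side check for these two options is sketched below in Python. It accepts only png or jpeg and allows a quality value only for JPEG, within 1 to 100. The function name is hypothetical, and raising an error when a quality is given with PNG is a design choice of this sketch rather than documented API behavior.

```python
from typing import Optional


def screenshot_format_params(image_format: str = "png", quality: Optional[int] = None) -> dict:
    """Return screenshot_format and, for JPEG only, screenshot_quality."""
    if image_format not in ("png", "jpeg"):
        raise ValueError("screenshot_format must be 'png' or 'jpeg'")
    params = {"screenshot_format": image_format}
    if quality is not None:
        if image_format != "jpeg":
            raise ValueError("screenshot_quality applies only to JPEG")
        if not 1 <= quality <= 100:
            raise ValueError("screenshot_quality must be between 1 and 100")
        params["screenshot_quality"] = str(quality)
    return params


jpeg_params = screenshot_format_params("jpeg", quality=80)
png_params = screenshot_format_params()
print(jpeg_params)
print(png_params)
```

The returned dictionary can be merged into the other screenshot parameters before the request is encoded.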

Download Files and Pictures

ZenRows will download images, PDFs or any type of file. Instead of reading the response’s content as text, you can store it directly in a file.

There is a size limit, so we don't recommend using ZenRows to download large files.
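Storing the response directly in a file means treating the body as bytes rather than text. The Python sketch below copies a binary stream to disk in chunks; download_file shows how it might be wired to a request, while the demonstration at the end uses an in-memory stream so it runs offline. The endpoint, the apikey and url parameter names, and the function names are assumptions not given in this section.

```python
import io
import shutil
import tempfile
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://api.zenrows.com/v1/"  # assumed endpoint
API_KEY = "YOUR_ZENROWS_API_KEY"  # placeholder key


def save_stream(source, destination: Path) -> int:
    """Copy a binary file-like object to disk in chunks and return the file size."""
    with open(destination, "wb") as output:
        shutil.copyfileobj(source, output)
    return destination.stat().st_size


def download_file(file_url: str, destination: Path) -> int:
    """Request the file and write the raw bytes to disk (needs a valid key and network)."""
    query = urlencode({"apikey": API_KEY, "url": file_url})
    with urlopen(f"{API_ENDPOINT}?{query}", timeout=120) as response:
        return save_stream(response, destination)


# Offline demonstration with an in-memory stream standing in for the response.
demo_bytes = b"%PDF-1.7 demo"
demo_path = Path(tempfile.gettempdir()) / "zenrows_download_demo.bin"
written = save_stream(io.BytesIO(demo_bytes), demo_path)
print(written)
```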