CSS Selectors

You can use CSS Selectors to extract data from the page. The table below lists example extraction rules and the output each one produces.

You only need to add &css_extractor={"links":"a @href"} to the request to use this feature.

Here are some examples:

| extraction rules | sample html | value | json output |
|---|---|---|---|
| {"divs":"div"} | <div>text0</div> | text | {"divs": "text0"} |
| {"divs":"div"} | <div>text1</div><div>text2</div> | text | {"divs": ["text1", "text2"]} |
| {"links":"a @href"} | <a href="#register">Register</a> | href attribute | {"links": "#register"} |
| {"hidden":"input[type=hidden] @value"} | <input type="hidden" name="_token" value="f23g23g.b9u1bg91g.zv97" /> | value attribute | {"hidden": "f23g23g.b9u1bg91g.zv97"} |
| {"class":"button.submit @data-v"} | <button class="submit" data-v="register-user">click</button> | data-v attribute with submit class | {"class": "register-user"} |
| {"emails":"a[href^='mailto:'] @href"} | <a href="mailto:test1@domain.com">email 1</a><a href="mailto:test2@domain.com">email 2</a> | href attribute for links starting with mailto: | {"emails": ["test1@domain.com", "test2@domain.com"]} |

If you are interested in learning more, you can find a complete reference of CSS Selectors here.

# pip install zenrows
from zenrows import ZenRowsClient

client = ZenRowsClient("YOUR_ZENROWS_API_KEY")
url = "https://httpbin.io/anything"
params = {
    "css_extractor": "{\"links\":\"a @href\", \"images\":\"img @src\"}"
}

response = client.get(url, params=params)

print(response.text)
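
As the examples above show, a rule that matches one element yields a string, while a rule that matches several yields a list. The sketch below builds the css_extractor value with json.dumps instead of hand-escaped quotes and normalizes each field to a list. The helper names (build_css_extractor, as_list) and the sample output are illustrative assumptions, not part of the ZenRows SDK.

```python
import json


def build_css_extractor(rules: dict) -> str:
    """Serialize extraction rules to the JSON string passed as css_extractor."""
    return json.dumps(rules)


def as_list(value):
    """Normalize an extracted field: a single match is a string, several are a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


params = {
    "css_extractor": build_css_extractor({"links": "a @href", "images": "img @src"})
}

# Hypothetical output shaped like the examples above (not a live response).
sample_output = '{"links": "#register", "images": ["a.png", "b.png"]}'
data = json.loads(sample_output)

links = as_list(data.get("links"))    # single match becomes a one-item list
images = as_list(data.get("images"))  # multiple matches stay a list
```

With a live request, the same normalization would presumably apply to json.loads(response.text).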

Auto Parsing

By default, the ZenRows API returns the HTML of the target URL. Enabling Autoparse runs our extraction algorithms on the page and returns the parsed data as JSON instead.

Add &autoparse=true to the request for this feature.

# pip install zenrows
from zenrows import ZenRowsClient

client = ZenRowsClient("YOUR_ZENROWS_API_KEY")
url = "https://www.amazon.com/dp/B01LD5GO7I/"
params = {
    "autoparse": "true"
}

response = client.get(url, params=params)

print(response.text)
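
Since Autoparse returns JSON, the response body can be decoded before use. The following is a minimal sketch, assuming the body might not always be valid JSON (for example, if a page cannot be parsed); that fallback behavior and the sample keys are assumptions, not documented guarantees.

```python
import json


def parse_autoparse(body: str):
    """Return the decoded JSON document, or None if the body is not valid JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


# Hypothetical bodies for illustration; real field names depend on the target page.
parsed = parse_autoparse('{"title": "Example product", "price": "9.99"}')
not_parsed = parse_autoparse("<html><body>plain HTML</body></html>")
```

In a real request, you would pass response.text to parse_autoparse and handle the None case explicitly.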

Markdown response

When enabled, this feature returns the page content as Markdown instead of HTML. It is not compatible with CSS Selectors or Auto Parsing.

Add &markdown_response=true to the request for this feature.

# pip install zenrows
from zenrows import ZenRowsClient

client = ZenRowsClient("YOUR_ZENROWS_API_KEY")
url = "https://www.amazon.com/dp/B01LD5GO7I/"
params = {
    "markdown_response": "true"
}

response = client.get(url, params=params)

print(response.text)
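
Because markdown_response cannot be combined with css_extractor or autoparse, it may help to check the parameters locally before sending the request. This is only a sketch: the helper names are hypothetical, and how the API itself responds to a conflicting request is not covered here.

```python
def is_enabled(params: dict, name: str) -> bool:
    """True when the parameter is present and set to "true" (case-insensitive)."""
    return str(params.get(name, "")).lower() == "true"


def check_markdown_params(params: dict) -> None:
    """Raise ValueError if markdown_response is combined with an incompatible option."""
    if not is_enabled(params, "markdown_response"):
        return
    conflicts = []
    if params.get("css_extractor"):
        conflicts.append("css_extractor")
    if is_enabled(params, "autoparse"):
        conflicts.append("autoparse")
    if conflicts:
        raise ValueError(
            "markdown_response cannot be combined with: " + ", ".join(conflicts)
        )
```

Calling check_markdown_params(params) just before client.get(url, params=params) would surface the conflict early.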

Page Screenshot

Takes an above-the-fold screenshot of the target page and returns it in PNG format. To enable this feature, add &screenshot=true to the request.

There are two other options for screenshots:

  • &screenshot_fullpage=true takes a full-page screenshot.
  • &screenshot_selector=<CSS Selector> takes a screenshot of the element given in the CSS Selector.

screenshot_selector and screenshot_fullpage are mutually exclusive: a request can capture either the full page or a single element, not both. Both options also require &screenshot=true.

You can combine this feature with wait, wait_for, js_instructions and others. ZenRows will execute the whole request and then take the screenshot just before returning. If you use json_response, the result will be a JSON document in which one field is an object containing the screenshot data in base64.

This feature requires JavaScript rendering (&js_render=true).
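
The screenshot rules described in this section can be expressed as a small local check. The sketch below is not part of the ZenRows SDK; screenshot_param_errors is a hypothetical helper that returns the problems it finds in a params dictionary.

```python
def screenshot_param_errors(params: dict) -> list:
    """Return a list of rule violations for the screenshot parameters."""

    def on(name: str) -> bool:
        # Treat a parameter as enabled only when it is set to "true".
        return str(params.get(name, "")).lower() == "true"

    fullpage = on("screenshot_fullpage")
    selector = bool(params.get("screenshot_selector"))
    errors = []

    if fullpage and selector:
        errors.append("screenshot_fullpage and screenshot_selector are mutually exclusive")
    if (fullpage or selector) and not on("screenshot"):
        errors.append("screenshot_fullpage and screenshot_selector require screenshot=true")
    if (on("screenshot") or fullpage or selector) and not on("js_render"):
        errors.append("screenshots require js_render=true")
    return errors
```

An empty list means the combination satisfies the rules listed here; anything else can be logged or raised before the request is sent.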

# pip install zenrows
from zenrows import ZenRowsClient

client = ZenRowsClient("YOUR_ZENROWS_API_KEY")
url = "https://httpbin.io/anything"
params = {
    "js_render": "true",
    "screenshot": "true",
    "screenshot_fullpage": "true"
}

response = client.get(url, params=params)

with open("screenshot.png", "wb") as f:
    f.write(response.content)
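
When json_response is enabled, the screenshot arrives as base64 inside the JSON body rather than as raw PNG bytes. The sketch below decodes it, assuming the field is named "screenshot" and holds its payload under a "data" key. Both names are assumptions made for illustration, so check an actual response before relying on them.

```python
import base64
import json


def extract_screenshot(json_body: str) -> bytes:
    """Decode the base64 screenshot from a JSON response body."""
    payload = json.loads(json_body)
    # Assumption: the screenshot object lives under "screenshot" with a "data" key.
    return base64.b64decode(payload["screenshot"]["data"])


# Fabricated sample body for illustration only (not a real ZenRows response).
fake_png = b"\x89PNG fake image bytes"
sample_body = json.dumps(
    {"html": "<html></html>",
     "screenshot": {"data": base64.b64encode(fake_png).decode("ascii")}}
)
png_bytes = extract_screenshot(sample_body)
```

The decoded bytes can then be written to disk in binary mode, just like response.content in the example above.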

Download Files and Pictures

ZenRows will download images, PDFs or any type of file. Instead of reading the response’s content as text, you can store it directly in a file.

There is a size limit, and we don’t recommend using ZenRows to download big files.
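
Storing the response in a file amounts to writing its bytes in binary mode. A minimal sketch follows, with a local size guard; MAX_BYTES is a hypothetical client-side cap chosen for illustration and is not the ZenRows limit, which is not stated here.

```python
import tempfile
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # hypothetical local cap, not the ZenRows size limit


def save_content(content: bytes, path: str, max_bytes: int = MAX_BYTES) -> int:
    """Write raw bytes to path and return the number of bytes written."""
    if len(content) > max_bytes:
        raise ValueError(f"content is {len(content)} bytes, above the {max_bytes}-byte cap")
    Path(path).write_bytes(content)
    return len(content)


# Demonstration with sample bytes in a temporary directory (no network involved).
sample = b"%PDF-1.4 sample"
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "file.pdf"
    written = save_content(sample, str(target))
    round_trip = target.read_bytes()
```

With the client from the earlier examples, the call would look like save_content(response.content, "file.pdf").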