JavaScript rendering (Headless browser)
Many modern websites use JavaScript to dynamically load content, meaning that the data you need might not be available in the initial HTML response. To handle such cases, you can use our JavaScript rendering feature, which simulates a real browser environment to fully load and render the page before extracting the data.
Enabling JavaScript Rendering
To activate JavaScript rendering, append js_render=true to the request. This tells our system to process the page using a headless browser, allowing you to scrape content that is loaded dynamically by JavaScript.
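The following minimal sketch builds such a request URL with Python's standard library. It assumes the endpoint https://api.zenrows.com/v1/, the apikey and url parameter names, and a placeholder key. Check these against your ZenRows dashboard before use.

```python
from urllib.parse import urlencode

# Assumed endpoint and placeholder API key; substitute your own values.
API_ENDPOINT = "https://api.zenrows.com/v1/"
API_KEY = "YOUR_ZENROWS_API_KEY"

params = {
    "apikey": API_KEY,
    "url": "https://www.example.com/",  # page to scrape
    "js_render": "true",                # render the page in a headless browser
}
# urlencode percent-encodes every value, including the target URL.
request_url = API_ENDPOINT + "?" + urlencode(params)
print(request_url)
```

You can send request_url with any HTTP client, for example urllib.request.urlopen.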
Features Requiring JavaScript Rendering
Several features rely on js_render being set to true. These include:
- Wait: Introduces a delay before the content is returned. Useful when JavaScript needs time to load content.
- Wait For: Waits for a specific element to appear on the page before proceeding. When used with js_render, this parameter causes the request to fail if the selector is not found.
- JSON Response: Retrieves the rendered page content in JSON format, including data loaded dynamically via JavaScript.
- Block Resources: Blocks specific types of resources from being loaded.
- JavaScript Instructions: Lets you interact with the page (click, fill in forms, scroll) or run custom JavaScript code. This feature uses an additional parameter, js_instructions.
- Screenshot: Captures an above-the-fold screenshot of the target page when you add screenshot=true to the request.
Wait Milliseconds
For websites that take longer to load, you might need to introduce a fixed delay so that all content is fully loaded before the HTML is retrieved. You can specify this delay in milliseconds with the wait parameter, for example wait=10000.
In this example, wait=10000 makes ZenRows wait 10,000 milliseconds (10 seconds) before returning the HTML content. You can adjust this value to your needs, up to a maximum total wait time of 30 seconds.
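Here is a sketch of a helper that adds the wait parameter and rejects values above the documented 30-second maximum before a request is sent. The helper name and the placeholder key are hypothetical. The check covers only this parameter.

```python
from urllib.parse import urlencode

MAX_WAIT_MS = 30_000  # documented maximum total wait time (30 seconds)

def rendering_params(target_url: str, wait_ms: int) -> dict:
    """Hypothetical helper: query parameters for a JS-rendered request with a fixed delay."""
    if not 0 < wait_ms <= MAX_WAIT_MS:
        raise ValueError(f"wait must be between 1 and {MAX_WAIT_MS} ms, got {wait_ms}")
    return {
        "apikey": "YOUR_ZENROWS_API_KEY",  # placeholder key
        "url": target_url,
        "js_render": "true",  # wait requires JavaScript rendering
        "wait": wait_ms,      # urlencode converts the int to a string
    }

query = urlencode(rendering_params("https://www.example.com/", 10_000))
print(query)
```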
Wait For Selector
In some cases, you may need to wait for a specific CSS selector to be present in the DOM before ZenRows returns the content. This can be particularly useful for ensuring that dynamic elements or data have been fully loaded.
To implement this, add the wait_for parameter to your request URL, for example wait_for=.price, replacing .price with the CSS selector of the element you are targeting.
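A selector containing characters such as # or spaces must be percent-encoded, or the # would start a URL fragment. This sketch relies on urlencode to do the encoding. The selector and placeholder key are hypothetical.

```python
from urllib.parse import urlencode

params = {
    "apikey": "YOUR_ZENROWS_API_KEY",  # placeholder key
    "url": "https://www.example.com/product",
    "js_render": "true",
    # Hypothetical selector; '#' becomes %23 and the space becomes '+'.
    "wait_for": "#product-page .price",
}
query = urlencode(params)
print(query)
```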
JSON Response
To capture and analyze the responses of XHR, Fetch, or AJAX requests, use the json_response=true parameter in your API call. This returns a JSON object with detailed information about the page and its requests.
The JSON object includes the following fields. The third and fourth (js_instructions_report and screenshot) are optional:
Fields in the JSON Response
- HTML: Contains the HTML content of the page. This content is encoded in JSON format and will need to be decoded to access the raw HTML.
- XHR: An array where each item represents an XHR, Fetch, or AJAX request made during the page load. Each object in the array includes:
- url: The URL of the request.
- body: The body of the request, if applicable.
- status_code: The HTTP status code of the response.
- method: The HTTP method used (e.g., GET, POST).
- headers: The response headers.
- request_headers: The request headers.
- js_instructions_report (Optional): An object providing a report on the execution of JavaScript instructions. This includes:
- instructions_duration: Total time spent executing JavaScript instructions (in milliseconds).
- instructions_executed: Number of JavaScript instructions executed.
- instructions_succeeded: Number of instructions that were successfully executed.
- instructions_failed: Number of instructions that failed.
- instructions: An array of objects detailing each instruction, including:
- instruction: The type of instruction (e.g., click, wait).
- params: Parameters used for the instruction.
- success: Whether the instruction was successful.
- duration: Time taken to execute the instruction (in milliseconds).
- screenshot (Optional): An object containing information about the screenshot taken of the target site, including:
- data: The base64-encoded image data.
- type: The image format (e.g., PNG, JPEG).
- width: The width of the screenshot (in pixels).
- height: The height of the screenshot (in pixels).
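Here is a sketch of how a client might read such a response: decode the JSON to get the raw HTML and list the XHR/Fetch calls that returned an error status. The response below is hypothetical, and the lowercase keys (html, xhr) are an assumption based on the field list above.

```python
import json

def failed_calls(response: dict, min_status: int = 400) -> list:
    """Return (method, url, status) for XHR/Fetch calls that returned an error status."""
    return [
        (call.get("method"), call.get("url"), call.get("status_code"))
        for call in response.get("xhr", [])
        if call.get("status_code", 0) >= min_status
    ]

# Hypothetical raw body of a json_response=true request (lowercase keys assumed).
raw_body = json.dumps({
    "html": "<html><body><span class=\"price\">$10</span></body></html>",
    "xhr": [
        {"url": "https://www.example.com/api/price", "method": "GET",
         "status_code": 200, "body": "", "headers": {}, "request_headers": {}},
        {"url": "https://www.example.com/api/stock", "method": "POST",
         "status_code": 503, "body": "", "headers": {}, "request_headers": {}},
    ],
})

response = json.loads(raw_body)  # decoding the JSON also unescapes the HTML string
html = response["html"]
print(html)
print(failed_calls(response))
```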
Block Resources
Why download and process data that you won’t be using? Blocking resources means preventing your headless browser from downloading specific types of content that you don’t need for your scraping task. This can include images, stylesheets, fonts, and other elements that might not be essential for your data extraction.
To improve scraping efficiency, reduce loading times, optimize performance, and reduce bandwidth usage, you can block specific types of resources from being loaded using the block_resources parameter.
ZenRows automatically blocks certain resource types by default, such as stylesheets and images, to optimize scraping speed and reduce unnecessary data load. For this reason, we recommend using this feature only when it is really necessary.
If you prefer to disable resource blocking entirely, set the parameter to none: block_resources=none.
Available Resource Types
ZenRows allows you to block the following resource types:
- stylesheet: CSS files that define the visual styling of the page.
- image: Images, including icons and banners.
- media: Audio and video files.
- font: Web fonts used for text styling.
- script: JavaScript files.
- texttrack: Text tracks for video subtitles or captions.
- xhr: XMLHttpRequest requests used for AJAX calls.
- fetch: Fetch API requests.
- eventsource: EventSource requests for server-sent events.
- websocket: WebSocket connections.
- manifest: Web app manifests that define application metadata.
- other: Other resource types not specifically listed above.
To block multiple resources, separate them with commas. For example, to block images and stylesheets, use block_resources=image,stylesheet.
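The sketch below validates resource types against the list above before building the block_resources value, so a typo such as "images" fails locally instead of in the request. The helper name is hypothetical.

```python
from urllib.parse import urlencode

# Resource types listed above.
RESOURCE_TYPES = {
    "stylesheet", "image", "media", "font", "script", "texttrack",
    "xhr", "fetch", "eventsource", "websocket", "manifest", "other",
}

def block_resources_value(types: list) -> str:
    """Hypothetical helper: validate resource types and join them with commas."""
    if types == ["none"]:
        return "none"  # disables resource blocking entirely
    unknown = sorted(set(types) - RESOURCE_TYPES)
    if unknown:
        raise ValueError(f"unknown resource types: {unknown}")
    return ",".join(types)

params = {"js_render": "true", "block_resources": block_resources_value(["image", "stylesheet"])}
print(urlencode(params))  # urlencode sends the comma as %2C
```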
Troubleshooting
Sometimes, blocking particular resources, especially JavaScript files, results in an error or missing content. This can happen, for example, when the target website relies on XHR calls made after the initial render.
Follow these steps to troubleshoot:
Compare HTML Outputs
Compare the plain HTML obtained with ZenRows with a sample obtained manually, for example from your own browser. The two should be similar.
Adjust Blocked Resources
If essential elements are missing, test again by removing the blocked resources (likely JavaScript or XHR).
If the issue persists, please contact us, and we’ll assist you.
JavaScript Instructions
ZenRows provides an extensive set of JavaScript Instructions, allowing you to interact with web pages dynamically.
These instructions let you click elements, fill out and submit forms, or wait for specific elements to appear, which gives you flexibility for tasks such as clicking "read more" buttons or submitting forms.
Using the JavaScript Instructions
To use JavaScript Instructions, you must include two parameters: js_render and js_instructions. The js_instructions parameter must be URL-encoded.
You can use our Builder or an online tool to encode the instructions.
Here is an example of how to encode and use the instructions:
This set of instructions will load the page, locate the first element matching the .button-selector CSS selector, and click on it. The instructions parameter accepts an array of commands that ZenRows executes sequentially.
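Here is a minimal Python sketch of the encoding step for the click instruction described above. It serializes the instruction array to compact JSON, and urlencode then percent-encodes it. The endpoint parameters and the placeholder key are assumptions.

```python
import json
from urllib.parse import urlencode

# Click the first element matching .button-selector.
instructions = [{"click": ".button-selector"}]

params = {
    "apikey": "YOUR_ZENROWS_API_KEY",  # placeholder key
    "url": "https://www.example.com/",
    "js_render": "true",  # required for js_instructions
    # Compact JSON keeps the encoded parameter short.
    "js_instructions": json.dumps(instructions, separators=(",", ":")),
}
query = urlencode(params)
print(query)
```

If your HTTP client builds the query string for you (for example, the params argument of the requests library), pass the JSON string unencoded. Otherwise, it will be encoded twice.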
Summary of Actions
Here are some common actions you can perform with JavaScript Instructions:
Click on an element
The click action lets you programmatically interact with webpage elements like buttons or links. It’s essential for navigating sites or accessing additional content, such as expanding sections or moving through pagination.
This action is often paired with wait_for to handle elements that load dynamically. For example, on some pages you might click a "read more" button to reveal the full content of an article.
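Here is a sketch of the "read more" case as an instruction array: a click followed by a wait_for. Both selectors are hypothetical.

```python
import json

# Hypothetical selectors: expand an article, then wait for its full text.
instructions = [
    {"click": ".read-more"},
    {"wait_for": ".article-body"},
]
print(json.dumps(instructions))
```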
Wait For Selector
The wait_for instruction pauses the script until a specific element appears on the page, making it ideal for handling delayed content loading in Single Page Applications (SPAs) or dynamic websites.
This ensures that all necessary elements, like data fields or navigation buttons, are fully loaded before further actions are taken. For instance, after clicking a button, you might use wait_for to ensure the next page’s key elements are present before proceeding with data extraction. This step is crucial for accurate and complete data retrieval in web scraping.
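Here is a sketch of the click-then-wait_for pattern for pagination. The helper, the selectors, and the data-page attribute are hypothetical. Each wait targets a marker that exists only on the new page. Waiting for a selector the previous page already contains, such as the listing container, could be satisfied immediately.

```python
import json

def paginate(last_page: int, next_selector: str, page_marker: str) -> list:
    """Hypothetical helper: click 'next' and wait for pages 2..last_page to appear."""
    instructions = []
    for page in range(2, last_page + 1):
        instructions.append({"click": next_selector})
        # Wait for a marker unique to the new page, not for the old page's content.
        instructions.append({"wait_for": page_marker.format(page=page)})
    return instructions

instructions = paginate(3, ".pagination .next", ".pagination .active[data-page='{page}']")
print(json.dumps(instructions, indent=2))
```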
Wait
The wait instruction pauses execution for a specified duration, defined in milliseconds. For example, {"wait": 1000} pauses the script for one second.
This can be useful for ensuring specific actions, such as animations or data loading processes, are given enough time to complete before proceeding with further steps. It’s a straightforward way to handle timing issues and ensure all elements are ready for interaction or extraction.
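The following sketch combines clicks with fixed waits and adds up the waits. This rests on an assumption: that the 30-second maximum mentioned for the wait parameter also bounds wait instructions. The selectors are hypothetical.

```python
import json

# Assumption: the 30-second maximum for the wait parameter also applies here.
MAX_TOTAL_WAIT_MS = 30_000

instructions = [
    {"click": "#section-1 .toggle"},  # hypothetical toggle that starts an animation
    {"wait": 1000},                   # give the animation one second to finish
    {"click": "#section-2 .toggle"},
    {"wait": 1500},
]

total_wait_ms = sum(step.get("wait", 0) for step in instructions)
if total_wait_ms > MAX_TOTAL_WAIT_MS:
    raise ValueError(f"fixed waits add up to {total_wait_ms} ms")
print(json.dumps(instructions))
```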
Fill in an Input
The fill instruction populates form fields with specified values, using a CSS selector to target the input element. This is particularly useful for automating form submissions and data entry, such as logging into a website.
Specify the CSS selector of the input field and the value to enter.
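Here is a sketch of a search form submission. It assumes fill takes a [selector, value] pair, like select_option does. The selectors and the search term are hypothetical.

```python
import json

# Hypothetical search form: type a query, submit it, wait for results.
# Assumption: fill takes a [selector, value] pair, like select_option.
instructions = [
    {"fill": ["#search-input", "wireless headphones"]},
    {"click": "#search-button"},
    {"wait_for": ".search-results"},
]
print(json.dumps(instructions))
```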
Check a Checkbox Input
The check instruction selects checkbox or radio input elements on a webpage, specified by a CSS selector. It helps ensure that options like "Remember me" are selected during form submissions.
Note: using check on an already checked input will not uncheck it.
Uncheck a Checkbox Input
The uncheck instruction deselects checkbox or radio input elements on a webpage, specified by a CSS selector. This is useful for clearing default selections or ensuring specific options are not selected in forms.
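Here is a sketch that selects one checkbox and clears another. The selectors are hypothetical.

```python
import json

# Hypothetical form checkboxes.
instructions = [
    {"check": "#remember-me"},          # stays checked if it already was
    {"uncheck": "#newsletter-opt-in"},  # clears a pre-selected option
]
print(json.dumps(instructions))
```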
Select an Option by its Value
To select an option from a dropdown menu, use the select_option instruction. This requires an array with two strings: the first is the CSS selector for the dropdown, and the second is the value of the option you want to select.
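Here is a sketch of the two-string array. It targets a hypothetical select element with the id country. The second string is the option's value attribute, which can differ from the text shown to users.

```python
import json

# Hypothetical dropdown: <select id="country"> containing <option value="US">United States</option>.
instructions = [{"select_option": ["#country", "US"]}]
print(json.dumps(instructions))
```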
Scroll Y
To scroll the page vertically, use the scroll_y instruction with the number of pixels you want to scroll. For example, {"scroll_y": 1500} scrolls 1500 px.
Scroll X
To scroll the page horizontally, use the scroll_x instruction with the number of pixels you want to scroll. For example, {"scroll_x": 1500} scrolls 1500 px.
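Here is a sketch of an infinite-scroll sequence that alternates scrolling with short pauses so lazily loaded items can render. It assumes that each scroll_y moves the page relative to its current position. The helper name and the defaults are hypothetical.

```python
import json

def infinite_scroll(rounds: int, pixels: int = 1500, pause_ms: int = 1000) -> list:
    """Hypothetical helper: scroll down repeatedly, pausing after each step."""
    instructions = []
    for _ in range(rounds):
        # Assumption: scroll_y is relative to the current scroll position.
        instructions.append({"scroll_y": pixels})
        instructions.append({"wait": pause_ms})
    return instructions

instructions = infinite_scroll(3)
# Horizontal scrolling, e.g. for a carousel, uses scroll_x the same way.
instructions.append({"scroll_x": 1500})
print(json.dumps(instructions))
```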
Execute JavaScript Code (evaluate)
Use the evaluate instruction to execute custom JavaScript on the page. If none of the previous instructions fits your needs, you can write JavaScript code, and ZenRows will run it. For example, to trigger a "load more" event, you can scroll to a given element with evaluate and then add another instruction to wait for the new part to load.
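Here is a sketch of the "load more" case. An evaluate instruction scrolls a hypothetical .load-more element into view, and a wait_for instruction waits for a hypothetical .new-items container. The script uses single quotes so it embeds cleanly in the JSON string.

```python
import json

# JavaScript run in the page; optional chaining (?.) avoids an error if the element is missing.
scroll_script = "document.querySelector('.load-more')?.scrollIntoView();"

instructions = [
    {"evaluate": scroll_script},
    {"wait_for": ".new-items"},  # hypothetical selector for the newly loaded part
]
payload = json.dumps(instructions)
print(payload)
```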
Solve CAPTCHAs
ZenRows bypasses most CAPTCHAs, but for in-page CAPTCHAs, such as those shown when submitting a form, you can integrate a paid solver (2Captcha). To use it, add your 2Captcha API key in the integrations section.
You can solve various CAPTCHA types, including hCaptcha, reCAPTCHA, and Cloudflare Turnstile. For invisible CAPTCHAs, set solve_inactive to true inside options.
For more details on how the CAPTCHA was solved, add JSON Response (json_response=true) to get a detailed summary of the JS Instructions' execution.
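Here is a sketch of the instruction payload. The instruction name solve_captcha and the type strings are assumptions for illustration. Only the options/solve_inactive structure comes from the text above. A 2Captcha key must already be configured in the integrations section.

```python
import json

# Assumptions: the instruction name (solve_captcha) and the type strings are illustrative;
# only the options/solve_inactive structure is taken from the documentation text.
instructions = [
    {"solve_captcha": {"type": "hcaptcha"}},
    {"solve_captcha": {"type": "recaptcha", "options": {"solve_inactive": True}}},
]
payload = json.dumps(instructions)
print(payload)  # Python's True is serialized as JSON true
```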
Wait for a browser event
Some actions require waiting for the browser to finish an action or navigation. The service can wait for the browser to trigger an event such as load or networkidle.
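Here is a sketch of waiting for a browser event after a navigation-triggering click. The instruction name wait_event is an assumption. The event names load and networkidle come from the text above, and the selector is hypothetical.

```python
import json

# Assumption: the instruction is named wait_event; the event name comes from the text above.
instructions = [
    {"click": "a.next-page"},       # hypothetical link that triggers a navigation
    {"wait_event": "networkidle"},  # wait until network activity settles
]
print(json.dumps(instructions))
```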
Instructions Inside Iframes
Instructions for interacting with iframes are prefixed with frame_ and follow a similar syntax, but they also require you to specify the iframe.
For security reasons, an iframe’s content isn’t returned in the response. To get that content, use frame_reveal. It appends a node with the content encoded in base64 to avoid problems with JS or HTML injection. The new node has a data-id attribute set to the given param and an iframe-content-element class.
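Here is a sketch of how to recover revealed iframe content from the returned HTML using only the standard library. It assumes the node's text is plain base64 of UTF-8 HTML. The sample response fragment and the data-id value are hypothetical and are built in the code so the example is self-contained.

```python
import base64
from html.parser import HTMLParser

class RevealedFrameParser(HTMLParser):
    """Collect and decode nodes carrying the iframe-content-element class."""

    def __init__(self):
        super().__init__()
        self.frames = {}      # data-id value -> decoded iframe HTML
        self._inside = False  # True while reading a revealed node's text
        self._key = None
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "iframe-content-element" in (attributes.get("class") or "").split():
            self._inside, self._key, self._chunks = True, attributes.get("data-id"), []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if self._inside:
            # Assumption: the node's text is plain base64 of UTF-8 HTML.
            encoded_text = "".join(self._chunks).strip()
            self.frames[self._key] = base64.b64decode(encoded_text).decode("utf-8")
            self._inside = False

# Hypothetical response fragment, built here so the example is self-contained.
encoded = base64.b64encode(b"<p>Inside the iframe</p>").decode("ascii")
sample_html = (
    "<html><body><h1>Checkout</h1>"
    f'<div class="iframe-content-element" data-id="#checkout-frame">{encoded}</div>'
    "</body></html>"
)

parser = RevealedFrameParser()
parser.feed(sample_html)
print(parser.frames)
```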
Using XPath
In addition to CSS selectors, you can use XPath to locate elements on a web page.
XPath is particularly useful when dealing with dynamic selectors or when more precise element selection is needed. However, if the website's structure changes, the XPath might fail.
For example, ZenRows can find the first <h2> element that contains the text "Example" and click on it.
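Here is a sketch of that click as an instruction. It assumes an XPath expression can be passed wherever a CSS selector is expected.

```python
import json

# The first <h2> whose text contains "Example".
# Assumption: an XPath expression is accepted in place of a CSS selector.
instructions = [{"click": "//h2[contains(text(), 'Example')]"}]
print(json.dumps(instructions))
```

text() matches only the element's direct text nodes. If the word sits inside a nested tag, such as <h2><span>Example</span></h2>, contains(., 'Example') checks the full string value instead.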
The most common use cases for the XPath are:
- Dynamic Content: When dealing with dynamic web pages whose CSS selectors are unreliable because they change frequently.
- Nested Elements: When selecting elements deeply nested within other elements.
- Text-based Selection: When selecting elements by their text content, which is not easily achievable with CSS selectors.
Debug JS Instructions
To see a detailed report of the JS Instructions’ execution, set json_response to true. The result is then returned in JSON format, with one of the fields being js_instructions_report. This is useful for testing and debugging. For more details, check the JSON Response documentation.
Here is an example report:
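The following sketch summarizes such a report and lists the failed steps. It reads the fields described in the JSON Response section. The report values and the shape of params are hypothetical.

```python
def summarize_report(report: dict) -> dict:
    """Summarize a js_instructions_report and list the failed steps."""
    failed = [
        (index, step.get("instruction"), step.get("params"))
        for index, step in enumerate(report.get("instructions", []))
        if not step.get("success")
    ]
    return {
        "executed": report.get("instructions_executed", 0),
        "total_ms": report.get("instructions_duration", 0),
        "failed": failed,
    }

# Hypothetical report; the shape of "params" is an assumption.
report = {
    "instructions_duration": 5320,
    "instructions_executed": 2,
    "instructions_succeeded": 1,
    "instructions_failed": 1,
    "instructions": [
        {"instruction": "click", "params": {"selector": ".read-more"},
         "success": True, "duration": 320},
        {"instruction": "wait_for", "params": {"selector": ".article-body"},
         "success": False, "duration": 5000},
    ],
}
print(summarize_report(report))
```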
Example Using Instructions
What does a real example look like? We will use an AliExpress product page for a demo. We can summarize the process in a few steps:
- We wait for the selectors to appear and choose the color and size.
- Add to cart and wait for the cart modal to appear. We click on “View Shopping Cart,” which redirects us to a different page.
- Wait for the Cart page to load and check the added item.
- Click on “Buy” and fill in the registration form with an email and password.
This shows only part of what this functionality can do. You could calculate different shipping prices by changing the shipping address, or execute custom JavaScript logic with evaluate to click an element from a list. The possibilities are endless.
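Here is a sketch of the four steps above as a single instruction array. Every selector is hypothetical, because store markup changes often. Inspect the live page before using anything like this. The form values are placeholders.

```python
import json

# Hypothetical selectors; inspect the live page for real ones.
instructions = [
    # 1. Wait for the product options, then choose a color and a size.
    {"wait_for": ".sku-color .sku-item"},
    {"click": ".sku-color .sku-item"},
    {"wait_for": ".sku-size .sku-item"},
    {"click": ".sku-size .sku-item"},
    # 2. Add to cart, wait for the modal, then open the cart page.
    {"click": ".add-to-cart"},
    {"wait_for": ".cart-modal"},
    {"click": ".cart-modal .view-cart"},
    # 3. Wait for the cart page and check the added item.
    {"wait_for": ".cart-item input[type=checkbox]"},
    {"check": ".cart-item input[type=checkbox]"},
    # 4. Click "Buy" and fill in the registration form (placeholder values).
    {"click": ".buy-button"},
    {"wait_for": "#email"},
    {"fill": ["#email", "user@example.com"]},
    {"fill": ["#password", "example-password"]},
]
print(len(instructions), "instructions")
print(json.dumps(instructions, indent=2))
```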
Although we show examples with login forms, we discourage this usage. It would require you to log in on every request. If you need to scrape content as a logged-in user, don’t hesitate to contact us.