# How to Integrate LlamaIndex with ZenRows

> Integrate ZenRows with LlamaIndex to build RAG applications that scrape, index, and query live web content from anti-bot protected websites.

Integrate <a href="https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/zenrows_web" target="_blank" rel="noopener noreferrer nofollow">ZenRows with LlamaIndex</a> to enable your RAG applications to access, index, and synthesize up-to-date web content from any website, including those with anti-bot protection and dynamic content.

## What Is LlamaIndex?

LlamaIndex is an open-source framework that connects LLMs to external data sources such as databases, documents, and APIs. It provides tools for data ingestion, indexing, and query-based retrieval, and is commonly used to build retrieval-augmented generation (RAG) applications that feed AI agents with up-to-date information.

## Key Integration Benefits

* **Uninterrupted access to data**: Build a reliable data layer that can access information from any website without getting blocked by anti-bot measures.
* **Real-time information retrieval**: Extract data on demand so that what you index is current rather than stale.
* **Direct extraction of LLM-friendly data**: Get pre-formatted LLM-friendly data, such as the Markdown or JSON version of any website. ZenRows also enables the extraction of specific data directly.
* **Less code, more data**: Scrape data continuously with an auto-managed and auto-scaled solution with a simple API call.
* **Business-oriented development**: Spend engineering time on your product instead of debugging and fixing scrapers.
* **Handle dynamic content easily**: Access heavily dynamic websites without performing complex waits and user simulations.
* **Borderless data retrieval**: Expose AI applications to data from any specific location without IP limitations using residential proxies with geo-targeted IPs.

## Use Cases of LlamaIndex-ZenRows Integration

* **Real-time price monitoring**: Use ZenRows to scrape prices from several product sites in real-time and synthesize a comprehensive comparison with an LLM.
* **Competitive research**: Scrape several competitors' offerings, product launches, strategies, and more with ZenRows and draw a correlation between the data using an LLM.
* **News and trends summarization**: Use ZenRows to aggregate news, trends, hashtags, and more, across similar platforms. Summarize the aggregated data with an LLM and extract specific insights.
* **Dynamic chatbots**: Build a chatbot that can access the web or specific web pages in real time to provide updated information.

## Getting Started: Basic Usage

This example demonstrates how to extract content from a protected website using the `ZenRowsWebReader`.

The `ZenRowsWebReader` enables you to use the official [ZenRows Universal Scraper API](https://www.zenrows.com/products/universal-scraper) as a data loader for web scraping in LlamaIndex.

<Steps>
  <Step title="Install the package">
    ```bash theme={null}
    pip3 install llama-index-readers-web
    ```
  </Step>

  <Step title="Basic implementation">
    Import `ZenRowsWebReader` from the `llama_index.readers.web` module (installed by the `llama-index-readers-web` package) and initialize it as a reader instance. Set your ZenRows parameters on this instance.

    Then load the target page as a document and print its content in the requested format (Markdown in this case):

    ```python Python theme={null}
    # pip3 install llama-index-readers-web
    from llama_index.readers.web import ZenRowsWebReader

    api_key = "YOUR_ZENROWS_API_KEY"

    # initialize the reader
    reader = ZenRowsWebReader(
        api_key=api_key,
        js_render=True,
        premium_proxy=True,
        response_type="markdown",
    )

    # scrape a single URL
    documents = reader.load_data(["https://www.scrapingcourse.com/antibot-challenge/"])
    print(documents[0].text)
    ```

    The code prints the target page as Markdown, as shown:

    ```markdown Markdown theme={null}
    [![](https://www.scrapingcourse.com/assets/images/logo.svg) Scraping Course](http://www.scrapingcourse.com/)

    # Antibot Challenge

    ![](https://www.scrapingcourse.com/assets/images/challenge.svg)

    ## You bypassed the Antibot challenge! :D
    ```
  </Step>
</Steps>

## Advanced Usage: Building a Simple RAG System

This example creates a simple RAG system that indexes multiple websites and responds to queries using the collected data.

You'll need an OpenAI API key for the LLM and embedding features, so have one ready.

<Steps>
  <Step title="Install the packages">
    ```bash theme={null}
    pip3 install llama-index-readers-web llama-index-llms-openai llama-index-embeddings-openai
    ```
  </Step>

  <Step title="Set up ZenRowsWebReader">
    Import the required packages and specify your ZenRows and OpenAI API keys. Initialize `ZenRowsWebReader` using the desired ZenRows parameters. Include `js_render` and `premium_proxy` to effectively bypass anti-bot measures.

    ```python Python theme={null}
    # pip3 install llama-index-readers-web llama-index-llms-openai llama-index-embeddings-openai
    from llama_index.core import VectorStoreIndex
    from llama_index.readers.web import ZenRowsWebReader
    import os

    os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

    api_key = "YOUR_ZENROWS_API_KEY"

    # set up ZenRowsWebReader
    reader = ZenRowsWebReader(
        api_key=api_key,
        js_render=True,
        premium_proxy=True,
        response_type="markdown",
        wait=2000,
    )
    ```
  </Step>

  <Step title="Set up a vector index">
    Specify the target URLs in a list, load their web pages as documents, and create a vectorized index of the documents:

    ```python Python theme={null}
    # ...
    urls = [
        "https://www.scrapingcourse.com/ecommerce",
        "https://www.scrapingcourse.com/button-click",
        "https://www.scrapingcourse.com/infinite-scrolling",
    ]

    # load each URL as a document
    documents = reader.load_data(urls)

    # create index
    index = VectorStoreIndex.from_documents(documents)
    ```
  </Step>

  <Step title="Query the index">
    Initialize a query engine from the index, pass a prompt to query it, and return the query response:

    ```python Python theme={null}
    # ...
    # query the content
    query_engine = index.as_query_engine()
    response = query_engine.query("What are the key features?")
    print(response)
    ```
  </Step>

  <Step title="Complete code">
    ```python Python theme={null}
    # pip3 install llama-index-readers-web llama-index-llms-openai llama-index-embeddings-openai
    from llama_index.core import VectorStoreIndex
    from llama_index.readers.web import ZenRowsWebReader
    import os

    os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

    api_key = "YOUR_ZENROWS_API_KEY"

    # set up ZenRowsWebReader
    reader = ZenRowsWebReader(
        api_key=api_key,
        js_render=True,
        premium_proxy=True,
        response_type="markdown",
        wait=2000,
    )

    urls = [
        "https://www.scrapingcourse.com/ecommerce",
        "https://www.scrapingcourse.com/button-click",
        "https://www.scrapingcourse.com/infinite-scrolling",
    ]

    # load each URL as a document
    documents = reader.load_data(urls)

    # create index
    index = VectorStoreIndex.from_documents(documents)

    # query the content
    query_engine = index.as_query_engine()
    response = query_engine.query("What are the key features?")
    print(response)
    ```

    LlamaIndex uses ZenRows to retrieve each website's information in Markdown format, vectorizes it, and synthesizes a response based on the query.

    Here's a sample response from the above code:

    ```markdown Markdown theme={null}
    The key features include a menu with options like Shop, Home, Cart, Checkout, and My account. Additionally, there is a list of products with images, names, prices, and options to select or add to cart for each item.
    ```
  </Step>
</Steps>
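
The query step above asks a single question. As a minimal sketch, the same query engine can answer several questions in one pass. The helper name `ask_all` is hypothetical and not part of LlamaIndex; it only assumes an object with a `query()` method, such as the `query_engine` created above:

```python
from typing import Dict, Iterable


def ask_all(query_engine, questions: Iterable[str]) -> Dict[str, str]:
    """Run each question through the query engine and collect the answers as text."""
    answers = {}
    for question in questions:
        response = query_engine.query(question)
        # str(response) is the same text that print(response) displays.
        answers[question] = str(response)
    return answers
```

For example, `ask_all(query_engine, ["What products are listed?", "Which pages load content dynamically?"])` would return a dictionary keyed by question. The questions here are illustrative only.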

Congratulations! 🎉 You've integrated ZenRows with LlamaIndex.

## API Reference

| Parameter              | Type | Description                                                                                                                                                                            |
| ---------------------- | ---- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `url`                  | str  | Required. The URL to scrape                                                                                                                                                            |
| `js_render`            | bool | Enable JavaScript rendering with a headless browser. Essential for modern web apps, SPAs, and sites with dynamic content (default: False)                                              |
| `js_instructions`      | str  | Execute custom JavaScript on the page to interact with elements, scroll, click buttons, or manipulate content                                                                          |
| `premium_proxy`        | bool | Use residential IPs to bypass anti-bot protection. Essential for accessing protected sites (default: False)                                                                            |
| `proxy_country`        | str  | Set the country of the IP used for the request. Use for accessing geo-restricted content. Two-letter country code                                                                      |
| `session_id`           | int  | Maintain the same IP for multiple requests for up to 10 minutes. Essential for multi-step processes                                                                                    |
| `custom_headers`       | dict | Include custom headers in your request to mimic browser behavior                                                                                                                       |
| `wait_for`             | str  | Wait for a specific CSS Selector to appear in the DOM before returning content                                                                                                         |
| `wait`                 | int  | Wait a fixed amount of milliseconds after page load                                                                                                                                    |
| `block_resources`      | str  | Block specific resources (images, fonts, etc.) from loading to speed up scraping                                                                                                       |
| `response_type`        | str  | Convert HTML to other formats. Options: "markdown", "plaintext", "pdf"                                                                                                                 |
| `css_extractor`        | str  | Extract specific elements using CSS selectors (JSON format)                                                                                                                            |
| `autoparse`            | bool | Automatically extract structured data from HTML (default: False)                                                                                                                       |
| `screenshot`           | str  | Capture an above-the-fold screenshot of the page (default: "false")                                                                                                                    |
| `screenshot_fullpage`  | str  | Capture a full-page screenshot (default: "false")                                                                                                                                      |
| `screenshot_selector`  | str  | Capture a screenshot of a specific element using CSS Selector                                                                                                                          |
| `screenshot_format`    | str  | Choose between "png" (default) and "jpeg" formats for screenshots                                                                                                                      |
| `screenshot_quality`   | int  | For JPEG format, set the quality from 1 to 100. Lower values reduce file size but decrease quality                                                                                     |
| `original_status`      | bool | Return the original HTTP status code from the target page (default: False)                                                                                                             |
| `allowed_status_codes` | str  | Returns the content even if the target page fails with the specified status codes. Useful for debugging or when you need content from error pages                                      |
| `json_response`        | bool | Capture network requests in JSON format, including XHR or Fetch data. Ideal for intercepting API calls made by the web page (default: False)                                           |
| `outputs`              | str  | Specify which data types to extract from the scraped HTML. Accepted values: emails, phone numbers, headings, images, audios, videos, links, menus, hashtags, metadata, tables, favicon |

<Note>For complete parameter documentation and details, see the official [ZenRows' Universal Scraper API Reference](/universal-scraper-api/api-reference).</Note>
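
The `css_extractor` parameter takes its selectors as a JSON-formatted string. A minimal sketch of building that string with Python's standard `json` module, which keeps the quoting valid. The field names (`title`, `links`) and the selectors are illustrative assumptions, and the `@href` attribute syntax should be verified against the Universal Scraper API reference:

```python
import json

# Hypothetical field names mapped to CSS selectors; adjust them to the target page.
selectors = {
    "title": "h1",
    "links": "a @href",  # assumed attribute syntax; verify in the ZenRows docs
}

# css_extractor expects a JSON-formatted string, not a Python dict.
css_extractor = json.dumps(selectors)
print(css_extractor)
```

The resulting string can then be passed as `css_extractor=css_extractor` when initializing `ZenRowsWebReader`.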

## Troubleshooting

### The returned response is incomplete

* **Solution 1**: Enable `js_render` and `premium_proxy` to bypass anti-bot measures and scrape reliably.
* **Solution 2**: Set a long enough `wait` time for dynamic content to load completely before scraping. If a specific element holding the required data loads slowly, you can also wait for it using the `wait_for` parameter.
* **Solution 3**: If the answer is only partially correct, the LLM may not be receiving the chunks that hold the relevant information. Increase the number of chunks retrieved from the documents by passing the `similarity_top_k` parameter to the query engine, as shown:
  ```python Python theme={null}
  # ...
  # query the content
  query_engine = index.as_query_engine(similarity_top_k=10)
  # ...
  ```
* **Solution 4**: If you've used the `css_extractor` parameter to target specific elements, ensure you've entered the correct selectors.
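
Solution 2 can be automated by retrying with progressively longer waits. The following is a minimal sketch, not an official pattern: `make_reader` is a hypothetical factory that receives a wait time in milliseconds and returns a reader exposing `load_data(urls)`, and the wait schedule and the `min_chars` threshold are arbitrary assumptions to tune for your pages:

```python
from typing import Callable, List, Sequence


def load_with_longer_waits(
    make_reader: Callable[[int], object],
    urls: List[str],
    waits_ms: Sequence[int] = (2000, 5000, 10000),
    min_chars: int = 200,
) -> list:
    """Reload the URLs with increasing wait times until every document looks complete."""
    documents: list = []
    for wait_ms in waits_ms:
        reader = make_reader(wait_ms)
        documents = reader.load_data(urls)
        # Treat very short text as a sign that the page had not finished rendering.
        if documents and all(len(doc.text) >= min_chars for doc in documents):
            break
    # If no attempt passes the check, the documents from the last attempt are returned.
    return documents
```

With ZenRows, the factory could be `lambda ms: ZenRowsWebReader(api_key=api_key, js_render=True, premium_proxy=True, response_type="markdown", wait=ms)`. Each retry is a new request, so keep the schedule short.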

### API key or authentication error

* **Solution**: Ensure you've supplied your LLM (e.g., OpenAI) and ZenRows API keys correctly.
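
One way to catch a missing key early is to read it from an environment variable and fail with a clear message before any request is made. This is a minimal sketch; the helper name `require_env` and the variable name `ZENROWS_API_KEY` are assumptions, while `OPENAI_API_KEY` is the variable the examples above already use:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value
```

For example, `api_key = require_env("ZENROWS_API_KEY")` replaces the hard-coded placeholder, and calling `require_env("OPENAI_API_KEY")` before building the index surfaces a missing OpenAI key immediately.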

### Module not found

* **Solution**: Install all the required modules:
  * `llama-index-readers-web`
  * `llama-index-llms-openai`
  * `llama-index-embeddings-openai`
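
To see which of these packages is missing before running the full example, a short check with the standard library can help. This is a sketch under the assumption that the three pip packages map to the import paths listed in `required`; the helper name `missing_modules` is hypothetical:

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the import names that cannot be found in the current environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package of a dotted name is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


# Import paths assumed to correspond to the three pip packages listed above.
required = [
    "llama_index.readers.web",
    "llama_index.llms.openai",
    "llama_index.embeddings.openai",
]
print(missing_modules(required))
```

An empty list means all three modules were found; any name printed points to the package that still needs to be installed.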

## Resources

* <a href="https://github.com/run-llama/llama_index/tree/main/llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/zenrows_web" target="_blank" rel="noopener noreferrer nofollow">ZenRowsWebReader on GitHub</a>

## Frequently Asked Questions (FAQ)

<Accordion title="What is the main use case of LlamaIndex-ZenRows integration?">
  The use cases of LlamaIndex-ZenRows integration are diverse. However, the primary application is to enable AI applications to access and reason over live, real-world web data, even from sites with anti-bot protections or dynamic content.
</Accordion>

<Accordion title="Does LlamaIndex-ZenRows integration support extraction via CSS selectors?">
  Yes, you can scrape data from specific elements using their CSS selectors via the `css_extractor` parameter.
</Accordion>

<Accordion title="Can I use all of ZenRows' parameters with ZenRowsWebReader?">
  Yes. The `ZenRowsWebReader` inherits all the features and capabilities of the ZenRows Universal Scraper API.
</Accordion>

<Accordion title="Which LLM integrations does LlamaIndex support?">
  LlamaIndex supports many popular LLMs, such as Groq, OpenAI, Anthropic, and more. Check LlamaIndex's official documentation for the supported LLMs.
</Accordion>

<Accordion title="Can I use ZenRows with LlamaIndex for Web Scraping?">
  Yes. LlamaIndex isn't designed to scrape websites by itself, but you can add a scraping layer by pairing it with a web scraping tool like ZenRows, which gives it anti-bot bypass capabilities.
</Accordion>
