# How to Integrate LangChain with ZenRows

> Install and configure the langchain-zenrows package to give LLM agents and AI chains real-time web scraping and data extraction capabilities.

Extract web data with AI agents. The <a href="https://pypi.org/project/langchain-zenrows/" rel="nofollow">`langchain-zenrows`</a> integration gives large language models (LLMs) access to real-time web data through ZenRows' enterprise-grade scraping infrastructure. This guide shows how to scrape data with LLMs using the `langchain-zenrows` package.

## What is LangChain?

LangChain is a framework that connects large language models to external data sources and applications. It provides a composable architecture that enables you to create AI workflows by chaining LLM operations, from simple prompt-response patterns to autonomous agents.

One key advantage of LangChain is that it allows for easy swapping, coupling, and decoupling of LLMs.

### Key Benefits of Integrating LangChain With ZenRows

The `langchain-zenrows` integration brings the following benefits:

* **Integrate ZenRows with LLMs**: Easily integrate scraping capabilities into your desired LLM.
* **Build an agentic data pipeline**: Assign different data pipeline roles to each LLM agent based on its capabilities.
* **Real-time web access without getting blocked**: Fetch live web content without antibot or JavaScript rendering limitations.
* **Multiple output formats**: Fetch website data in various formats, including HTML, Markdown, plaintext, PDF, or screenshots.
* **Specific data point extraction**: Extract specific data from web pages, such as emails, tables, phone numbers, images, and more.
* **Support for custom parsing**: Fetch specific information from web elements using ZenRows' advanced CSS selector feature.

## Use Cases

Here are some use cases of the `langchain-zenrows` integration:

* **Real-time monitoring**: Develop an AI application that scrapes and monitors website content changes in real-time.
* **Market research and demand forecasting**: Scrape demand signals, such as reviews, social comments, engagement metrics, and price trends. Then, pass the data to an LLM for forecasting.
* **Finding the best deals**: Spot the best deals for a specific product from several e-commerce websites using ZenRows.
* **Review summarization**: Summarize scraped reviews using a selected model.
* **Sentiment analysis**: Scrape and analyze sentiment in social comments or product reviews.
* **Product research and comparison**: Compare products across multiple retail websites and e-commerce platforms to identify the best options.
* **Consistent data pipeline update**: Keep your data pipeline up to date with fresh data by integrating `langchain-zenrows` into your pipeline operations.

## Getting Started: Basic Usage

Let's start with a simple example that uses the `langchain-zenrows` package to scrape the <a href="https://www.scrapingcourse.com/antibot-challenge" rel="nofollow">Antibot Challenge</a> page and return its content in Markdown format.

Install the `langchain-zenrows` package using `pip`:

```bash theme={null}
pip3 install langchain-zenrows
```

Import the `ZenRowsUniversalScraper` class from the `langchain_zenrows` module, instantiate the universal scraper with your ZenRows API key, and specify ZenRows parameters with the `response_type` set to `markdown`:

```python Python theme={null}
import os

from langchain_zenrows import ZenRowsUniversalScraper

# Set your ZenRows API key
os.environ["ZENROWS_API_KEY"] = "YOUR_ZENROWS_API_KEY"

# Instantiate the universal scraper
scraper = ZenRowsUniversalScraper()

url = "https://www.scrapingcourse.com/antibot-challenge"

# Set ZenRows parameters
params = {
    "url": url,
    "js_render": "true",
    "premium_proxy": "true",
    "response_type": "markdown",
}

# Get content in markdown format
result = scraper.invoke(params)
print(result)
```

The integration bypasses the target site's antibot measures and returns its content as Markdown:

```markdown Output theme={null}
[![](https://www.scrapingcourse.com/assets/images/logo.svg) Scraping Course](http://www.scrapingcourse.com/)

# Antibot Challenge

![](https://www.scrapingcourse.com/assets/images/challenge.svg)

## You bypassed the Antibot challenge! :D
```

You've successfully integrated ZenRows with LangChain and bypassed an antibot challenge. Let's build an AI research assistant with this integration.

## Advanced Usage: Building an AI Research Assistant

Let's take things a step further by building an AI-powered pricing research assistant for Etsy. Using the `langchain-zenrows` integration together with OpenAI's `gpt-4o-mini` model, our assistant will automatically visit Etsy's accessories category and extract key product details such as names, prices, and URLs.

Here's the prompt we'll use to guide the assistant:

### Example Prompt

```text Prompt theme={null}
Go to the Accessories category page of https://www.etsy.com/. Scrape the page in markdown format and return the 4 cheapest products in JSON format.
```

### Step 1: Install the packages

```bash theme={null}
pip install langgraph langchain-openai langchain-zenrows
```

### Step 2: Add ZenRows as a scraping tool for the AI model

Import the necessary modules and define your ZenRows and OpenAI API keys. Instantiate OpenAI's chat model and `langchain-zenrows` integration with the relevant API keys. Configure the LLM agent to use ZenRows as a scraping tool:

```python Python theme={null}
# pip install langgraph langchain-openai langchain-zenrows
from langchain_zenrows import ZenRowsUniversalScraper
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
import os

os.environ["ZENROWS_API_KEY"] = "YOUR_ZENROWS_API_KEY"
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

def scraper():
    # initialize the model
    llm = ChatOpenAI(model="gpt-4o-mini")

    # initialize the universal scraper
    zenrows_tool = ZenRowsUniversalScraper()

    # create an agent that uses ZenRows as a tool
    agent = create_react_agent(llm, [zenrows_tool])
```

### Step 3: Prompt the AI Agent

Invoke the AI agent with the research prompt and run the scraper. As instructed in the prompt, the agent uses ZenRows' [`markdown`](/universal-scraper-api/features/output#markdown-response) response type to scrape the target page in Markdown format. It then analyzes the result and returns the 4 cheapest products:

```python Python theme={null}
# ...
def scraper():
    # ...

    try:
        # create a prompt
        result = agent.invoke(
            {
                "messages": "Go to the Accessories category page of https://www.etsy.com/. Scrape the page in markdown format and return the 4 cheapest products in JSON format."
            }
        )
        # extract the response
        for message in result["messages"]:
            print(f"{message.content}")

    except NameError:
        print(
            "⚠️  Agent not available."
        )
    except Exception as e:
        print(f"❌ Error running agent: {e}")

scraper()
```

The agent uses ZenRows to visit and scrape the product information. Once scraped, the agent returns the items in the desired format.

### Complete Code Example

Combine the snippets from Steps 2 and 3, and you'll get the following code:

```python Python theme={null}
# pip install langgraph langchain-openai langchain-zenrows
from langchain_zenrows import ZenRowsUniversalScraper
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
import os

os.environ["ZENROWS_API_KEY"] = "YOUR_ZENROWS_API_KEY"
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

def scraper():

    # initialize the model
    llm = ChatOpenAI(model="gpt-4o-mini")

    # initialize the universal scraper
    zenrows_tool = ZenRowsUniversalScraper()

    # create an agent that uses ZenRows as a tool
    agent = create_react_agent(llm, [zenrows_tool])

    try:
        # create a prompt
        result = agent.invoke(
            {
                "messages": "Go to the Accessories category page of https://www.etsy.com/. Scrape the page in markdown format and return the 4 cheapest products in JSON format."
            }
        )
        # extract the response
        for message in result["messages"]:
            print(f"{message.content}")

    except NameError:
        print(
            "⚠️  Agent not available."
        )
    except Exception as e:
        print(f"❌ Error running agent: {e}")


scraper()
```

The above code prints every message in the conversation. The final message contains the names, prices, and URLs of the 4 cheapest products in JSON format, as expected.

### Example Output

```json Output theme={null}
[
    {
        "title": "Lovely Cat Keychain Gift For Pet Mom",
        "price": "$4.68",
        "url": "https://www.etsy.com/listing/1812260433/lovel..."
    },
    {
        "title": "Personalized slim leather keychain, key fob, custom keychain, leather initial keychain, quick shipping anniversary gift",
        "price": "$4.79",
        "url": "https://www.etsy.com/listing/876501930/personalized..."
    },
    {
        "title": "Custom OWALA Name Tag Back to School for daughter Owala Cup accessory for son waterbottle Tumbler Name Plate for sports tumbler athlete tag",
        "price": "$4.50",
        "url": "https://www.etsy.com/listing/1796331543/custom-..."
    },
    {
        "title": "Set of Blue and White Striped Hair Bows - 3-Inch Handmade Clips for Girls & Toddlers",
        "price": "$6.00",
        "url": "https://www.etsy.com/listing/4328846122/set..."
    }
]
```
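
If you want to work with the products as Python objects rather than printed text, you can parse the agent's final message. The following is a minimal sketch, not part of the integration itself: `extract_json_array` is a hypothetical helper, and it assumes the model's reply contains exactly one JSON array, possibly surrounded by prose or a code fence.

```python
import json


def extract_json_array(text):
    """Return the JSON array embedded in text, or None if none can be parsed."""
    # Take everything from the first "[" to the last "]" (an assumption:
    # the reply holds a single array and no other square brackets).
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return None


# Stand-in for the agent's final message content
reply = 'Here are the products:\n[{"title": "A", "price": "$4.50"}]\nDone.'
print(extract_json_array(reply))
```

With the agent from the complete example, you would call it as `extract_json_array(result["messages"][-1].content)`, since the last message in the result holds the agent's final answer.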

Congratulations! 🎉 You've now integrated ZenRows as a web scraping tool for an AI agent using the `langchain-zenrows` module.

## API Reference

| Parameter              | Type      | Description                                                                                                                                                                             |
| ---------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `zenrows_api_key`      | `string`  | Your ZenRows API key. If not provided, the setup looks for the `ZENROWS_API_KEY` environment variable.                                                                                  |
| `url`                  | `string`  | **Required.** The URL to scrape.                                                                                                                                                        |
| `js_render`            | `boolean` | Enable JavaScript rendering with a headless browser. Essential for modern web apps, SPAs, and sites with dynamic content (default: False).                                              |
| `js_instructions`      | `string`  | Execute custom JavaScript on the page to interact with elements, scroll, click buttons, or manipulate content.                                                                          |
| `premium_proxy`        | `boolean` | Use residential IPs to bypass antibot protection. Essential for accessing protected sites (default: False).                                                                             |
| `proxy_country`        | `string`  | Set the country of the IP used for the request. Use for accessing geo-restricted content. Two-letter country code.                                                                      |
| `session_id`           | `integer` | Maintain the same IP for multiple requests for up to 10 minutes. Essential for multi-step processes.                                                                                    |
| `custom_headers`       | `boolean` | Include custom headers in your request to mimic browser behavior.                                                                                                                       |
| `wait_for`             | `string`  | Wait for a specific CSS Selector to appear in the DOM before returning content.                                                                                                         |
| `wait`                 | `integer` | Wait a fixed amount of milliseconds after page load.                                                                                                                                    |
| `block_resources`      | `string`  | Block specific resources (images, fonts, etc.) from loading to speed up scraping.                                                                                                       |
| `response_type`        | `string`  | Convert HTML to other formats. Options: "markdown", "plaintext", "pdf".                                                                                                                 |
| `css_extractor`        | `string`  | Extract specific elements using CSS selectors (JSON format).                                                                                                                            |
| `autoparse`            | `boolean` | Automatically extract structured data from HTML (default: False).                                                                                                                       |
| `screenshot`           | `string`  | Capture an above-the-fold screenshot of the page (default: "false").                                                                                                                    |
| `screenshot_fullpage`  | `string`  | Capture a full-page screenshot (default: "false").                                                                                                                                      |
| `screenshot_selector`  | `string`  | Capture a screenshot of a specific element using CSS Selector.                                                                                                                          |
| `screenshot_format`    | `string`  | Choose between "png" (default) and "jpeg" formats for screenshots.                                                                                                                      |
| `screenshot_quality`   | `integer` | For JPEG format, set the quality from 1 to 100. Lower values reduce file size but decrease quality.                                                                                     |
| `original_status`      | `boolean` | Return the original HTTP status code from the target page (default: False).                                                                                                             |
| `allowed_status_codes` | `string`  | Returns the content even if the target page fails with the specified status codes. Useful for debugging or when you need content from error pages.                                      |
| `json_response`        | `boolean` | Capture network requests in JSON format, including XHR or Fetch data. Ideal for intercepting API calls made by the web page (default: False).                                           |
| `outputs`              | `string`  | Specify which data types to extract from the scraped HTML. Accepted values: emails, phone numbers, headings, images, audios, videos, links, menus, hashtags, metadata, tables, favicon. |

<Note>
  For complete parameter documentation and details, see the official [ZenRows API Reference](/universal-scraper-api/api-reference#parameter-overview).
</Note>
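
Some parameters in the table take structured values. The sketch below assumes `css_extractor` and `js_instructions` are passed as JSON-encoded strings, since their listed type is `string`. `build_scrape_params`, the selectors, and the example URL are hypothetical, and the instruction keys should be confirmed against the JavaScript instructions guide.

```python
import json


def build_scrape_params(url, css_extractor=None, js_instructions=None, **extra):
    """Build a ZenRows parameter dict, JSON-encoding structured values."""
    params = {"url": url, **extra}
    if js_instructions is not None:
        # JavaScript instructions only run when JS rendering is enabled
        params["js_render"] = "true"
        params["js_instructions"] = json.dumps(js_instructions)
    if css_extractor is not None:
        params["css_extractor"] = json.dumps(css_extractor)
    return params


params = build_scrape_params(
    "https://www.example.com/",
    css_extractor={"title": "h1"},  # hypothetical selector
    js_instructions=[{"wait": 500}],  # pause 500 ms after load
    premium_proxy="true",
)
print(params)
```

You can then pass the resulting dictionary to `scraper.invoke(params)`, as in the basic usage example.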

## Troubleshooting

### Token limit exceeded

* **Solution 1**: If you hit the LLM token limit, the scraped output is larger than the model can process in a single request. Extract only the data you need (for example, with the `css_extractor` or `outputs` parameter) before passing it to the LLM.
* **Solution 2**: If the issue is related to usage-based token quotas or the model's own limits, consider upgrading your plan or switching to a model with a larger context window. For instance, moving from gpt-3.5 to gpt-4o-mini increases the token limit significantly.
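
When you call the scraper directly and pass the result to a model yourself, you can also trim the content first. This is a rough sketch under stated assumptions: `truncate_for_llm` is a hypothetical helper, and the ratio of 4 characters per token is only a common rule of thumb, not an exact tokenizer count.

```python
def truncate_for_llm(text, max_tokens=8000, chars_per_token=4):
    """Trim text to an approximate token budget, cutting at a line boundary."""
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    # Prefer to cut at the last complete line so the Markdown stays readable
    last_newline = clipped.rfind("\n")
    if last_newline > 0:
        clipped = clipped[:last_newline]
    return clipped


sample = "line one\nline two\nline three"
print(truncate_for_llm(sample, max_tokens=3))
```

For precise budgeting, replace the character heuristic with the tokenizer that matches your model.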

### API key error

* **Solution 1**: Ensure you've added both your ZenRows API key and your LLM provider's API key to your environment variables.
* **Solution 2**: Double-check that the keys you entered are correct.
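
A quick check before creating the agent can surface a missing key early. This is a minimal sketch: `missing_api_keys` is a hypothetical helper, and the key names match the environment variables used in this guide.

```python
import os

REQUIRED_KEYS = ("ZENROWS_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_api_keys()
if missing:
    print(f"Missing API keys: {', '.join(missing)}")
else:
    print("All required API keys are set.")
```

The check only confirms that the variables are present. It can't tell whether a key is valid, so an authentication error with both keys set points to an incorrect value.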

### Empty or incomplete data/tool response

* **Solution 1**: Activate JS rendering to handle dynamic content and increase the success rate.
* **Solution 2**: Increase the wait time using the ZenRows `wait` or `wait_for` parameter. The `wait` parameter introduces a general delay to allow the entire page to load, whereas `wait_for` targets a specific element, pausing execution until that element appears before scraping continues.
* **Solution 3**: If you've used the `css_extractor` parameter to target specific elements, ensure you've entered the correct selectors.
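
The first two solutions can be sketched as a small helper that adds JS rendering and a wait strategy to an existing parameter dictionary. `with_wait`, the selector, and the example URL are hypothetical. The parameter names come from the API reference above, and the string value for `js_render` follows the basic usage example.

```python
def with_wait(params, wait_ms=None, wait_for=None):
    """Return a copy of params with JS rendering and a wait strategy applied."""
    updated = dict(params)
    updated["js_render"] = "true"
    if wait_for is not None:
        # Wait until a specific element appears in the DOM
        updated["wait_for"] = wait_for
    elif wait_ms is not None:
        # Otherwise wait a fixed number of milliseconds after page load
        updated["wait"] = wait_ms
    return updated


base = {"url": "https://www.example.com/"}
print(with_wait(base, wait_ms=3000))
print(with_wait(base, wait_for=".product"))  # hypothetical selector
```

Pass the returned dictionary to `scraper.invoke(...)` and compare the result with the earlier response.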

## Helpful Resources

* <a href="https://pypi.org/project/langchain-zenrows/" rel="nofollow">LangChain-ZenRows PyPI package</a>
* <a href="https://github.com/ZenRows-Hub/langchain-zenrows" rel="nofollow">LangChain-ZenRows GitHub repository</a>
* <a href="https://github.com/ZenRows-Hub/langchain-zenrows/tree/main/examples" rel="nofollow">Check our examples for more use cases</a>

## Frequently Asked Questions (FAQ)

<Accordion title="Which LLMs does langchain-zenrows support?">
  `langchain-zenrows` is compatible with all LLMs supported by LangChain. Check <a href="https://python.langchain.com/docs/integrations/chat/" rel="nofollow">LangChain's official chat models documentation</a> for more information.
</Accordion>

<Accordion title="Can I use selectors with the LLM agent option?">
  Yes, you can extract data from specific elements by explicitly specifying their selectors in your prompt.
</Accordion>

<Accordion title="Does langchain-zenrows support custom JavaScript execution?">
  Yes, you can include custom JavaScript via ZenRows' `js_instructions` parameter. Check our [JavaScript instructions guide](/universal-scraper-api/features/js-instructions) for more.
</Accordion>

<Accordion title="Is antibot bypass automatic with the LLM agent option?">
  Yes, ZenRows' antibot bypass features are activated automatically when using ZenRows as the agent's tool.
</Accordion>

<Accordion title="Does the LLM agent integration handle JS rendering?">
  Yes. The JS rendering parameter is activated on demand while scraping a JavaScript-rendered site. This enables you to scrape dynamic pages with ease.
</Accordion>

<Accordion title="How do I extract specific data with CSS selectors?">
  To extract data from specific elements, use ZenRows' `css_extractor` parameter to specify the selectors of the elements containing the data you want to scrape.
</Accordion>

<Accordion title="Can I take screenshots with the LLM agent integration?">
  Yes, you can prompt the LLM to take an above-the-fold, full-page, or element-specific screenshot, and it will return your desired result using ZenRows' screenshot parameters.
</Accordion>

<Accordion title="What's the difference between this and other web scraping tools in LangChain?">
  ZenRows offers enterprise-grade reliability, featuring built-in antibot bypass, premium proxies, JavaScript rendering, and more. Unlike basic scrapers, it can handle protected sites, geo-restricted content, and modern SPAs without getting blocked.
</Accordion>
