How to Integrate LangChain with ZenRows

Extract web data with AI agents using ZenRows’ enterprise-grade scraping infrastructure. The langchain-zenrows integration gives large language models (LLMs) access to real-time web data, and this guide shows how to scrape that data with LLMs using the langchain-zenrows module.

What is LangChain?

LangChain is a framework that connects large language models to external data sources and applications. It provides a composable architecture that enables you to create AI workflows by chaining LLM operations, from simple prompt-response patterns to autonomous agents.

One key advantage of LangChain is that it allows for easy swapping, coupling, and decoupling of LLMs.

Key Benefits of Integrating LangChain With ZenRows

The langchain-zenrows integration brings the following benefits:

Integrate ZenRows with LLMs: Easily integrate scraping capabilities into your desired LLM.
Build an agentic data pipeline: Assign different data pipeline roles to each LLM agent based on its capabilities.
Real-time web access without getting blocked: Fetch live web content without antibot or JavaScript rendering limitations.
Multiple output formats: Fetch website data in various formats, including HTML, Markdown, Plaintext, PDF, or Screenshots.
Specific data point extraction: Extract specific data from web pages, such as emails, tables, phone numbers, images, and more.
Support for custom parsing: Fetch specific information from web elements using ZenRows’ advanced CSS selector feature.

Use Cases

Here are some use cases of the langchain-zenrows integration:

Real-time monitoring: Develop an AI application that scrapes and monitors website content changes in real-time.
Market research and demand forecasting: Scrape demand signals, such as reviews, social comments, engagement metrics, and price trends. Then, pass the data to an LLM for forecasting.
Finding the best deals: Spot the best deals for a specific product from several e-commerce websites using ZenRows.
Review summarization: Summarize scraped reviews using a selected model.
Sentiment analysis: Scrape and analyze sentiment in social comments or product reviews.
Product research and comparison: Compare products across multiple retail websites and e-commerce platforms to identify the best options.
Consistent data pipeline update: Keep your data pipeline up to date with fresh data by integrating langchain-zenrows into your pipeline operations.

Getting Started: Basic Usage

Let’s start with a simple example that uses the langchain-zenrows package to scrape the Antibot Challenge page and return its content in Markdown format.

Install the langchain-zenrows package using pip:

pip3 install langchain-zenrows

Import the ZenRowsUniversalScraper class from the langchain_zenrows module, instantiate the universal scraper with your ZenRows API key, and specify ZenRows parameters with the response_type set to markdown:

Python

import os

from langchain_zenrows import ZenRowsUniversalScraper

# Set your ZenRows API key
os.environ["ZENROWS_API_KEY"] = "YOUR_ZENROWS_API_KEY"

# Instantiate the universal scraper
scraper = ZenRowsUniversalScraper()

url = "https://www.scrapingcourse.com/antibot-challenge"

# Set ZenRows parameters
params = {
    "url": url,
    "js_render": "true",
    "premium_proxy": "true",
    "response_type": "markdown",
}

# Get content in markdown format
result = scraper.invoke(params)
print(result)

The integration bypasses the target site’s antibot measure and returns its content as Markdown:

Output

[![](https://www.scrapingcourse.com/assets/images/logo.svg) Scraping Course](http://www.scrapingcourse.com/)

# Antibot Challenge

![](https://www.scrapingcourse.com/assets/images/challenge.svg)

## You bypassed the Antibot challenge! :D

You’ve successfully integrated ZenRows with LangChain and bypassed an antibot challenge. Let’s build an AI research assistant with this integration.

Advanced Usage: Building an AI Research Assistant

Let’s take things a step further by building an AI-powered pricing research assistant for Etsy. Using the langchain-zenrows integration together with OpenAI’s gpt-4o-mini model, our assistant will automatically visit Etsy’s accessories category and extract key product details such as names, prices, and URLs.

Here’s the prompt we’ll use to guide the assistant:

Example Prompt

Prompt

Go to the Accessories category page of https://www.etsy.com/. Scrape the page in markdown format and return the 4 cheapest products in JSON format.

Step 1: Install the packages

pip install langgraph langchain-openai langchain-zenrows

Step 2: Add ZenRows as a scraping tool for the AI model

Import the necessary modules and define your ZenRows and OpenAI API keys. Instantiate OpenAI’s chat model and langchain-zenrows integration with the relevant API keys. Configure the LLM agent to use ZenRows as a scraping tool:

Python

# pip install langgraph langchain-openai langchain-zenrows
from langchain_zenrows import ZenRowsUniversalScraper
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
import os

os.environ["ZENROWS_API_KEY"] = "YOUR_ZENROWS_API_KEY"
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

def scraper():
    # initialize the model
    llm = ChatOpenAI(model="gpt-4o-mini")

    # initialize the universal scraper
    zenrows_tool = ZenRowsUniversalScraper()

    # create an agent that uses ZenRows as a tool
    agent = create_react_agent(llm, [zenrows_tool])

Step 3: Prompt the AI Agent

Invoke the AI agent with the research prompt and execute the scraper. As stated in the prompt, the agent uses ZenRows’ markdown response to scrape the target page in Markdown format. It then analyzes the result and returns the 4 cheapest products:

Python

# ...
def scraper():
    # ...

    try:
        # create a prompt
        result = agent.invoke(
            {
                "messages": "Go to the Accessories category page of https://www.etsy.com/. Scrape the page in markdown format and return the 4 cheapest products in JSON format."
            }
        )
        # extract the response
        for message in result["messages"]:
            print(f"{message.content}")

    except NameError:
        print(
            "⚠️  Agent not available."
        )
    except Exception as e:
        print(f"❌ Error running agent: {e}")

scraper()

The agent uses ZenRows to visit and scrape the product information. Once scraped, the agent returns the items in the desired format.
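The loop in the snippet above prints every message in the conversation, which includes the raw scrape returned by the tool. If you only want the model's final reply, a minimal sketch could look like the following. It assumes the last entry in result["messages"] is the final answer, and it uses stand-in objects with a content attribute instead of real LangChain messages so it runs without API keys. The final_answer helper is hypothetical and not part of langchain-zenrows.

```python
from types import SimpleNamespace


def final_answer(result: dict) -> str:
    """Return the content of the last message in an agent result, or "" if none."""
    messages = result.get("messages", [])
    if not messages:
        return ""
    # Assumption: the agent appends its final reply as the last message
    return str(messages[-1].content)


# Stand-in for an agent result: prompt, tool output, then the model's reply
fake_result = {
    "messages": [
        SimpleNamespace(content="Go to the Accessories category page ..."),
        SimpleNamespace(content="# raw scraped markdown ..."),
        SimpleNamespace(content='[{"title": "Keychain", "price": "$4.68"}]'),
    ]
}

print(final_answer(fake_result))
```

In the scraper() function, this would replace the for loop with a single print(final_answer(result)) call.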

Complete Code Example

Combine the snippets from the two steps, and you’ll get the following code:

Python

# pip install langgraph langchain-openai langchain-zenrows
from langchain_zenrows import ZenRowsUniversalScraper
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
import os

os.environ["ZENROWS_API_KEY"] = "YOUR_ZENROWS_API_KEY"
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

def scraper():

    # initialize the model
    llm = ChatOpenAI(model="gpt-4o-mini")

    # initialize the universal scraper
    zenrows_tool = ZenRowsUniversalScraper()

    # create an agent that uses ZenRows as a tool
    agent = create_react_agent(llm, [zenrows_tool])

    try:
        # create a prompt
        result = agent.invoke(
            {
                "messages": "Go to the Accessories category page of https://www.etsy.com/. Scrape the page in markdown format and return the 4 cheapest products in JSON format."
            }
        )
        # extract the response
        for message in result["messages"]:
            print(f"{message.content}")

    except NameError:
        print(
            "⚠️  Agent not available."
        )
    except Exception as e:
        print(f"❌ Error running agent: {e}")


scraper()

The above code returns the names, prices, and URLs of the 4 cheapest products in JSON format as expected.

Example Output

Output

[
    {
        "title": "Lovely Cat Keychain Gift For Pet Mom",
        "price": "$4.68",
        "url": "https://www.etsy.com/listing/1812260433/lovel...",
    },
    {
        "title": "Personalized slim leather keychain, key fob, custom keychain, leather initial keychain, quick shipping anniversary gift",
        "price": "$4.79",
        "url": "https://www.etsy.com/listing/876501930/personalized...",
    },
    {
        "title": "Custom OWALA Name Tag Back to School for daughter Owala Cup accessory for son waterbottle Tumbler Name Plate for sports tumbler athlete tag",
        "price": "$4.50",
        "url": "https://www.etsy.com/listing/1796331543/custom-...",
    },
    {
        "title": "Set of Blue and White Striped Hair Bows - 3-Inch Handmade Clips for Girls & Toddlers",
        "price": "$6.00",
        "url": "https://www.etsy.com/listing/4328846122/set...",
    },
]

Congratulations! 🎉 You’ve now integrated ZenRows as a web scraping tool for an AI agent using the langchain-zenrows module.
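The agent returns its answer as text, so you may want to turn it into Python objects before using it elsewhere. The sketch below is one possible approach, not part of the integration: it assumes the reply contains a single bracketed array, tolerates trailing commas like those in the example output, and sorts the items by price. The extract_products and price_value helpers and the sample reply are hypothetical.

```python
import json
import re


def extract_products(message_text: str) -> list:
    """Pull the JSON array out of an agent reply and parse it."""
    # Assumption: the reply holds one array, possibly surrounded by prose
    start = message_text.find("[")
    end = message_text.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("no JSON array found in the agent reply")
    raw = message_text[start : end + 1]
    # Remove trailing commas before } or ], which the json module rejects
    raw = re.sub(r",\s*([}\]])", r"\1", raw)
    return json.loads(raw)


def price_value(product: dict) -> float:
    """Convert a price string such as "$4.68" into a float for sorting."""
    return float(re.sub(r"[^\d.]", "", product["price"]))


# Hypothetical reply, shaped like the example output above
reply = """Here are the products:
[
    {"title": "Keychain", "price": "$4.68", "url": "https://example.com/a",},
    {"title": "Name Tag", "price": "$4.50", "url": "https://example.com/b",},
]"""

products = sorted(extract_products(reply), key=price_value)
print([p["title"] for p in products])
```

The trailing-comma cleanup is a plain regular expression, so a title that itself contains a comma followed by a closing bracket would be altered. For stricter handling, ask the model for valid JSON in the prompt.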

API Reference

Parameter	Type	Description
`zenrows_api_key`	`string`	Your ZenRows API key. If not provided, the setup looks for the `ZENROWS_API_KEY` environment variable.
`url`	`string`	Required. The URL to scrape.
`js_render`	`boolean`	Enable JavaScript rendering with a headless browser. Essential for modern web apps, SPAs, and sites with dynamic content (default: False).
`js_instructions`	`string`	Execute custom JavaScript on the page to interact with elements, scroll, click buttons, or manipulate content.
`premium_proxy`	`boolean`	Use residential IPs to bypass antibot protection. Essential for accessing protected sites (default: False).
`proxy_country`	`string`	Set the country of the IP used for the request. Use for accessing geo-restricted content. Two-letter country code.
`session_id`	`integer`	Maintain the same IP for multiple requests for up to 10 minutes. Essential for multi-step processes.
`custom_headers`	`boolean`	Include custom headers in your request to mimic browser behavior.
`wait_for`	`string`	Wait for a specific CSS Selector to appear in the DOM before returning content.
`wait`	`integer`	Wait a fixed amount of milliseconds after page load.
`block_resources`	`string`	Block specific resources (images, fonts, etc.) from loading to speed up scraping.
`response_type`	`string`	Convert HTML to other formats. Options: “markdown”, “plaintext”, “pdf”.
`css_extractor`	`string`	Extract specific elements using CSS selectors (JSON format).
`autoparse`	`boolean`	Automatically extract structured data from HTML (default: False).
`screenshot`	`string`	Capture an above-the-fold screenshot of the page (default: “false”).
`screenshot_fullpage`	`string`	Capture a full-page screenshot (default: “false”).
`screenshot_selector`	`string`	Capture a screenshot of a specific element using CSS Selector.
`screenshot_format`	`string`	Choose between “png” (default) and “jpeg” formats for screenshots.
`screenshot_quality`	`integer`	For JPEG format, set the quality from 1 to 100. Lower values reduce file size but decrease quality.
`original_status`	`boolean`	Return the original HTTP status code from the target page (default: False).
`allowed_status_codes`	`string`	Returns the content even if the target page fails with the specified status codes. Useful for debugging or when you need content from error pages.
`json_response`	`boolean`	Capture network requests in JSON format, including XHR or Fetch data. Ideal for intercepting API calls made by the web page (default: False).
`outputs`	`string`	Specify which data types to extract from the scraped HTML. Accepted values: emails, phone numbers, headings, images, audios, videos, links, menus, hashtags, metadata, tables, favicon.

For complete parameter documentation and details, see the official ZenRows API Reference.
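As an illustration of how several of these parameters combine, here is a minimal sketch that builds a params dictionary for css_extractor, which the table describes as CSS selectors in JSON format. The build_extractor_params helper and the h1/h2 selectors are assumptions chosen to match the Antibot Challenge output shown earlier. The sketch only constructs the dictionary; you would pass it to scraper.invoke(params) as in the basic usage example.

```python
import json


def build_extractor_params(url, selectors, wait_for=None):
    """Build ZenRows parameters that extract named elements via CSS selectors."""
    params = {
        "url": url,
        # wait_for waits on the DOM, so JS rendering is enabled here (assumption)
        "js_render": "true",
        # css_extractor expects the selector mapping serialized as JSON
        "css_extractor": json.dumps(selectors),
    }
    if wait_for:
        params["wait_for"] = wait_for
    return params


params = build_extractor_params(
    "https://www.scrapingcourse.com/antibot-challenge",
    {"title": "h1", "message": "h2"},  # hypothetical selectors
    wait_for="h1",
)
print(params)
```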

Troubleshooting

Token limit exceeded

Solution 1: If you hit the LLM token limit, the scraped output is larger than the model can process in a single request. Extract only the data you need first (for example, with the css_extractor or outputs parameter), then feed that smaller result to the LLM.
Solution 2: If the issue is related to usage-based token quotas or the model version’s capabilities, consider upgrading your plan or switching to a model with a larger context window. For instance, moving from gpt-3.5 to gpt-4o-mini increases the token limit significantly.
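Another way to reduce the input, sketched below, is to trim the scraped Markdown before it reaches the model. This is an illustrative helper, not a ZenRows or LangChain feature: trim_to_budget is a hypothetical name, and it uses a character count as a rough stand-in for tokens, since the real token count depends on the model's tokenizer.

```python
def trim_to_budget(markdown: str, max_chars: int = 12000) -> str:
    """Keep whole lines from the top of the page until the character budget is used."""
    kept = []
    used = 0
    for line in markdown.splitlines():
        cost = len(line) + 1  # +1 accounts for the newline
        if used + cost > max_chars:
            break
        kept.append(line)
        used += cost
    return "\n".join(kept)


# Hypothetical oversized page
page = "# Title\n" + "lorem ipsum\n" * 5000
short = trim_to_budget(page, max_chars=200)
print(len(short))
```

Cutting from the top keeps headings and the first listings, which suits category pages. For pages where the relevant data sits lower down, targeted extraction is the better fit.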

API key error

Solution 1: Ensure you’ve added both your ZenRows API key and your LLM provider’s API key to your environment variables.
Solution 2: Cross-check the API keys and ensure you’ve entered the correct keys.
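A quick check before creating the agent can catch both problems early. The sketch below assumes the two environment variable names used in this guide and treats the YOUR_... placeholders from the examples as unset. The missing_keys helper is hypothetical.

```python
import os

REQUIRED_KEYS = ("ZENROWS_API_KEY", "OPENAI_API_KEY")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the names of keys that are absent, empty, or still placeholders."""
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        if not value or value.startswith("YOUR_"):
            missing.append(name)
    return missing


# Report which keys still need to be configured in the current environment
print(missing_keys(os.environ))
```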

Empty or incomplete data/tool response

Solution 1: Activate JS rendering to handle dynamic content and increase the success rate.
Solution 2: Increase the wait time using the ZenRows wait or wait_for parameter. The wait parameter introduces a general delay to allow the entire page to load, whereas wait_for targets a specific element, pausing execution until that element appears before scraping continues.
Solution 3: If you’ve used the css_extractor parameter to target specific elements, ensure you’ve entered the correct selectors.
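The first two solutions can be combined into a simple escalation: try a plain request, then add JS rendering, then add a wait. The sketch below illustrates the idea with a hypothetical scrape_with_fallbacks helper. It takes any fetch callable that accepts a params dictionary, so the demo uses a fake one; in real use you could pass scraper.invoke from the basic usage example. The 5000 ms wait is an arbitrary example value.

```python
def scrape_with_fallbacks(fetch, url, wait_for=""):
    """Call fetch with progressively stronger settings until content comes back."""
    attempts = [
        {"url": url},
        {"url": url, "js_render": "true"},
        {"url": url, "js_render": "true", "premium_proxy": "true", "wait": 5000},
    ]
    if wait_for:
        # Last resort: wait for a specific element instead of a fixed delay
        attempts.append(
            {"url": url, "js_render": "true", "premium_proxy": "true", "wait_for": wait_for}
        )
    for params in attempts:
        content = fetch(params)
        if content and content.strip():
            return content
    return ""


calls = []


def fake_fetch(params):
    """Stand-in for a scraper call: returns content only when JS rendering is on."""
    calls.append(params)
    return "<h1>ok</h1>" if params.get("js_render") == "true" else ""


html = scrape_with_fallbacks(fake_fetch, "https://www.scrapingcourse.com/antibot-challenge")
print(html)
```

Starting with the plain request means the stronger settings are only used when the simpler ones return nothing.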

Helpful Resources

Frequently Asked Questions (FAQ)

Which LLMs does langchain-zenrows support?

Can I use selectors with the LLM agent option?

Does langchain-zenrows support custom JavaScript execution?

Is antibot bypass automatic with the LLM agent option?

Does the LLM agent integration handle JS rendering?

How do I extract specific data with CSS selectors?

Can I take screenshots with the LLM agent integration?

What's the difference between this and other web scraping tools in LangChain?
