How to Integrate LangChain with ZenRows
Extract web data with AI agents using ZenRows’ enterprise-grade scraping infrastructure. The langchain-zenrows
integration enables large language models (LLMs) to access real-time web data using ZenRows’ robust scraping infrastructure. This guide covers how to scrape data with LLMs using the langchain-zenrows
module.
What is LangChain?
LangChain is a framework that connects large language models to external data sources and applications. It provides a composable architecture that enables you to create AI workflows by chaining LLM operations, from simple prompt-response patterns to autonomous agents.
One key advantage of LangChain is that it allows for easy swapping, coupling, and decoupling of LLMs.
Key Benefits of Integrating LangChain With ZenRows
The langchain-zenrows
integration brings the following benefits:
- Integrate ZenRows with LLMs: Easily integrate scraping capabilities into your desired LLM.
- Build an agentic data pipeline: Assign different data pipeline roles to each LLM agent based on its capabilities.
- Real-time web access without getting blocked: Fetch live web content without antibot or JavaScript rendering limitations.
- Multiple output formats: Fetch website data in various formats, including HTML, Markdown, Plaintext, PDF, or Screenshots.
- Specific data point extraction: Extract specific data from web pages, such as emails, tables, phone numbers, images, and more.
- Support for custom parsing: Fetch specific information from web elements using ZenRows’ advanced CSS selector feature.
Use Cases
Here are some use cases of the langchain-zenrows
integration:
- Real-time monitoring: Develop an AI application that scrapes and monitors website content changes in real-time.
- Market research and demand forecasting: Scrape demand signals, such as reviews, social comments, engagement metrics, price trends, and more. Then, pass the data to an LLM model for forecasting.
- Finding the best deals: Spot the best deals for a specific product from several e-commerce websites using ZenRows.
- Review summarization: Summarize scraped reviews using a selected model.
- Sentiment analysis: Scrape and analyze sentiment in social comments or product reviews.
- Product research and comparison: Compare products across multiple retail websites and e-commerce platforms to identify the best options.
- Consistent data pipeline update: Keep your data pipeline up to date with fresh data by integrating
langchain-zenrows
into your pipeline operations.
Getting Started: Basic Usage
Let’s start with a simple example that uses the langchain-zenrows
package to scrape the Antibot Challenge page and return its content in Markdown format.
Install the langchain-zenrows
package using pip
:
Import the ZenRowsUniversalScraper
class from the langchain_zenrows
module, instantiate the universal scraper with your ZenRows API key, and specify ZenRows parameters with the response_type
set to markdown
:
The integration bypasses the target site’s antibot measure and returns its content as Markdown:
You’ve successfully integrated ZenRows with LangChain and bypassed an antibot challenge. Let’s build an AI research assistant with this integration.
Advanced Usage: Building an AI Research Assistant
Let’s take things a step further by building an AI-powered pricing research assistant for Etsy. Using the langchain-zenrows
integration together with OpenAI’s gpt-4o-mini
model, our assistant will automatically visit Etsy’s accessories category and extract key product details such as names, prices, and URLs.
Here’s the prompt we’ll use to guide the assistant:
Example Prompt
Step 1: Install the packages
Step 2: Add ZenRows as a scraping tool for the AI model
Import the necessary modules and define your ZenRows and OpenAI API keys. Instantiate OpenAI’s chat model and langchain-zenrows
integration with the relevant API keys. Configure the LLM agent to use ZenRows as a scraping tool:
Step 3: Prompt the AI Agent
Invoke the AI agent with the research prompt and execute the scraper. As stated in the prompt, the agent uses ZenRows’ markdown
response to scrape the target page in Markdown format. It then analyzes the result and returns the 4 cheapest products:
The agent uses ZenRows to visit and scrape the product information. Once scraped, the agent returns the items in the desired format.
Complete Code Example
Combine the snippets from the two steps, and you’ll get the following code:
The above code returns the names, prices, and URLs of the 4 cheapest products in JSON format as expected.
Example Output
Congratulations! 🎉 You’ve now integrated ZenRows as a web scraping tool for an AI agent using the langchain-zenrows
module.
API Reference
Parameter | Type | Description |
---|---|---|
zenrows_api_key | string | Your ZenRows API key. If not provided, the setup looks for the ZENROWS_API_KEY environment variable. |
url | string | Required. The URL to scrape. |
js_render | boolean | Enable JavaScript rendering with a headless browser. Essential for modern web apps, SPAs, and sites with dynamic content (default: False). |
js_instructions | string | Execute custom JavaScript on the page to interact with elements, scroll, click buttons, or manipulate content. |
premium_proxy | boolean | Use residential IPs to bypass antibot protection. Essential for accessing protected sites (default: False). |
proxy_country | string | Set the country of the IP used for the request. Use for accessing geo-restricted content. Two-letter country code. |
session_id | integer | Maintain the same IP for multiple requests for up to 10 minutes. Essential for multi-step processes. |
custom_headers | boolean | Include custom headers in your request to mimic browser behavior. |
wait_for | string | Wait for a specific CSS Selector to appear in the DOM before returning content. |
wait | integer | Wait a fixed amount of milliseconds after page load. |
block_resources | string | Block specific resources (images, fonts, etc.) from loading to speed up scraping. |
response_type | string | Convert HTML to other formats. Options: “markdown”, “plaintext”, “pdf”. |
css_extractor | string | Extract specific elements using CSS selectors (JSON format). |
autoparse | boolean | Automatically extract structured data from HTML (default: False). |
screenshot | string | Capture an above-the-fold screenshot of the page (default: “false”). |
screenshot_fullpage | string | Capture a full-page screenshot (default: “false”). |
screenshot_selector | string | Capture a screenshot of a specific element using CSS Selector. |
screenshot_format | string | Choose between “png” (default) and “jpeg” formats for screenshots. |
screenshot_quality | integer | For JPEG format, set the quality from 1 to 100. Lower values reduce file size but decrease quality. |
original_status | boolean | Return the original HTTP status code from the target page (default: False). |
allowed_status_codes | string | Returns the content even if the target page fails with the specified status codes. Useful for debugging or when you need content from error pages. |
json_response | boolean | Capture network requests in JSON format, including XHR or Fetch data. Ideal for intercepting API calls made by the web page (default: False). |
outputs | string | Specify which data types to extract from the scraped HTML. Accepted values: emails, phone numbers, headings, images, audios, videos, links, menus, hashtags, metadata, tables, favicon. |
For complete parameter documentation and details, see the official ZenRows API Reference.
Troubleshooting
Token limit exceeded
- Solution 1: If you hit the LLM token limit, it means the output size has exceeded what the model can process in a single request. You can parse specific data and then feed it to the LLM model.
- Solution 2: If the issue is related to usage-based token quotas or the model version’s capabilities, consider upgrading your plan or switching to a higher model with higher bandwidth. For instance, moving from gpt-3.5 to gpt-4o-mini increases the token limit significantly.
API key error
- Solution 1: Ensure you’ve added your ZenRows and the LLM’s API keys to your environment variables.
- Solution 2: Cross-check the API keys and ensure you’ve entered the correct keys.
Empty or incomplete data/tool response
- Solution 1: Activate JS rendering to handle dynamic content and increase the success rate.
- Solution 2: Increase the wait time using the ZenRows
wait
orwait_for
parameter. Thewait
parameter introduces a general delay to allow the entire page to load, whereaswait_for
targets a specific element, pausing execution until that element appears before scraping continues. - Solution 3: If you’ve used the
css_extractor
parameter to target specific elements, ensure you’ve entered the correct selectors.
Helpful Resources
- LangChain-ZenRows PyPI package
- LangChain-ZenRows GitHub repository
- Check our examples for more use cases
Frequently Asked Questions (FAQ)
Which LLMs does langchain-zenrows support?
Which LLMs does langchain-zenrows support?
langchain-zenrows
is compatible with all LLMs supported by LangChain. Check LangChain’s official chat models documentation for more information.
Can I use selectors with the LLM agent option?
Can I use selectors with the LLM agent option?
Yes, you can extract data from specific elements by explicitly specifying their selectors in your prompt.
Does langchain-zenrows support custom JavaScript execution?
Does langchain-zenrows support custom JavaScript execution?
Yes, you can include custom JavaScript via ZenRows’ js_instructions
parameter. Check our JavaScript instructions guide for more.
Is antibot bypass automatic with the LLM agent option?
Is antibot bypass automatic with the LLM agent option?
Yes, ZenRows’ antibot bypass features are activated automatically when using ZenRows as the agent’s tool.
Does the LLM agent integration handle JS rendering?
Does the LLM agent integration handle JS rendering?
Yes. The JS rendering parameter is activated on demand while scraping a JavaScript-rendered site. This enables you to scrape dynamic pages with ease.
How do I extract specific data with CSS selectors?
How do I extract specific data with CSS selectors?
To extract data from specific elements, use ZenRows’ css_extractor
parameter to specify the selectors of the elements containing the data you want to scrape.
Can I take screenshots with the LLM agent integration?
Can I take screenshots with the LLM agent integration?
Yes, you can prompt the LLM to take a half, full, or a specific element screenshot, and it will return your desired result using ZenRows’ screenshot parameter.
What's the difference between this and other web scraping tools in LangChain?
What's the difference between this and other web scraping tools in LangChain?
ZenRows offers enterprise-grade reliability, featuring built-in antibot bypass, premium proxies, JavaScript rendering, and more. Unlike basic scrapers, it can handle protected sites, geo-restricted content, and modern SPAs without getting blocked.