Combine OpenAI’s GPT models with ZenRows’ Universal Scraper API to build AI workflows that can read any website in real time. This guide shows how to use ZenRows with OpenAI’s Responses API, function calling, structured outputs, and the hosted MCP tool, so your AI applications can ground their answers in live web content, even from anti-bot-protected and JavaScript-heavy sites. This guide covers the standard OpenAI API; if you are building autonomous, multi-step agents with the OpenAI Agents SDK, see the OpenAI Agents SDK integration guide.
Just want to plug GPT into the live web? Use ZenRows MCP with OpenAI. The fastest path to a web-aware GPT is the ZenRows hosted MCP server. The Responses API natively supports remote MCP servers, so you can register https://mcp.zenrows.com/mcp as a tool and let GPT invoke ZenRows’ scraping capabilities directly. No function-calling boilerplate, no orchestration code. Jump to Using ZenRows MCP with the Responses API for a complete code example.

What is OpenAI?
OpenAI is the company behind GPT, the family of large language models powering ChatGPT and a developer platform used by hundreds of thousands of teams. The OpenAI API provides programmatic access to these models through the Responses API, the recommended interface for all new projects, with built-in support for function calling, structured outputs, reasoning models, and remote MCP servers. GPT models excel at reasoning, summarization, and structured generation, but they get blocked when they try to access the web on their own, especially at scale. Pairing OpenAI with ZenRows closes that gap.

Key benefits of integrating OpenAI with ZenRows
- Real-time web grounding for any GPT model: GPT has a fixed knowledge cutoff. ZenRows feeds live, up-to-date web content into any response, eliminating stale or hallucinated answers about recent events, prices, or product details.
- Anti-bot bypass out of the box: Adaptive Stealth Mode (`mode=auto`) automatically handles JavaScript rendering, premium proxies, fingerprinting, and bot detection, so your AI can read pages that would block a regular scraper.
- Token-efficient Markdown output: ZenRows returns clean Markdown in addition to raw HTML, which reduces token usage and improves model accuracy on the same context window.
- Native function calling: Expose ZenRows as a function tool and let GPT decide when to scrape based on the user’s question, with no orchestration code required.
- Works with every OpenAI API surface: Use ZenRows with the Responses API, structured outputs (`responses.parse`), reasoning models, and the hosted MCP tool. The integration is a plain HTTP call, so it fits into any pattern.
- MCP-ready out of the box: Plug the ZenRows hosted MCP server directly into the Responses API as a tool for zero-code access to all of ZenRows’ scraping capabilities.
Use cases
The OpenAI and ZenRows combination unlocks a wide range of AI workflows:

- Web-aware chatbots: Build assistants that can answer questions about any URL the user provides, including protected sites like e-commerce stores, real estate portals, and news outlets.
- Real-time competitive intelligence: Have GPT analyze competitor pricing pages, product launches, and changelogs as they happen.
- Lead enrichment: Scrape company websites and let GPT extract industry, headcount signals, tech stack hints, and product summaries into your CRM.
- Automated research and reporting: Pull and summarize industry articles, financial filings, or technical documentation into ready-to-share briefs.
- Structured data extraction: Pull strongly-typed JSON (products, jobs, listings, reviews) from any page using GPT’s structured outputs grounded in ZenRows-scraped content.
Getting started: Basic Usage
Let’s start with a simple example: scrape a JavaScript-heavy, anti-bot-protected demo page using ZenRows and summarize it with `gpt-5-mini` through the Responses API.
Two things are happening here:
- ZenRows handles the scrape. `mode=auto` tells the API to start with the cheapest viable configuration and automatically escalate to JavaScript rendering or premium proxies if the target site requires it. `response_type=markdown` returns clean Markdown instead of raw HTML, which is ideal for LLM context.
- OpenAI handles the reasoning. The Markdown is fed into `client.responses.create()`, with system-level guidance passed through the `instructions` parameter and the user prompt passed through `input`.
Advanced Usage: Building a web-aware AI assistant with function calling
Function calling is OpenAI’s mechanism for letting a model decide when to call an external tool. Instead of always scraping before each prompt, you expose ZenRows as a function and let GPT call it only when needed. This pattern is the foundation of any production-grade AI application that needs web access.

Set up the environment
Run the following command in your terminal to install the OpenAI Python library and requests, then create a `.env` file containing `ZENROWS_API_KEY` and `OPENAI_API_KEY`:
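A sketch of that setup (the key values are placeholders; the examples below read the keys from the environment, so export them or load the `.env` file with your preferred tool):

```shell
# Install the client libraries
pip install openai requests

# Store both keys in a .env file at the project root
cat > .env <<'EOF'
ZENROWS_API_KEY=your_zenrows_api_key
OPENAI_API_KEY=your_openai_api_key
EOF
```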
Define the scraping tool
Wrap the ZenRows REST call in a Python function and describe it to the Responses API using a JSON schema. The Responses API uses a flat tool definition (`name`, `description`, and `parameters` at the top level of the tool object).

Let GPT decide when to scrape
Send the user’s question along with the tool definition. GPT returns either a direct answer or a `function_call` item in the response output:
When using reasoning models like `gpt-5`, the response output may include reasoning items alongside function calls. The line `input_list += response.output` captures all output items, including reasoning, which must be passed back with tool call outputs for the model to continue correctly. Asked, for example, about the top stories on Hacker News, GPT recognizes it needs live data from news.ycombinator.com, calls `scrape_website` with the right URL, then synthesizes a grounded answer from the scraped Markdown.
Structured data extraction
OpenAI’s structured outputs let you guarantee a model returns JSON that matches a specific schema. Combined with ZenRows, this is the cleanest way to extract typed data from any web page. The Responses API exposes structured outputs through `client.responses.parse()`, which takes a Pydantic model directly and returns a parsed Python object.
The example below scrapes a company homepage and extracts a strongly-typed `CompanyInfo` object:
The flow is the same as the earlier examples: scrape with `mode=auto`, pass the Markdown to `responses.parse()`, and let GPT return validated JSON.
Skip the schema with ZenRows Autoparse

For common site types, you can skip LLM-based parsing entirely. Adding `autoparse=true` to your ZenRows request automatically identifies and extracts product details, article content, job listings, property data, and similar information into clean JSON, with no Pydantic schema or model call required. ZenRows Autoparse is included at no additional cost. Return the JSON directly to your application for known-structure pages, or pass it to GPT for downstream enrichment, normalization, or categorization. Pre-structured input also uses far fewer tokens than raw Markdown, which keeps your context window lean and your model calls cheaper. Note that Autoparse cannot be combined with `response_type=markdown` in the same request.

Using ZenRows MCP with the Responses API
The Responses API natively supports remote Model Context Protocol (MCP) servers as tools. ZenRows publishes a hosted MCP server that exposes web scraping capabilities, so you can give GPT real-time web access without writing any function-calling boilerplate.

- `server_url` points at ZenRows’ hosted MCP endpoint.
- `authorization` carries your ZenRows API key. OpenAI does not retain the value between requests, so it must be present on every API call.
- `require_approval="never"` skips the per-tool-call approval step, which is appropriate for trusted servers like ZenRows. To require explicit approval, remove this field or set it to `"always"`.
API reference
The most useful Universal Scraper API parameters when working with OpenAI:

| Parameter | Type | Description |
|---|---|---|
| `apikey` | string | Required. Your ZenRows API key, passed as a query parameter. |
| `url` | string | Required. The URL to scrape. |
| `mode` | string | Set to `auto` to enable Adaptive Stealth Mode, which automatically picks the cheapest working configuration for each request. Recommended default for AI workflows. |
| `response_type` | string | Convert HTML to other formats. Use `markdown` for LLM context (recommended), or `plaintext`, `pdf`. |
| `js_render` | boolean | Enable JavaScript rendering with a headless browser. Set automatically by `mode=auto` when needed. |
| `js_instructions` | string | Execute custom JavaScript on the page (click, scroll, fill forms) before returning content. |
| `premium_proxy` | boolean | Use residential IPs to bypass anti-bot protection. Set automatically by `mode=auto` when needed. |
| `proxy_country` | string | Two-letter country code for geo-restricted content (e.g. `us`, `gb`, `de`). Requires premium proxies, which `mode=auto` enables automatically when needed. |
| `session_id` | integer | Maintain the same IP across multiple requests for up to 10 minutes. |
| `wait_for` | string | Wait for a specific CSS selector to appear in the DOM before returning content. |
| `wait` | integer | Wait a fixed number of milliseconds after page load. |
| `css_extractor` | string | Extract specific elements using CSS selectors (JSON format). |
| `outputs` | string | Extract specific data types as structured JSON using a comma-separated list of filters: emails, phone_numbers, headings, images, audios, videos, links, menus, hashtags, metadata, tables, favicon. Use `outputs=*` to retrieve all available data types. See the Output Filters documentation for details. |
| `screenshot` / `screenshot_fullpage` | boolean | Capture a screenshot of the page. Useful for multimodal GPT inputs. |
Troubleshooting
Token limit exceeded
- Option 1: Use `response_type=markdown` (already shown in every example above). Markdown reduces token usage significantly compared to raw HTML.
- Option 2: Use the `css_extractor` or `outputs` parameter to scrape only the parts of the page you need (a product card, a pricing table, an article body) instead of the entire DOM.
- Option 3: Chunk the scraped content into smaller pieces, summarize each chunk separately, and combine the results, or switch to a model with a larger context window.
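As an illustration of Option 2, a request that extracts only selected elements with `css_extractor` (the URL and selectors are hypothetical):

```python
import json

# css_extractor is a JSON object mapping output field names to CSS
# selectors; ZenRows returns only the matching elements as JSON.
params = {
    "apikey": "your_zenrows_api_key",
    "url": "https://example.com/blog/post",
    "css_extractor": json.dumps({"title": "h1", "body": "article p"}),
}
# requests.get("https://api.zenrows.com/v1/", params=params) then returns
# just those fields, keeping the LLM context small.
```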
API key errors
- Option 1: Confirm both `ZENROWS_API_KEY` and `OPENAI_API_KEY` are set in your environment.
- Option 2: Verify your ZenRows API key in the dashboard and your OpenAI key in the OpenAI platform.
- Option 3: Check that your ZenRows subscription is active and has remaining quota on the Analytics page.
Empty or incomplete tool responses
- Option 1: Confirm `mode=auto` is set. Without it, JavaScript rendering and premium proxies are off by default, and protected sites return blocked or empty pages.
- Option 2: For sites that load content asynchronously, add `wait_for=<css-selector>` (waits for a specific element) or `wait=5000` (waits a fixed duration in milliseconds).
- Option 3: If the model is calling the tool with a malformed URL, tighten the function description in the JSON schema or add validation in your `scrape_website` wrapper before sending the request to ZenRows.
Reasoning items missing after tool calls
When using reasoning models like `gpt-5`, the response output may include reasoning items alongside function calls. Always append the entire `response.output` back to your input list (not just function calls) before sending the tool result. The pattern `input_list += response.output` shown in the function calling example handles this automatically.
Helpful resources
- ZenRows Universal Scraper API reference
- ZenRows MCP server documentation
- Adaptive Stealth Mode documentation
- ZenRows + OpenAI Agents SDK integration
- OpenAI Responses API guide
- OpenAI function calling guide
- OpenAI structured outputs guide
Frequently asked questions
Which OpenAI models work with this integration?
All current models. Function calling and structured outputs work best with `gpt-4o`, `gpt-4o-mini`, and other recent models. The hosted MCP tool requires a model that supports remote tools on the Responses API.
How do I handle pages that exceed the model's token limit?
Three options, in order of preference: use `response_type=markdown` to keep token usage low; use `css_extractor` or `outputs` to scrape only the relevant section of the page; or chunk the scraped content and summarize each chunk before combining (or switch to a model with a larger context window).
Does ZenRows handle JavaScript-heavy and anti-bot-protected sites automatically?
Yes. Setting `mode=auto` enables Adaptive Stealth Mode, which automatically activates JavaScript rendering, premium residential proxies, and stealth fingerprinting only when the target site requires them. You only pay for what succeeds.
Can I use this with the OpenAI Agents SDK?
Yes. The Agents SDK is built on top of the Responses API, so the same patterns apply. For agent-specific workflows including multi-step research, handoffs, and the Agents SDK’s built-in MCP support, see the OpenAI Agents SDK integration guide.
Can I let GPT browse multiple pages autonomously?
Yes. Function calling supports multi-turn tool use, so GPT can call `scrape_website` repeatedly with different URLs, read what it gets back, and decide what to scrape next. For more complex agent behavior, pair this with the hosted MCP tool, which exposes a richer scraping surface without requiring you to maintain the function-calling loop yourself. For autonomous, multi-step research workflows, see the OpenAI Agents SDK integration guide.
How does this compare to OpenAI's built-in web search tool?
OpenAI’s built-in web search is a black box: you don’t choose the source, control the depth, or get raw page content. ZenRows gives you full control over which URL is fetched, how the page is rendered, what format the response comes back in, and which countries the request is routed through. It also unlocks pages that public search engines don’t surface or that block scrapers, such as logged-in views, region-restricted content, and deep e-commerce listings.
Is there a rate limit I should be aware of?
ZenRows has plan-based concurrency limits documented on the Concurrency page. When using function calling or agents that may issue many parallel scrapes, monitor your usage on the Analytics page and consider adding retry logic with exponential backoff in your `scrape_website` wrapper.