
Combine OpenAI’s GPT models with ZenRows’ Universal Scraper API to build AI workflows that can read any website in real time. This guide shows how to use ZenRows with OpenAI’s Responses API, function calling, structured outputs, and the hosted MCP tool, so your AI applications can ground their answers in live web content, even from anti-bot-protected and JavaScript-heavy sites. It covers the standard OpenAI API; if you are building autonomous, multi-step agents with the OpenAI Agents SDK, see the OpenAI Agents SDK integration guide.
Just want to plug GPT into the live web? Use ZenRows MCP with OpenAI. The fastest path to a web-aware GPT is the ZenRows hosted MCP server. The Responses API natively supports remote MCP servers, so you can register https://mcp.zenrows.com/mcp as a tool and let GPT invoke ZenRows’ scraping capabilities directly. No function-calling boilerplate, no orchestration code. Jump to Using ZenRows MCP with the Responses API for a complete code example.

What is OpenAI?

OpenAI is the company behind GPT, the family of large language models that powers ChatGPT, and a developer platform used by hundreds of thousands of teams. The OpenAI API provides programmatic access to these models through the Responses API, the recommended interface for all new projects, with built-in support for function calling, structured outputs, reasoning models, and remote MCP servers. GPT models excel at reasoning, summarization, and structured generation, but they cannot fetch live web pages on their own, and naive scraping from your own infrastructure gets blocked by anti-bot systems, especially at scale. Pairing OpenAI with ZenRows closes that gap.

Key benefits of integrating OpenAI with ZenRows

  • Real-time web grounding for any GPT model: GPT has a fixed knowledge cutoff. ZenRows feeds live, up-to-date web content into any response, eliminating stale or hallucinated answers about recent events, prices, or product details.
  • Anti-bot bypass out of the box: Adaptive Stealth Mode (mode=auto) automatically handles JavaScript rendering, premium proxies, fingerprinting, and bot detection, so your AI can read pages that would block a regular scraper.
  • Token-efficient Markdown output: ZenRows returns clean Markdown in addition to raw HTML, which reduces token usage and improves model accuracy on the same context window.
  • Native function calling: Expose ZenRows as a function tool and let GPT decide when to scrape based on the user’s question, with no orchestration code required.
  • Works with every OpenAI API surface: Use ZenRows with the Responses API, structured outputs (responses.parse), reasoning models, and the hosted MCP tool. The integration is a plain HTTP call, so it fits into any pattern.
  • MCP-ready out of the box: Plug the ZenRows hosted MCP server directly into the Responses API as a tool for zero-code access to all of ZenRows’ scraping capabilities.

Use cases

The OpenAI and ZenRows combination unlocks a wide range of AI workflows:
  • Web-aware chatbots: Build assistants that can answer questions about any URL the user provides, including protected sites like e-commerce stores, real estate portals, and news outlets.
  • Real-time competitive intelligence: Have GPT analyze competitor pricing pages, product launches, and changelogs as they happen.
  • Lead enrichment: Scrape company websites and let GPT extract industry, headcount signals, tech stack hints, and product summaries into your CRM.
  • Automated research and reporting: Pull and summarize industry articles, financial filings, or technical documentation into ready-to-share briefs.
  • Structured data extraction: Pull strongly-typed JSON (products, jobs, listings, reviews) from any page using GPT’s structured outputs grounded in ZenRows-scraped content.

Getting started: Basic Usage

Let’s start with a simple example: scrape a JavaScript-heavy, anti-bot-protected demo page using ZenRows and summarize it with gpt-5-mini through the Responses API.
1. Install the OpenAI Python library and requests

pip install openai requests
2. Create a .env file and set your API keys as environment variables

ZENROWS_API_KEY=your_zenrows_key
OPENAI_API_KEY=your_openai_key
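The scripts in this guide read the keys with os.environ, which assumes the variables are exported in your shell. If you keep them in the .env file instead, one option (our suggestion, not something ZenRows or OpenAI requires) is to load it with python-dotenv:

Python
# pip install python-dotenv
from dotenv import load_dotenv

load_dotenv()  # copies the values from .env into os.environ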
3. Run the following script

Python
import os
import requests
from openai import OpenAI

ZENROWS_API_KEY = os.environ["ZENROWS_API_KEY"]
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def scrape(url: str) -> str:
    """Scrape any URL with ZenRows and return clean Markdown."""
    response = requests.get(
        "https://api.zenrows.com/v1/",
        params={
            "apikey": ZENROWS_API_KEY,
            "url": url,
            "mode": "auto",
            "response_type": "markdown",
        },
    )
    response.raise_for_status()
    return response.text

# Scrape a JS-rendered, anti-bot-protected page
markdown = scrape("https://www.scrapingcourse.com/antibot-challenge")

# Summarize with GPT through the Responses API
response = client.responses.create(
    model="gpt-5-mini",
    instructions="You are a concise technical writer.",
    input=f"Summarize this page in 2 sentences:\n\n{markdown}",
)

print(response.output_text)

Two things are happening here:
  1. ZenRows handles the scrape. mode=auto tells the API to start with the cheapest viable configuration and automatically escalate to JavaScript rendering or premium proxies if the target site requires it. response_type=markdown returns clean Markdown instead of raw HTML, which is ideal for LLM context.
  2. OpenAI handles the reasoning. The Markdown is fed into client.responses.create(), with system-level guidance passed through the instructions parameter and the user prompt passed through input.
You’ve scraped a protected page and grounded a GPT response in live web content. Let’s look at more advanced patterns.

Advanced Usage: Building a web-aware AI assistant with function calling

Function calling is OpenAI’s mechanism for letting a model decide when to call an external tool. Instead of always scraping before each prompt, you expose ZenRows as a function and let GPT call it only when needed. This pattern is the foundation of any production-grade AI application that needs web access.
1. Set up the environment

Run the following command in your terminal to install the OpenAI Python library and requests:
pip install openai requests
Create a .env file:
ZENROWS_API_KEY=your_zenrows_key
OPENAI_API_KEY=your_openai_key
2. Define the scraping tool

Wrap the ZenRows REST call in a Python function and describe it to the Responses API using a JSON schema:
Python
import json
import os
import requests
from openai import OpenAI

ZENROWS_API_KEY = os.environ["ZENROWS_API_KEY"]
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def scrape_website(url: str) -> str:
    """Scrape any website with ZenRows and return Markdown content."""
    response = requests.get(
        "https://api.zenrows.com/v1/",
        params={
            "apikey": ZENROWS_API_KEY,
            "url": url,
            "mode": "auto",
            "response_type": "markdown",
        },
    )
    response.raise_for_status()
    return response.text

# Tell GPT what scrape_website does and how to call it
tools = [
    {
        "type": "function",
        "name": "scrape_website",
        "description": (
            "Scrape any public website and return its content as Markdown. "
            "Use this whenever you need up-to-date information from a specific URL."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "The fully qualified URL of the website to scrape",
                }
            },
            "required": ["url"],
            "additionalProperties": False,
        },
        "strict": True,
    }
]
The Responses API uses a flat tool definition (name, description, and parameters at the top level of the tool object).
3. Let GPT decide when to scrape

Send the user’s question along with the tool definition. GPT returns either a direct answer or a function_call item in the response output:
Python
input_list = [
    {
        "role": "user",
        "content": "What is Hacker News? Visit https://news.ycombinator.com/ and tell me what the product does.",
    }
]

response = client.responses.create(
    model="gpt-5-mini",
    tools=tools,
    input=input_list,
)

# Capture the entire response output (function calls, reasoning items, etc.)
input_list += response.output

# Run the tool for any function_call item the model produced
for item in response.output:
    if item.type == "function_call" and item.name == "scrape_website":
        args = json.loads(item.arguments)
        result = scrape_website(args["url"])
        input_list.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": result,
        })

# Send the tool result back to the model for the final answer
final = client.responses.create(
    model="gpt-5-mini",
    tools=tools,
    input=input_list,
)

print(final.output_text)
When using reasoning models like gpt-5, the response output may include reasoning items alongside function calls. The line input_list += response.output captures all output items, including reasoning, which must be passed back along with tool call outputs for the model to continue correctly.
4. Complete Code Example and Output

Python
import json
import os
import requests
from openai import OpenAI

ZENROWS_API_KEY = os.environ["ZENROWS_API_KEY"]
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def scrape_website(url: str) -> str:
    response = requests.get(
        "https://api.zenrows.com/v1/",
        params={
            "apikey": ZENROWS_API_KEY,
            "url": url,
            "mode": "auto",
            "response_type": "markdown",
        },
    )
    response.raise_for_status()
    return response.text

tools = [
    {
        "type": "function",
        "name": "scrape_website",
        "description": (
            "Scrape any public website and return its content as Markdown. "
            "Use this whenever you need up-to-date information from a specific URL."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "The fully qualified URL of the website to scrape",
                }
            },
            "required": ["url"],
            "additionalProperties": False,
        },
        "strict": True,
    }
]

def ask(question: str) -> str:
    input_list = [{"role": "user", "content": question}]

    response = client.responses.create(
        model="gpt-5-mini",
        tools=tools,
        input=input_list,
    )

    input_list += response.output
    has_tool_call = False

    for item in response.output:
        if item.type == "function_call" and item.name == "scrape_website":
            has_tool_call = True
            args = json.loads(item.arguments)
            result = scrape_website(args["url"])
            input_list.append({
                "type": "function_call_output",
                "call_id": item.call_id,
                "output": result,
            })

    if not has_tool_call:
        return response.output_text

    final = client.responses.create(
        model="gpt-5-mini",
        tools=tools,
        input=input_list,
    )

    return final.output_text

print(ask("What is Hacker News? Visit https://news.ycombinator.com/ and tell me what the product does."))
Example output
Hacker News is a minimalist social-news and discussion site run by Y Combinator that surfaces tech-, startup-, and hacker-oriented links and posts. It’s essentially a community-curated news aggregator + forum where users submit items, vote them up or down, and discuss them in threaded comments.

What the product does (quick summary)
- Aggregates links and text posts about programming, startups, science, and related topics.
- Lets registered users submit stories, vote (upvote) items, and comment; votes and recency determine ranking on the front page.
- Provides named formats like "Show HN" (projects/demos) and "Ask HN" (questions) that signal post type to the community.
- Offers sections/filters: new, past/front, comments, ask, show, jobs, and submit.
- ...
GPT decides on its own that the question requires fresh information from news.ycombinator.com, calls scrape_website with the right URL, then synthesizes a grounded answer from the scraped Markdown.
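The ask() helper above handles a single round of tool calls. To let GPT chain scrapes, fetching one page, deciding it needs another, and so on, loop until the model stops emitting function calls. A minimal sketch reusing client, tools, and scrape_website from above; the max_rounds cap is an arbitrary safety limit:

Python
def ask_multi(question: str, max_rounds: int = 5) -> str:
    input_list = [{"role": "user", "content": question}]
    response = client.responses.create(model="gpt-5-mini", tools=tools, input=input_list)

    for _ in range(max_rounds):
        # Keep every output item (reasoning items included) in the running input
        input_list += response.output
        calls = [item for item in response.output if item.type == "function_call"]
        if not calls:
            return response.output_text  # the model answered without needing more tools
        for item in calls:
            args = json.loads(item.arguments)
            input_list.append({
                "type": "function_call_output",
                "call_id": item.call_id,
                "output": scrape_website(args["url"]),
            })
        response = client.responses.create(model="gpt-5-mini", tools=tools, input=input_list)

    return response.output_text  # best-effort answer after hitting the round cap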

Structured data extraction

OpenAI’s structured outputs let you guarantee a model returns JSON that matches a specific schema. Combined with ZenRows, this is the cleanest way to extract typed data from any web page. The Responses API exposes structured outputs through client.responses.parse(), which takes a Pydantic model directly and returns a parsed Python object. The example below scrapes a company homepage and extracts a strongly-typed CompanyInfo object:
Python
import os
import requests
from openai import OpenAI
from pydantic import BaseModel

ZENROWS_API_KEY = os.environ["ZENROWS_API_KEY"]
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class CompanyInfo(BaseModel):
    name: str
    industry: str
    description: str

# Scrape the company website
response = requests.get(
    "https://api.zenrows.com/v1/",
    params={
        "apikey": ZENROWS_API_KEY,
        "url": "https://www.zillow.com/",
        "mode": "auto",
        "response_type": "markdown",
    },
)
response.raise_for_status()
markdown = response.text

# Extract structured data through the Responses API
parsed = client.responses.parse(
    model="gpt-5-mini",
    instructions="Extract company information from the provided website content.",
    input=markdown,
    text_format=CompanyInfo,
)

company = parsed.output_parsed
print(company.model_dump_json(indent=2))

Example output

{
  "name": "Zillow (Zillow Group)",
  "industry": "Real estate / Online real estate marketplace",
  "description": "Zillow is an online real estate marketplace and services company offering home listings for sale and rent, rental-management and agent-finding tools, home value estimates (Zestimates), mortgage products through Zillow Home Loans, market research, news and mobile apps."
}
The same pattern works for any structured extraction job: product listings, job postings, real estate, reviews, articles, and so on. Define the Pydantic schema, scrape the page with mode=auto, and let GPT return validated JSON.
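The same responses.parse() call also handles repeated structures. A minimal sketch with a hypothetical Product/ProductList schema (the field names are illustrative, and markdown is ZenRows output from a listing page scraped as shown above):

Python
from typing import List
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: str

class ProductList(BaseModel):
    products: List[Product]

parsed = client.responses.parse(
    model="gpt-5-mini",
    instructions="Extract every product from the provided page content.",
    input=markdown,  # Markdown scraped with ZenRows, as shown earlier
    text_format=ProductList,
)

for product in parsed.output_parsed.products:
    print(product.name, product.price)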
Skip the schema with ZenRows Autoparse
For common site types, you can skip LLM-based parsing entirely. Adding autoparse=true to your ZenRows request automatically identifies and extracts product details, article content, job listings, property data, and similar information into clean JSON, with no Pydantic schema or model call required. ZenRows Autoparse is included at no additional cost. Return the JSON directly to your application for known-structure pages, or pass it to GPT for downstream enrichment, normalization, or categorization. Pre-structured input also uses far fewer tokens than raw Markdown, which keeps your context window lean and your model calls cheaper. Note that Autoparse cannot be combined with response_type=markdown in the same request.
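A minimal Autoparse request might look like the following sketch (the e-commerce URL is illustrative). Note that response_type=markdown is dropped, since the two options cannot be combined:

Python
response = requests.get(
    "https://api.zenrows.com/v1/",
    params={
        "apikey": ZENROWS_API_KEY,
        "url": "https://www.scrapingcourse.com/ecommerce/",  # illustrative target
        "mode": "auto",
        "autoparse": "true",  # structured JSON instead of HTML or Markdown
    },
)
response.raise_for_status()
print(response.json())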

Using ZenRows MCP with the Responses API

The Responses API natively supports remote Model Context Protocol (MCP) servers as tools. ZenRows publishes a hosted MCP server that exposes web scraping capabilities, so you can give GPT real-time web access without writing any function-calling boilerplate.
Python
import os
from openai import OpenAI

ZENROWS_API_KEY = os.environ["ZENROWS_API_KEY"]
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.responses.create(
    model="gpt-5",
    tools=[
        {
            "type": "mcp",
            "server_label": "zenrows",
            "server_description": "Web scraping MCP server for accessing live web content.",
            "server_url": "https://mcp.zenrows.com/mcp",
            "authorization": ZENROWS_API_KEY,
            "require_approval": "never",
        }
    ],
    input="Visit https://news.ycombinator.com/ and summarize the three most recent posts.",
)

print(response.output_text)
A few notes on this configuration:
  • server_url points at ZenRows’ hosted MCP endpoint.
  • authorization carries your ZenRows API key. OpenAI does not retain the value between requests, so it must be present on every API call.
  • require_approval="never" skips the per-tool-call approval step, which is appropriate for trusted servers like ZenRows. To require explicit approval, remove this field or set it to "always".
For details on the ZenRows MCP server and its capabilities, see the MCP overview.
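If you want to see which scraping tools the model actually invoked, the response output includes MCP items alongside the final message. A sketch based on the item types described in OpenAI’s MCP tool documentation at the time of writing (mcp_list_tools and mcp_call); check the current docs for the exact shape:

Python
for item in response.output:
    if item.type == "mcp_list_tools":
        print(f"Discovered tools on {item.server_label}")
    elif item.type == "mcp_call":
        print(f"MCP call: {item.name}({item.arguments})")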

API reference

The most useful Universal Scraper API parameters when working with OpenAI:
| Parameter | Type | Description |
| --- | --- | --- |
| apikey | string | Required. Your ZenRows API key, passed as a query parameter. |
| url | string | Required. The URL to scrape. |
| mode | string | Set to auto to enable Adaptive Stealth Mode, which automatically picks the cheapest working configuration for each request. Recommended default for AI workflows. |
| response_type | string | Convert HTML to other formats. Use markdown for LLM context (recommended), or plaintext, pdf. |
| js_render | boolean | Enable JavaScript rendering with a headless browser. Set automatically by mode=auto when needed. |
| js_instructions | string | Execute custom JavaScript on the page (click, scroll, fill forms) before returning content. |
| premium_proxy | boolean | Use residential IPs to bypass anti-bot protection. Set automatically by mode=auto when needed. |
| proxy_country | string | Two-letter country code for geo-restricted content (e.g. us, gb, de). Requires premium proxies, which mode=auto enables automatically when needed. |
| session_id | integer | Maintain the same IP across multiple requests for up to 10 minutes. |
| wait_for | string | Wait for a specific CSS selector to appear in the DOM before returning content. |
| wait | integer | Wait a fixed number of milliseconds after page load. |
| css_extractor | string | Extract specific elements using CSS selectors (JSON format). |
| outputs | string | Extract specific data types as structured JSON using a comma-separated list of filters: emails, phone_numbers, headings, images, audios, videos, links, menus, hashtags, metadata, tables, favicon. Use outputs=* to retrieve all available data types. See the Output Filters documentation for details. |
| screenshot / screenshot_fullpage | boolean | Capture a screenshot of the page. Useful for multimodal GPT inputs. |
For the full parameter list, see the Universal Scraper API reference.

Troubleshooting

Token limit exceeded

  • Option 1: Use response_type=markdown (already shown in every example above). Markdown reduces token usage significantly compared to raw HTML.
  • Option 2: Use the css_extractor or outputs parameter to scrape only the parts of the page you need (a product card, a pricing table, an article body) instead of the entire DOM; see the sketch after this list.
  • Option 3: Chunk the scraped content into smaller pieces, summarize each chunk separately, and combine the results, or switch to a model with a larger context window.
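For Option 2, a sketch of a css_extractor request that returns only an article body, reusing ZENROWS_API_KEY from earlier (the URL and selector are illustrative; adapt them to the target page):

Python
import json
import requests

response = requests.get(
    "https://api.zenrows.com/v1/",
    params={
        "apikey": ZENROWS_API_KEY,
        "url": "https://example.com/article",  # illustrative URL
        "mode": "auto",
        # JSON mapping of output keys to CSS selectors
        "css_extractor": json.dumps({"body": "article"}),
    },
)
response.raise_for_status()
print(response.json())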

API key errors

  • Option 1: Confirm both ZENROWS_API_KEY and OPENAI_API_KEY are set in your environment.
  • Option 2: Verify your ZenRows API key in the dashboard and your OpenAI key in the OpenAI platform.
  • Option 3: Check that your ZenRows subscription is active and has remaining quota on the Analytics page.

Empty or incomplete tool responses

  • Option 1: Confirm mode=auto is set. Without it, JavaScript rendering and premium proxies are off by default and protected sites return blocked or empty pages.
  • Option 2: For sites that load content asynchronously, add wait_for=<css-selector> (waits for a specific element) or wait=5000 (waits a fixed duration in milliseconds).
  • Option 3: If the model is calling the tool with a malformed URL, tighten the function description in the JSON schema or add validation in your scrape_website wrapper before sending the request to ZenRows.
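For Option 3, a sketch of a pre-flight check you can call at the top of scrape_website (the error text is illustrative; returning it as the tool output lets GPT self-correct and retry with a fixed URL):

Python
from urllib.parse import urlparse

def validate_url(url: str) -> str | None:
    """Return an error message for the model, or None if the URL looks scrapeable."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        # Send this back as the function_call_output instead of calling ZenRows
        return f"Error: '{url}' is not a fully qualified http(s) URL."
    return None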

Reasoning items missing after tool calls

When using reasoning models like gpt-5, the response output may include reasoning items alongside function calls. Always append the entire response.output back to your input list (not just function calls) before sending the tool result. The pattern input_list += response.output shown in the function calling example handles this automatically.

Frequently asked questions

Which OpenAI models work with ZenRows?
All current models. Function calling and structured outputs work best with gpt-4o, gpt-4o-mini, and other recent models. The hosted MCP tool requires a model that supports remote tools on the Responses API.

What if the scraped content exceeds the model’s context window?
Three options, in order of preference: use response_type=markdown to keep token usage low; use css_extractor or outputs to scrape only the relevant section of the page; chunk the scraped content and summarize each chunk before combining, or switch to a model with a larger context window.

Can ZenRows handle JavaScript-heavy and anti-bot-protected sites?
Yes. Setting mode=auto enables Adaptive Stealth Mode, which automatically activates JavaScript rendering, premium residential proxies, and stealth fingerprinting only when the target site requires them. You only pay for what succeeds.

Does this integration work with the OpenAI Agents SDK?
Yes. The Agents SDK is built on top of the Responses API, so the same patterns apply. For agent-specific workflows including multi-step research, handoffs, and the Agents SDK’s built-in MCP support, see the OpenAI Agents SDK integration guide.

Can GPT scrape multiple pages in a single conversation?
Yes. Function calling supports multi-turn tool use, so GPT can call scrape_website repeatedly with different URLs, read what it gets back, and decide what to scrape next. For more complex agent behavior, pair this with the hosted MCP tool, which exposes a richer scraping surface without requiring you to maintain the function-calling loop yourself. For autonomous, multi-step research workflows, see the OpenAI Agents SDK integration guide.

Why use ZenRows instead of OpenAI’s built-in web search?
OpenAI’s built-in web search is a black box: you don’t choose the source, control the depth, or get raw page content. ZenRows gives you full control over which URL is fetched, how the page is rendered, what format the response comes back in, and which countries the request is routed through. It also unlocks pages that public search engines don’t surface or that block scrapers, such as logged-in views, region-restricted content, and deep e-commerce listings.

Are there concurrency or rate limits to keep in mind?
ZenRows has plan-based concurrency limits documented on the Concurrency page. When using function calling or agents that may issue many parallel scrapes, monitor your usage on the Analytics page and consider adding retry logic with exponential backoff in your scrape_website wrapper, as sketched below.
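For example, a minimal retry wrapper around the ZenRows call (reusing requests and ZENROWS_API_KEY from earlier) might look like this sketch; the attempt count, backoff base, and the assumption that rate limiting surfaces as HTTP 429 are all illustrative:

Python
import time

def scrape_with_retry(url: str, attempts: int = 3) -> str:
    for attempt in range(attempts):
        response = requests.get(
            "https://api.zenrows.com/v1/",
            params={
                "apikey": ZENROWS_API_KEY,
                "url": url,
                "mode": "auto",
                "response_type": "markdown",
            },
        )
        # Assumed: rate/concurrency limits surface as HTTP 429; back off and retry
        if response.status_code == 429 and attempt < attempts - 1:
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
            continue
        response.raise_for_status()
        return response.text
    raise RuntimeError("unreachable: the last attempt returns or raises")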