> ## Documentation Index
> Fetch the complete documentation index at: https://docs.zenrows.com/llms.txt
> Use this file to discover all available pages before exploring further.

# How to Integrate OpenAI with ZenRows

> Give OpenAI's GPT models real-time access to any website using ZenRows' Universal Scraper API and ZenRows' MCP.

Combine OpenAI's GPT models with ZenRows' [Universal Scraper API](https://docs.zenrows.com/universal-scraper-api/api-reference) to build AI workflows that can read any website in real time. This guide shows how to use ZenRows with OpenAI's Responses API, function calling, structured outputs, and the hosted MCP tool, so your AI applications can ground their answers in live web content, even from anti-bot-protected and JavaScript-heavy sites.

This guide covers the standard OpenAI API. If you are building autonomous, multi-step agents with the OpenAI Agents SDK, see the [OpenAI Agents SDK integration guide](https://docs.zenrows.com/integrations/mcp/openai-agents-sdk).

<Note>
  **Just want to plug GPT into the live web? Use ZenRows MCP with OpenAI.**

  The fastest path to a web-aware GPT is the [ZenRows hosted MCP server](https://docs.zenrows.com/integrations/mcp/mcp-overview).

  The Responses API natively supports remote MCP servers, so you can register `https://mcp.zenrows.com/mcp` as a tool and let GPT invoke ZenRows' scraping capabilities directly. No function-calling boilerplate, no orchestration code. Jump to [Using ZenRows MCP with the Responses API](#using-zenrows-mcp-with-the-responses-api) for a complete code example.
</Note>

## What is OpenAI?

OpenAI is the company behind GPT, the family of large language models powering ChatGPT and a developer platform used by hundreds of thousands of teams. The OpenAI API provides programmatic access to these models through the <a href="https://developers.openai.com/api/docs/guides/text" target="_blank" rel="noopener noreferrer"> Responses API</a>, the recommended interface for all new projects, with built-in support for function calling, structured outputs, reasoning models, and remote MCP servers.

GPT models excel at reasoning, summarization, and structured generation, but **they get blocked while accessing the web** on their own, especially at scale. **Pairing OpenAI with ZenRows closes that gap.**

### Key benefits of integrating OpenAI with ZenRows

* **Real-time web grounding for any GPT model**: GPT has a fixed knowledge cutoff. ZenRows feeds live, up-to-date web content into any response, eliminating stale or hallucinated answers about recent events, prices, or product details.
* **Anti-bot bypass out of the box**: [Adaptive Stealth Mode](https://docs.zenrows.com/universal-scraper-api/features/adaptive-stealth-mode) (`mode=auto`) automatically handles JavaScript rendering, premium proxies, fingerprinting, and bot detection, so your AI can read pages that would block a regular scraper.
* **Token-efficient Markdown output**: ZenRows returns clean Markdown in addition to raw HTML, which reduces token usage and improves model accuracy on the same context window.
* **Native function calling**: Expose ZenRows as a function tool and let GPT decide when to scrape based on the user's question, with no orchestration code required.
* **Works with every OpenAI API surface**: Use ZenRows with the Responses API, structured outputs (`responses.parse`), reasoning models, and the hosted MCP tool. The integration is a plain HTTP call, so it fits into any pattern.
* **MCP-ready out of the box**: Plug the [ZenRows hosted MCP server](https://docs.zenrows.com/integrations/mcp/mcp-overview) directly into the Responses API as a tool for zero-code access to all of ZenRows' scraping capabilities.

## Use cases

The OpenAI and ZenRows combination unlocks a wide range of AI workflows:

* **Web-aware chatbots**: Build assistants that can answer questions about any URL the user provides, including protected sites like e-commerce stores, real estate portals, and news outlets.
* **Real-time competitive intelligence**: Have GPT analyze competitor pricing pages, product launches, and changelogs as they happen.
* **Lead enrichment**: Scrape company websites and let GPT extract industry, headcount signals, tech stack hints, and product summaries into your CRM.
* **Automated research and reporting**: Pull and summarize industry articles, financial filings, or technical documentation into ready-to-share briefs.
* **Structured data extraction**: Pull strongly-typed JSON (products, jobs, listings, reviews) from any page using GPT's structured outputs grounded in ZenRows-scraped content.

## Getting started: Basic Usage

Let's start with a simple example: scrape a JavaScript-heavy, anti-bot-protected demo page using ZenRows and summarize it with `gpt-5-mini` through the Responses API.

<Steps>
  <Step title="Install the OpenAI Python library and requests">
    ```bash theme={null}
    pip install openai requests
    ```
  </Step>

  <Step title="Create a .env file and set your API keys as environment variables">
    ```bash theme={null}
    ZENROWS_API_KEY=your_zenrows_key
    OPENAI_API_KEY=your_openai_key
    ```
  </Step>

  <Step title="Run the following script">
    ```python Python theme={null}
    import os
    import requests
    from openai import OpenAI

    ZENROWS_API_KEY = os.environ["ZENROWS_API_KEY"]
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    def scrape(url: str) -> str:
        """Scrape any URL with ZenRows and return clean Markdown."""
        response = requests.get(
            "https://api.zenrows.com/v1/",
            params={
                "apikey": ZENROWS_API_KEY,
                "url": url,
                "mode": "auto",
                "response_type": "markdown",
            },
        )
        response.raise_for_status()
        return response.text

    # Scrape a JS-rendered, anti-bot-protected page
    markdown = scrape("https://www.scrapingcourse.com/antibot-challenge")

    # Summarize with GPT through the Responses API
    response = client.responses.create(
        model="gpt-5-mini",
        instructions="You are a concise technical writer.",
        input=f"Summarize this page in 2 sentences:\n\n{markdown}",
    )

    print(response.output_text)

    ```
  </Step>
</Steps>

Two things are happening here:

1. ZenRows handles the scrape. `mode=auto` tells the API to start with the cheapest viable configuration and automatically escalate to JavaScript rendering or premium proxies if the target site requires it. `response_type=markdown` returns clean Markdown instead of raw HTML, which is ideal for LLM context.
2. OpenAI handles the reasoning. The Markdown is fed into `client.responses.create()`, with system-level guidance passed through the `instructions` parameter and the user prompt passed through `input`.

You've scraped a protected page and grounded a GPT response in live web content. Let's look at more advanced patterns.

## Advanced Usage: Building a web-aware AI assistant with function calling

Function calling is OpenAI's mechanism for letting a model decide when to call an external tool. Instead of always scraping before each prompt, you expose ZenRows as a function and let GPT call it only when needed.

This pattern is the foundation of any production-grade AI application that needs web access.

<Steps>
  <Step title="Set up the environment">
    Run the following command in your terminal to install the OpenAI Python library and requests:

    ```bash theme={null}
    pip install openai requests
    ```

    Create a `.env` file:

    ```bash theme={null}
    ZENROWS_API_KEY=your_zenrows_key
    OPENAI_API_KEY=your_openai_key
    ```
  </Step>

  <Step title="Define the scraping tool">
    Wrap the ZenRows REST call in a Python function and describe it to the Responses API using a JSON schema:

    ```python Python theme={null}
    import json
    import os
    import requests
    from openai import OpenAI

    ZENROWS_API_KEY = os.environ["ZENROWS_API_KEY"]
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    def scrape_website(url: str) -> str:
        """Scrape any website with ZenRows and return Markdown content."""
        response = requests.get(
            "https://api.zenrows.com/v1/",
            params={
                "apikey": ZENROWS_API_KEY,
                "url": url,
                "mode": "auto",
                "response_type": "markdown",
            },
        )
        response.raise_for_status()
        return response.text

    # Tell GPT what scrape_website does and how to call it
    tools = [
        {
            "type": "function",
            "name": "scrape_website",
            "description": (
                "Scrape any public website and return its content as Markdown. "
                "Use this whenever you need up-to-date information from a specific URL."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {
                        "type": "string",
                        "description": "The fully qualified URL of the website to scrape",
                    }
                },
                "required": ["url"],
                "additionalProperties": False,
            },
            "strict": True,
        }
    ]
    ```

    The Responses API uses a flat tool definition (`name`, `description`, and `parameters` at the top level of the tool object).
  </Step>

  <Step title="Let GPT decide when to scrape">
    Send the user's question along with the tool definition. GPT returns either a direct answer or a `function_call` item in the response output:

    ```python Python theme={null}
    input_list = [
        {
            "role": "user",
            "content": "What is Hacker News? Visit https://news.ycombinator.com/ and tell me what the product does.",
        }
    ]

    response = client.responses.create(
        model="gpt-5-mini",
        tools=tools,
        input=input_list,
    )

    # Capture the entire response output (function calls, reasoning items, etc.)
    input_list += response.output

    # Run the tool for any function_call item the model produced
    for item in response.output:
        if item.type == "function_call" and item.name == "scrape_website":
            args = json.loads(item.arguments)
            result = scrape_website(args["url"])
            input_list.append({
                "type": "function_call_output",
                "call_id": item.call_id,
                "output": result,
            })

    # Send the tool result back to the model for the final answer
    final = client.responses.create(
        model="gpt-5-mini",
        tools=tools,
        input=input_list,
    )

    print(final.output_text)
    ```

    <Note>
      When using reasoning models like ChatGPT-5, the response output may include reasoning items alongside function calls. The line `input_list += response.output` captures all output items, including reasoning, which must be passed back with tool call outputs for the model to continue correctly.
    </Note>
  </Step>

  <Step title="Complete Code Example and Output">
    ```python Python theme={null}
    import json
    import os
    import requests
    from openai import OpenAI

    ZENROWS_API_KEY = os.environ["ZENROWS_API_KEY"]
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    def scrape_website(url: str) -> str:
        response = requests.get(
            "https://api.zenrows.com/v1/",
            params={
                "apikey": ZENROWS_API_KEY,
                "url": url,
                "mode": "auto",
                "response_type": "markdown",
            },
        )
        response.raise_for_status()
        return response.text

    tools = [
        {
            "type": "function",
            "name": "scrape_website",
            "description": (
                "Scrape any public website and return its content as Markdown. "
                "Use this whenever you need up-to-date information from a specific URL."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {
                        "type": "string",
                        "description": "The fully qualified URL of the website to scrape",
                    }
                },
                "required": ["url"],
                "additionalProperties": False,
            },
            "strict": True,
        }
    ]

    def ask(question: str) -> str:
        input_list = [{"role": "user", "content": question}]

        response = client.responses.create(
            model="gpt-5-mini",
            tools=tools,
            input=input_list,
        )

        input_list += response.output
        has_tool_call = False

        for item in response.output:
            if item.type == "function_call" and item.name == "scrape_website":
                has_tool_call = True
                args = json.loads(item.arguments)
                result = scrape_website(args["url"])
                input_list.append({
                    "type": "function_call_output",
                    "call_id": item.call_id,
                    "output": result,
                })

        if not has_tool_call:
            return response.output_text

        final = client.responses.create(
            model="gpt-5-mini",
            tools=tools,
            input=input_list,
        )

        return final.output_text

    print(ask("What is Hacker News? Visit https://news.ycombinator.com/ and tell me what the product does."))
    ```

    **Example output**

    ```wrap Output theme={null}
    Hacker News is a minimalist social-news and discussion site run by Y Combinator that surfaces tech-, startup-, and hacker-oriented links and posts. It’s essentially a community-curated news aggregator + forum where users submit items, vote them up or down, and discuss them in threaded comments.

    What the product does (quick summary)
    - Aggregates links and text posts about programming, startups, science, and related topics.
    - Lets registered users submit stories, vote (upvote) items, and comment; votes and recency determine ranking on the front page.
    - Provides named formats like "Show HN" (projects/demos) and "Ask HN" (questions) that signal post type to the community.
    - Offers sections/filters: new, past/front, comments, ask, show, jobs, and submit.
    - ...
    ```
  </Step>
</Steps>

GPT decides on its own that the question requires fresh information from `news.ycombinator.com`, calls `scrape_website` with the right URL, then synthesizes a grounded answer from the scraped Markdown.

## Structured data extraction

OpenAI's <a href="https://developers.openai.com/api/docs/guides/structured-outputs" target="_blank" rel="noopener noreferrer">structured outputs</a> let you guarantee a model returns JSON that matches a specific schema. Combined with ZenRows, this is the cleanest way to extract typed data from any web page.

The Responses API exposes structured outputs through `client.responses.parse()`, which takes a Pydantic model directly and returns a parsed Python object.

The example below scrapes a company homepage and extracts a strongly-typed `CompanyInfo` object:

```python Python theme={null}
import os
import requests
from openai import OpenAI
from pydantic import BaseModel

ZENROWS_API_KEY = os.environ["ZENROWS_API_KEY"]
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class CompanyInfo(BaseModel):
    name: str
    industry: str
    description: str

# Scrape the company website
response = requests.get(
    "https://api.zenrows.com/v1/",
    params={
        "apikey": ZENROWS_API_KEY,
        "url": "https://www.zillow.com/",
        "mode": "auto",
        "response_type": "markdown",
    },
)
response.raise_for_status()
markdown = response.text

# Extract structured data through the Responses API
parsed = client.responses.parse(
    model="gpt-5-mini",
    instructions="Extract company information from the provided website content.",
    input=markdown,
    text_format=CompanyInfo,
)

company = parsed.output_parsed
print(company.model_dump_json(indent=2))
```

### Example output

```json theme={null}
{
  "name": "Zillow (Zillow Group)",
  "industry": "Real estate / Online real estate marketplace",
  "description": "Zillow is an online real estate marketplace and services company offering home listings for sale and rent, rental-management and agent-finding tools, home value estimates (Zestimates), mortgage products through Zillow Home Loans, market research, news and mobile apps."
}
```

The same pattern works for any structured extraction job: product listings, job postings, real estate, reviews, articles, and so on. Define the Pydantic schema, scrape the page with `mode=auto`, and let GPT return validated JSON.

<Note>
  **Skip the schema with ZenRows Autoparse**

  For common site types, you can skip LLM-based parsing entirely. Adding `autoparse=true` to your ZenRows request automatically identifies and extracts product details, article content, job listings, property data, and similar information into clean JSON, with no Pydantic schema or model call required.

  [ZenRows Autoparse](https://docs.zenrows.com/universal-scraper-api/features/autoparse) is included at no additional cost. Return the JSON directly to your application for known-structure pages, or pass it to GPT for downstream enrichment, normalization, or categorization. Pre-structured input also uses far fewer tokens than raw Markdown, which keeps your context window lean and your model calls cheaper.

  Note that Autoparse cannot be combined with `response_type=markdown` in the same request.
</Note>

## Using ZenRows MCP with the Responses API

The Responses API natively supports remote [Model Context Protocol (MCP)](https://docs.zenrows.com/integrations/mcp/mcp-overview) servers as tools. ZenRows publishes a hosted MCP server that exposes web scraping capabilities, so you can give GPT real-time web access without writing any function-calling boilerplate.

```python theme={null}
import os
from openai import OpenAI

ZENROWS_API_KEY = os.environ["ZENROWS_API_KEY"]
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.responses.create(
    model="gpt-5",
    tools=[
        {
            "type": "mcp",
            "server_label": "zenrows",
            "server_description": "Web scraping MCP server for accessing live web content.",
            "server_url": "https://mcp.zenrows.com/mcp",
            "authorization": ZENROWS_API_KEY,
            "require_approval": "never",
        }
    ],
    input="Visit https://news.ycombinator.com/ and summarize the three most recent posts.",
)

print(response.output_text)
```

A few notes on this configuration:

* `server_url` points at ZenRows' hosted MCP endpoint.
* `authorization` carries your ZenRows API key. OpenAI does not retain the value between requests, so it must be present on every API call.
* `require_approval="never"` skips the per-tool-call approval step, which is appropriate for trusted servers like ZenRows. To require explicit approval, remove this field or set it to `"always"`.

For details on the ZenRows MCP server and its capabilities, see the [MCP overview](https://docs.zenrows.com/integrations/mcp/mcp-overview).

## API reference

The most useful Universal Scraper API parameters when working with OpenAI:

| Parameter                            | Type    | Description                                                                                                                                                                                                                                                                                                                                                                                                          |
| ------------------------------------ | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `apikey`                             | string  | **Required.** Your ZenRows API key, passed as a query parameter.                                                                                                                                                                                                                                                                                                                                                     |
| `url`                                | string  | **Required.** The URL to scrape.                                                                                                                                                                                                                                                                                                                                                                                     |
| `mode`                               | string  | Set to `auto` to enable [Adaptive Stealth Mode](https://docs.zenrows.com/universal-scraper-api/features/adaptive-stealth-mode), which automatically picks the cheapest working configuration for each request. Recommended default for AI workflows.                                                                                                                                                                 |
| `response_type`                      | string  | Convert HTML to other formats. Use `markdown` for LLM context (recommended), or `plaintext`, `pdf`.                                                                                                                                                                                                                                                                                                                  |
| `js_render`                          | boolean | Enable JavaScript rendering with a headless browser. Set automatically by `mode=auto` when needed.                                                                                                                                                                                                                                                                                                                   |
| `js_instructions`                    | string  | Execute custom JavaScript on the page (click, scroll, fill forms) before returning content.                                                                                                                                                                                                                                                                                                                          |
| `premium_proxy`                      | boolean | Use residential IPs to bypass anti-bot protection. Set automatically by `mode=auto` when needed.                                                                                                                                                                                                                                                                                                                     |
| `proxy_country`                      | string  | Two-letter country code for geo-restricted content (e.g. `us`, `gb`, `de`). Requires premium proxies, which `mode=auto` enables automatically when needed.                                                                                                                                                                                                                                                           |
| `session_id`                         | integer | Maintain the same IP across multiple requests for up to 10 minutes.                                                                                                                                                                                                                                                                                                                                                  |
| `wait_for`                           | string  | Wait for a specific CSS selector to appear in the DOM before returning content.                                                                                                                                                                                                                                                                                                                                      |
| `wait`                               | integer | Wait a fixed number of milliseconds after page load.                                                                                                                                                                                                                                                                                                                                                                 |
| `css_extractor`                      | string  | Extract specific elements using CSS selectors (JSON format).                                                                                                                                                                                                                                                                                                                                                         |
| `outputs`                            | string  | Extract specific data types as structured JSON using a comma-separated list of filters: `emails`, `phone_numbers`, `headings`, `images`, `audios`, `videos`, `links`, `menus`, `hashtags`, `metadata`, `tables`, `favicon`. Use `outputs=*` to retrieve all available data types. See the [Output Filters documentation](https://docs.zenrows.com/universal-scraper-api/features/output#output-filters) for details. |
| `screenshot` / `screenshot_fullpage` | boolean | Capture a screenshot of the page. Useful for multimodal GPT inputs.                                                                                                                                                                                                                                                                                                                                                  |

For the full parameter list, see the [Universal Scraper API reference](https://docs.zenrows.com/universal-scraper-api/api-reference#parameter-overview).

## Troubleshooting

### Token limit exceeded

* **Option 1**: Use `response_type=markdown` (already shown in every example above). Markdown reduces token usage significantly compared to raw HTML.
* **Option 2**: Use the `css_extractor` or `outputs` parameter to scrape only the parts of the page you need (a product card, a pricing table, an article body) instead of the entire DOM.
* **Option 3**: Chunk the scraped content into smaller pieces, summarize each chunk separately, and combine the results, or switch to a model with a larger context window.

### API key errors

* **Option 1**: Confirm both `ZENROWS_API_KEY` and `OPENAI_API_KEY` are set in your environment.
* **Option 2**: Verify your ZenRows API key in the [dashboard](https://app.zenrows.com/account/settings) and your OpenAI key in the <a href="https://platform.openai.com/api-keys" target="_blank" rel="noopener noreferrer">OpenAI platform</a>.
* **Option 3**: Check that your ZenRows subscription is active and has remaining quota on the [Analytics page](https://app.zenrows.com/analytics/scraper-api).

### Empty or incomplete tool responses

* **Option 1**: Confirm `mode=auto` is set. Without it, JavaScript rendering and premium proxies are off by default and protected sites return blocked or empty pages.
* **Option 2**: For sites that load content asynchronously, add `wait_for=<css-selector>` (waits for a specific element) or `wait=5000` (waits a fixed duration in milliseconds).
* **Option 3**: If the model is calling the tool with a malformed URL, tighten the function description in the JSON schema or add validation in your `scrape_website` wrapper before sending the request to ZenRows.

### Reasoning items missing after tool calls

When using reasoning models like `gpt-5`, the response output may include reasoning items alongside function calls. Always append the entire `response.output` back to your input list (not just function calls) before sending the tool result. The pattern `input_list += response.output` shown in the function calling example handles this automatically.

## Helpful resources

* [ZenRows Universal Scraper API reference](https://docs.zenrows.com/universal-scraper-api/api-reference)
* [ZenRows MCP server documentation](https://docs.zenrows.com/integrations/mcp/mcp-overview)
* [Adaptive Stealth Mode documentation](https://docs.zenrows.com/universal-scraper-api/features/adaptive-stealth-mode)
* [ZenRows + OpenAI Agents SDK integration](https://docs.zenrows.com/integrations/mcp/openai-agents-sdk)
* [OpenAI Responses API guide](https://developers.openai.com/api/docs/guides/text)
* [OpenAI function calling guide](https://developers.openai.com/api/docs/guides/function-calling)
* [OpenAI structured outputs guide](https://developers.openai.com/api/docs/guides/structured-outputs)

## Frequently asked questions

<Accordion title="Which OpenAI models work with this integration?">
  All current models. Function calling and structured outputs work best with `gpt-4o`, `gpt-4o-mini`, and other recent models. The hosted MCP tool requires a model that supports remote tools on the Responses API.
</Accordion>

<Accordion title="How do I handle pages that exceed the model's token limit?">
  Three options, in order of preference: use `response_type=markdown` to keep token usage low; use `css_extractor` or `outputs` to scrape only the relevant section of the page; chunk the scraped content and summarize each chunk before combining, or switch to a model with a larger context window.
</Accordion>

<Accordion title="Does ZenRows handle JavaScript-heavy and anti-bot-protected sites automatically?">
  Yes. Setting `mode=auto` enables Adaptive Stealth Mode, which automatically activates JavaScript rendering, premium residential proxies, and stealth fingerprinting only when the target site requires them. You only pay for what succeeds.
</Accordion>

<Accordion title="Can I use this with the OpenAI Agents SDK?">
  Yes. The Agents SDK is built on top of the Responses API, so the same patterns apply. For agent-specific workflows including multi-step research, handoffs, and the Agents SDK's built-in MCP support, see the [OpenAI Agents SDK integration guide](https://docs.zenrows.com/integrations/mcp/openai-agents-sdk).
</Accordion>

<Accordion title="Can I let GPT browse multiple pages autonomously?">
  Yes. Function calling supports multi-turn tool use, so GPT can call `scrape_website` repeatedly with different URLs, read what it gets back, and decide what to scrape next. For more complex agent behavior, pair this with the [hosted MCP tool](https://docs.zenrows.com/integrations/mcp/mcp-overview), which exposes a richer scraping surface without requiring you to maintain the function-calling loop yourself. For autonomous, multi-step research workflows, see the [OpenAI Agents SDK integration guide](https://docs.zenrows.com/integrations/mcp/openai-agents-sdk).
</Accordion>

<Accordion title="How does this compare to OpenAI's built-in web search tool?">
  OpenAI's built-in web search is a black box: you don't choose the source, control the depth, or get raw page content. ZenRows gives you full control over which URL is fetched, how the page is rendered, what format the response comes back in, and which countries the request is routed through. It also unlocks pages that public search engines don't surface or that block scrapers, such as logged-in views, region-restricted content, and deep e-commerce listings.
</Accordion>

<Accordion title="Is there a rate limit I should be aware of?">
  ZenRows has plan-based concurrency limits documented on the [Concurrency page](https://docs.zenrows.com/universal-scraper-api/features/concurrency). When using function calling or agents that may issue many parallel scrapes, monitor your usage on the Analytics page and consider adding retry logic with exponential backoff in your `scrape_website` wrapper.
</Accordion>
