# Python Requests and BeautifulSoup Integration

> How to use ZenRows with Python Requests and BeautifulSoup for reliable web scraping, including concurrency, auto-retry, and setup.

Learn how to integrate the ZenRows API with Python Requests and BeautifulSoup to extract the data you want, from basic calls to advanced features such as auto-retry and concurrency. We will walk through each stage of the process, from installation to the final code, explaining every step along the way.

For the short version, skip to the final code and copy it. Its comments mark the parts you must fill in and offer suggestions for the trickier details.

For the code to work, you will need <a href="https://www.python.org/downloads/" target="_blank" rel="noopener noreferrer nofollow">Python 3 installed</a>. Some systems ship with it pre-installed. After that, install the necessary libraries by running `pip install`.

```bash theme={null}
pip install requests beautifulsoup4
```

You will also need to [register to get your API Key](https://app.zenrows.com/register?p=free).

## Using Requests to Get a Page

The first library we will see is `requests`, an HTTP library for Python. It exposes a `get` method that calls a URL and returns a response whose `text` attribute holds the page's HTML. For now, we won't pass any parameters; this is simply a demo to see how it works.

**Careful! This script runs without any proxy, so the server will see your actual IP.** You don't need to run this snippet.

```python theme={null}
import requests

url = "" # ... your URL here
response = requests.get(url)

print(response.text)  # page's HTML
```

## Calling ZenRows API with Requests

Connecting `requests` to the ZenRows API is straightforward. The target of `get` becomes the API base, with two params: `apikey` for authentication and `url` for the page to scrape. URLs must be [encoded](/universal-scraper-api/faq#how-to-encode-urls); however, `requests` handles that when you pass them via `params`.

With this simple update, ZenRows takes care of most scraping problems, such as proxy rotation, setting correct headers, and avoiding CAPTCHAs and anti-bot solutions. A few issues remain, and we will address them in the next sections.

```python theme={null}
import requests

url = "" # ... your URL here
apikey = "YOUR_ZENROWS_API_KEY" # paste your API Key here
zenrows_api_base = "https://api.zenrows.com/v1/"

response = requests.get(zenrows_api_base, params={
	"apikey": apikey,
	"url": url,
})

print(response.text)  # page's HTML
```
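
To see the encoding that `requests` applies to `params`, the request can be prepared without being sent. The following is a minimal sketch, not part of the original integration: the target URL is a made-up example, and `requests.Request(...).prepare()` is used only to inspect the final URL offline.

```python
import requests
from urllib.parse import parse_qs, urlsplit

zenrows_api_base = "https://api.zenrows.com/v1/"
# Hypothetical target URL with characters that need encoding
target_url = "https://example.com/search?q=web scraping&page=2"

# Build the request without sending it, so no network call is made
prepared = requests.Request(
    "GET",
    zenrows_api_base,
    params={"apikey": "YOUR_ZENROWS_API_KEY", "url": target_url},
).prepare()

print(prepared.url)  # the target URL appears percent-encoded in the query string

# Decoding the query string gives back the original target URL
decoded_url = parse_qs(urlsplit(prepared.url).query)["url"][0]
print(decoded_url == target_url)
```

Because the target URL travels as a single encoded `url` parameter, its own query string (`q` and `page` here) is not confused with the parameters of the API call.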

## Extracting Basic Data with BeautifulSoup

We'll now use BeautifulSoup to parse the page's HTML and extract some data. We will write a simple function called `extract_content` that returns the URL, title, and h1 content. This is where you can put your custom extraction logic.

```python theme={null}
import requests
from bs4 import BeautifulSoup

url = "" # ... your URL here
apikey = "YOUR_ZENROWS_API_KEY" # paste your API Key here
zenrows_api_base = "https://api.zenrows.com/v1/"

def extract_content(url, soup):
	# extracting logic goes here
	return {
		"url": url,
		"title": soup.title.string,
		"h1": soup.find("h1").text,
	}

response = requests.get(zenrows_api_base, params={
	"apikey": apikey,
	"url": url,
})
soup = BeautifulSoup(response.text, "html.parser")
content = extract_content(url, soup)

print(content)  # custom scraped content
```
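
Note that `soup.find("h1")` returns `None` when a page has no `h1`, so `extract_content` as written raises an `AttributeError` on such pages. A defensive variant might look like the sketch below. The name `extract_content_safe` and the inline HTML strings are illustrative assumptions, and returning `None` for missing fields is just one possible choice.

```python
from bs4 import BeautifulSoup

def extract_content_safe(url, soup):
    # Hypothetical variant of extract_content that tolerates missing tags
    title_tag = soup.title
    h1_tag = soup.find("h1")
    return {
        "url": url,
        "title": title_tag.get_text(strip=True) if title_tag is not None else None,
        "h1": h1_tag.get_text(strip=True) if h1_tag is not None else None,
    }

# Inline HTML stands in for response.text so the sketch runs offline
html_with_h1 = "<html><head><title>Example</title></head><body><h1>Hello</h1></body></html>"
html_without_h1 = "<html><head><title>No heading</title></head><body><p>Text</p></body></html>"

full = extract_content_safe("https://example.com/a", BeautifulSoup(html_with_h1, "html.parser"))
partial = extract_content_safe("https://example.com/b", BeautifulSoup(html_without_h1, "html.parser"))

print(full)     # all three fields are filled
print(partial)  # "h1" is None instead of raising an error
```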

## List of URLs with Concurrency

Up until now, we were scraping a single URL. We will now introduce a list of URLs, which is closer to a real-world use case. In addition, we will set up concurrency so we don't have to wait for each request to finish before starting the next one. The script will process multiple URLs simultaneously, up to a maximum that is determined by the plan you are on.

In short, the `multiprocessing` package implements a `ThreadPool` that queues and executes all our requests. It handles the parallelism for us and keeps the number of simultaneous requests at or below the limit (10 in the example). Once all the requests finish, it groups the results in a single variable, and we print them. In a real case, you would, for example, store them in a database.

Note that this is not a queue; we cannot add new URLs once the process starts. If that is your use case, check out our guide on how to [Scrape and Crawl from a Seed URL](/zenrows-academy/scrape-and-crawl-from-a-seed-url).

```python theme={null}
import requests
from bs4 import BeautifulSoup
from multiprocessing.pool import ThreadPool

apikey = "YOUR_ZENROWS_API_KEY" # paste your API Key here
zenrows_api_base = "https://api.zenrows.com/v1/"

concurrency = 10
urls = [
	# ... your URLs here
]

def extract_content(url, soup):
	# extracting logic goes here
	return {
		"url": url,
		"title": soup.title.string,
		"h1": soup.find("h1").text,
	}

def scrape_with_zenrows(url):
	response = requests.get(zenrows_api_base, params={
		"apikey": apikey,
		"url": url,
	})
	soup = BeautifulSoup(response.text, "html.parser")
	return extract_content(url, soup)

pool = ThreadPool(concurrency)
results = pool.map(scrape_with_zenrows, urls)
pool.close()
pool.join()

for result in results:
	print(result)  # custom scraped content
```
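
To check that the pool really caps the number of simultaneous tasks, the network call can be swapped for a stand-in that only sleeps. This is a rough sketch under that assumption: `fake_scrape` and the `state` counters are hypothetical helpers, not part of the integration.

```python
import threading
import time
from multiprocessing.pool import ThreadPool

concurrency = 3
state = {"active": 0, "peak": 0}  # tasks running now / highest value seen
lock = threading.Lock()

def fake_scrape(item):
    # Stand-in for scrape_with_zenrows: sleeps instead of calling the network
    with lock:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
    time.sleep(0.05)
    with lock:
        state["active"] -= 1
    return item * 2

with ThreadPool(concurrency) as pool:
    results = pool.map(fake_scrape, range(10))  # blocks until every task finishes

print(results)        # results come back in the same order as the inputs
print(state["peak"])  # never higher than `concurrency`
```

`pool.map` returns the results in input order, so each result can be matched to its URL by position even though the tasks finish at different times.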

## Auto-Retry Failed Requests

The final step in creating a robust scraper is to retry on failed requests. We will be using `Retry` from urllib3 and `HTTPAdapter` from requests.

The basic idea is as follows:

1. Using the return status code, identify the failed requests.
2. Wait an arbitrary amount of time. In our example, it will grow exponentially between tries.
3. Retry the request until it succeeds or reaches a maximum number of retries.
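
Written by hand, the three steps above could be sketched roughly as follows. The helper `fetch_with_retries`, its parameters, and the exact delay schedule are assumptions made for illustration; they are not the internals of `Retry`, whose delays may differ.

```python
import time

RETRY_STATUSES = {429, 500, 502, 503, 504}  # status codes treated as failures

def fetch_with_retries(fetch, max_retries=3, backoff_factor=1, sleep=time.sleep):
    """Call fetch() until it returns a non-retryable status or retries run out.

    fetch is any zero-argument callable returning (status_code, body).
    """
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status not in RETRY_STATUSES:  # step 1: identify failed requests
            return status, body
        if attempt < max_retries:
            sleep(backoff_factor * 2 ** attempt)  # step 2: wait 1s, 2s, 4s, ...
    # step 3: give up once the maximum number of retries is reached
    raise RuntimeError(f"Max retries exceeded (last status: {status})")

# Usage with canned responses and a recorded (not real) sleep
responses = iter([(503, ""), (429, ""), (200, "<html>ok</html>")])
delays = []
status, body = fetch_with_retries(lambda: next(responses), sleep=delays.append)
print(status, delays)  # succeeds on the third attempt after two waits
```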

Fortunately, `Retry` and `HTTPAdapter` implement that behavior for us. We must first configure `Retry` and then mount the `HTTPAdapter` on a requests session. Unlike in the previous snippets, we won't call `requests.get` directly but `requests_session.get`. Once the session is created, it will use the same adapter for all subsequent calls.

For more information, visit the article on [Retry Failed Requests](/zenrows-academy/retry-failed-requests).

```python theme={null}
import requests
from bs4 import BeautifulSoup
from multiprocessing.pool import ThreadPool
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

apikey = "YOUR_ZENROWS_API_KEY" # paste your API Key here
zenrows_api_base = "https://api.zenrows.com/v1/"
urls = [
	# ... your URLs here
]
concurrency = 10  # maximum concurrent requests, depends on the plan

requests_session = requests.Session()
retries = Retry(
	total=3,  # number of retries
	backoff_factor=1,  # exponential time factor between attempts
	status_forcelist=[429, 500, 502, 503, 504]  # status codes that will retry
)

requests_session.mount("http://", HTTPAdapter(max_retries=retries))
requests_session.mount("https://", HTTPAdapter(max_retries=retries))

def extract_content(url, soup):
	# extracting logic goes here
	return {
		"url": url,
		"title": soup.title.string,
		"h1": soup.find("h1").text,
	}

def scrape_with_zenrows(url):
	try:
		response = requests_session.get(zenrows_api_base, params={
			"apikey": apikey,
			"url": url,
		})

		soup = BeautifulSoup(response.text, "html.parser")
		return extract_content(url, soup)
	except Exception as e:
		print(e)  # e.g. "Max retries exceeded"; parsing errors also land here

pool = ThreadPool(concurrency)
results = pool.map(scrape_with_zenrows, urls)
pool.close()
pool.join()

for result in results:
	if result:
		print(result)  # custom scraped content
```
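
To confirm that a session picks up the retry configuration, the mounted adapter can be inspected without sending a request. This is a small sketch that assumes the same `Retry` settings as above; `get_adapter` returns whichever adapter the session would use for a given URL.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=3, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retries))

# Look up the adapter the session would use for the API base (no request is sent)
adapter = session.get_adapter("https://api.zenrows.com/v1/")

print(adapter.max_retries.total)                     # configured number of retries
print(adapter.max_retries.backoff_factor)            # configured backoff factor
print(sorted(adapter.max_retries.status_forcelist))  # status codes that trigger a retry
```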

If you have any problem with the implementation or it does not work for your use case, <a href="mailto:success@zenrows.com">contact us</a> and we'll help you.
