# Scrape from a List of URLs

> Scrape data from multiple URLs in parallel or sequentially with ZenRows using Python and JavaScript for efficient bulk web scraping.

When you need to extract data from multiple web pages, you can process a list of URLs using either sequential or parallel approaches. Sequential processing handles URLs one at a time, while parallel processing (handling multiple requests simultaneously) significantly reduces scraping time for large URL lists.

<Warning>This guide assumes you already have a list of URLs ready for scraping.</Warning>

If you need to discover URLs from a starting page, see our [Scrape and Crawl from a Seed URL](/zenrows-academy/scrape-and-crawl-from-a-seed-url) guide.

<Info>The techniques shown in this guide work with any programming language that supports HTTPS requests (PHP, Java, Ruby, Go, C#, etc.). We're using Python and Node.js examples for simplicity, but you can adapt these patterns to your preferred language.</Info>

## Prerequisites

### Python Setup

You'll need <a href="https://www.python.org/downloads/" target="_blank" rel="noopener noreferrer nofollow">Python 3</a> installed. Install the required libraries:

```bash theme={null}
pip install requests beautifulsoup4
```

The `requests` library handles HTTP requests to ZenRows, while `beautifulsoup4` parses HTML content for data extraction.

### Node.js Setup

You'll need <a href="https://nodejs.org/" target="_blank" rel="noopener noreferrer nofollow">Node.js</a> installed. Install the required packages:

```bash theme={null}
npm install axios cheerio
```

The `axios` library handles HTTP requests, while `cheerio` provides server-side HTML parsing similar to jQuery.

## Sequential Processing

Sequential processing handles URLs one after another in a simple loop. This approach works well for small URL lists or when you want to avoid overwhelming target servers with concurrent requests.

### Basic sequential scraping

<CodeGroup>
  ```python Python theme={null}
  import requests

  zenrows_api_base = "https://api.zenrows.com/v1/?apikey=YOUR_ZENROWS_API_KEY"
  urls = [
  	# ... your URLs here
  ]

  for url in urls:
  	response = requests.get(zenrows_api_base, params={"url": url})
  	print(response.text)
  ```

  ```javascript Node.js theme={null}
  const axios = require('axios');

  const zenrowsApiBase = "https://api.zenrows.com/v1/";
  const apiKey = "YOUR_ZENROWS_API_KEY";
  const urls = [
  	// ... your URLs here
  ];

  async function scrapeSequentially() {
  	for (const url of urls) {
  		try {
  			const response = await axios.get(zenrowsApiBase, {
  				params: {
  					apikey: apiKey,
  					url: url
  				}
  			});
  			console.log(response.data);
  		} catch (error) {
  			console.error(`Error scraping ${url}:`, error.message);
  		}
  	}
  }

  scrapeSequentially();
  ```
</CodeGroup>

This code sends each URL to ZenRows sequentially and prints the returned HTML. The loop processes URLs in order, waiting for each request to complete before moving to the next.
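
Passing the target URL through `params` matters because it must be percent-encoded before it can travel as a query-string value; `requests` and `axios` do this for you. As a minimal sketch of what the final request URL looks like, the snippet below builds it with the standard library only. The helper name `build_request_url` and the example target URL are illustrative assumptions, not part of the ZenRows API.

```python
from urllib.parse import urlencode

ZENROWS_API_BASE = "https://api.zenrows.com/v1/"


def build_request_url(api_key, target_url):
    """Return the full API URL with the target URL percent-encoded."""
    # urlencode escapes ":", "/", "?", and "=" so the target URL's own
    # query string is not mistaken for API parameters.
    query = urlencode({"apikey": api_key, "url": target_url})
    return f"{ZENROWS_API_BASE}?{query}"


if __name__ == "__main__":
    # Hypothetical target URL with its own query string.
    print(build_request_url("YOUR_ZENROWS_API_KEY", "https://example.com/products?page=2"))
```

If you ever build the request URL by hand instead of using `params`, skipping this encoding step is a likely source of truncated or misrouted target URLs.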

### Extracting specific data

To make the scraper more useful, extract specific data like page titles and headings:

<CodeGroup>
  ```python Python theme={null}
  import requests
  from bs4 import BeautifulSoup

  zenrows_api_base = "https://api.zenrows.com/v1/?apikey=YOUR_ZENROWS_API_KEY"
  urls = [
  	# ... your URLs here
  ]

  def extract_content(url, soup):
  	return {
  		"url": url,
  		"title": soup.title.string if soup.title else "No title",
  		"h1": soup.find("h1").text if soup.find("h1") else "No H1",
  	}

  results = []
  for url in urls:
  	response = requests.get(zenrows_api_base, params={"url": url})
  	soup = BeautifulSoup(response.text, "html.parser")
  	results.append(extract_content(url, soup))

  print(results)
  ```

  ```javascript Node.js theme={null}
  const axios = require('axios');
  const cheerio = require('cheerio');

  const zenrowsApiBase = "https://api.zenrows.com/v1/";
  const apiKey = "YOUR_ZENROWS_API_KEY";
  const urls = [
  	// ... your URLs here
  ];

  function extractContent(url, $) {
  	return {
  		url: url,
  		title: $('title').text() || "No title",
  		h1: $('h1').first().text() || "No H1"
  	};
  }

  async function scrapeWithExtraction() {
  	const results = [];
  	
  	for (const url of urls) {
  		try {
  			const response = await axios.get(zenrowsApiBase, {
  				params: {
  					apikey: apiKey,
  					url: url
  				}
  			});
  			
  			const $ = cheerio.load(response.data);
  			results.push(extractContent(url, $));
  		} catch (error) {
  			console.error(`Error scraping ${url}:`, error.message);
  			results.push({ url, error: error.message });
  		}
  	}
  	
  	console.log(results);
  }

  scrapeWithExtraction();
  ```
</CodeGroup>

The `extract_content` function (`extractContent` in Node.js) parses each page and extracts the title and the first `h1` heading, falling back to a placeholder value when a page is missing either element. Results are collected in a list for further processing.

### Adding error handling

For production use, add error handling to manage failed requests:

<CodeGroup>
  ```python Python theme={null}
  import requests
  from bs4 import BeautifulSoup
  import time

  zenrows_api_base = "https://api.zenrows.com/v1/?apikey=YOUR_ZENROWS_API_KEY"
  urls = [
  	# ... your URLs here
  ]

  def extract_content(url, soup):
  	return {
  		"url": url,
  		"title": soup.title.string if soup.title else "No title",
  		"h1": soup.find("h1").text if soup.find("h1") else "No H1"
  	}

  results = []
  for url in urls:
  	try:
  		response = requests.get(zenrows_api_base, params={"url": url})
  		response.raise_for_status()
  		
  		soup = BeautifulSoup(response.text, "html.parser")
  		results.append(extract_content(url, soup))
  		
  		# Add delay between requests
  		time.sleep(1)
  		
  	except requests.exceptions.RequestException as e:
  		print(f"Error scraping {url}: {e}")
  		results.append({"url": url, "error": str(e)})

  print(results)
  ```

  ```javascript Node.js theme={null}
  const axios = require('axios');
  const cheerio = require('cheerio');

  const zenrowsApiBase = "https://api.zenrows.com/v1/";
  const apiKey = "YOUR_ZENROWS_API_KEY";
  const urls = [
  	// ... your URLs here
  ];

  function extractContent(url, $) {
  	return {
  		url: url,
  		title: $('title').text() || "No title",
  		h1: $('h1').first().text() || "No H1"
  	};
  }

  function delay(ms) {
  	return new Promise(resolve => setTimeout(resolve, ms));
  }

  async function scrapeWithErrorHandling() {
  	const results = [];
  	
  	for (const url of urls) {
  		try {
  			const response = await axios.get(zenrowsApiBase, {
  				params: {
  					apikey: apiKey,
  					url: url
  				}
  			});
  			
  			const $ = cheerio.load(response.data);
  			results.push(extractContent(url, $));
  			
  			// Add delay between requests
  			await delay(1000);
  			
  		} catch (error) {
  			console.error(`Error scraping ${url}:`, error.message);
  			results.push({ url, error: error.message });
  		}
  	}
  	
  	console.log(results);
  }

  scrapeWithErrorHandling();
  ```
</CodeGroup>

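This version wraps each request in a `try`/`except` block (`try`/`catch` in Node.js) to catch network errors, treats HTTP error responses as failures, and pauses for one second after each successful request to avoid overloading servers. Failed URLs are recorded with their error message so you can review or retry them later.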
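
If failures are transient (timeouts, temporary blocks), recording the error and moving on may drop URLs that would succeed on a second try. One common extension is to retry with exponential backoff. The following is a minimal sketch, assuming a hypothetical `fetch` callable that raises on failure (for example, a wrapper around `requests.get` followed by `raise_for_status()`). The name `fetch_with_retries` and its default values are illustrative, not part of any ZenRows API.

```python
import time


def fetch_with_retries(fetch, url, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff when it raises."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception as error:
            last_error = error
            if attempt < max_attempts:
                # Wait base_delay, then 2x, then 4x, ... before the next try.
                sleep(base_delay * 2 ** (attempt - 1))
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

Passing `sleep` as a parameter is a design choice that keeps the helper easy to test without real waiting; in normal use you can leave it at its default.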

## Parallel Processing

Parallel processing handles multiple URLs simultaneously, dramatically reducing total scraping time. However, you must manage concurrency carefully to avoid overwhelming servers or exceeding rate limits.

<Info>While these examples use Python and Node.js, the same parallel processing concepts apply to other languages. Most modern programming languages provide similar concurrency features (async/await, thread pools, promises, etc.).</Info>

### Using standard libraries

<CodeGroup>
  ```python Python theme={null}
  import requests
  from bs4 import BeautifulSoup
  from concurrent.futures import ThreadPoolExecutor
  import time

  zenrows_api_base = "https://api.zenrows.com/v1/"
  api_key = "YOUR_ZENROWS_API_KEY"

  urls = [
  	# ... your URLs here
  ]

  def scrape_single_url(url):
  	try:
  		response = requests.get(zenrows_api_base, params={"apikey": api_key, "url": url})
  		response.raise_for_status()
  		soup = BeautifulSoup(response.text, "html.parser")
  		return {
  			"url": url,
  			"title": soup.title.string if soup.title else "No title",
  			"h1": soup.find("h1").text if soup.find("h1") else "No H1"
  		}
  	except Exception as e:
  		return {"url": url, "error": str(e)}

  def scrape_parallel_threads():
  	start_time = time.time()
  	with ThreadPoolExecutor(max_workers=5) as executor:
  		results = list(executor.map(scrape_single_url, urls))
  	end_time = time.time()
  	print(f"Completed in {end_time - start_time:.2f} seconds")
  	print(results)

  if __name__ == "__main__":
  	scrape_parallel_threads()
  ```

  ```javascript Node.js theme={null}
  const axios = require('axios');
  const cheerio = require('cheerio');

  const zenrowsApiBase = "https://api.zenrows.com/v1/";
  const apiKey = "YOUR_ZENROWS_API_KEY";
  const urls = [
  	// ... your URLs here
  ];

  async function scrapeSingleUrl(url) {
  	try {
  		const response = await axios.get(zenrowsApiBase, {
  			params: {
  				apikey: apiKey,
  				url: url
  			}
  		});
  		
  		const $ = cheerio.load(response.data);
  		return {
  			url: url,
  			title: $('title').text() || "No title",
  			h1: $('h1').first().text() || "No H1"
  		};
  	} catch (error) {
  		return { url, error: error.message };
  	}
  }

  async function scrapeParallel() {
  	const startTime = Date.now();
  	
  	// Process all URLs concurrently
  	const results = await Promise.all(urls.map(scrapeSingleUrl));
  	
  	const endTime = Date.now();
  	console.log(`Completed in ${(endTime - startTime) / 1000} seconds`);
  	console.log(results);
  }

  scrapeParallel();
  ```
</CodeGroup>

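This approach uses `ThreadPoolExecutor` in Python and `Promise.all` in Node.js to run multiple requests at the same time. In Python, the `max_workers` parameter caps the number of simultaneous requests. In Node.js, `Promise.all(urls.map(...))` starts every request at once with no cap, so it is only suitable for short lists. For longer lists, use the batching approach in the next section to stay within your plan's concurrency limit.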
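
In Python, `executor.map` yields results in input order, so a slow URL holds back the results queued behind it. If you would rather handle each result as soon as it is ready (for progress reporting or incremental saving), `concurrent.futures.as_completed` is an alternative. A minimal sketch follows; `scrape_as_completed` is a hypothetical helper, and `scrape` stands in for any function like `scrape_single_url` above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scrape_as_completed(scrape, urls, max_workers=5):
    """Run scrape(url) in a thread pool and collect results as they finish."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which URL each future belongs to.
        future_to_url = {executor.submit(scrape, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results.append(future.result())
            except Exception as error:
                # Keep going if one URL fails; record the failure instead.
                results.append({"url": url, "error": str(error)})
    return results
```

Results arrive in completion order rather than input order, so keep the URL inside each result if you need to match them up afterwards.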

### Controlling concurrency with batching

For better control over concurrency, process URLs in batches:

<CodeGroup>
  ```python Python theme={null}
  import requests
  from bs4 import BeautifulSoup
  from concurrent.futures import ThreadPoolExecutor
  import time

  zenrows_api_base = "https://api.zenrows.com/v1/"
  api_key = "YOUR_ZENROWS_API_KEY"

  def scrape_single_url(url):
  	try:
  		response = requests.get(zenrows_api_base, params={"apikey": api_key, "url": url})
  		response.raise_for_status()
  		soup = BeautifulSoup(response.text, "html.parser")
  		return {
  			"url": url,
  			"title": soup.title.string if soup.title else "No title",
  			"h1": soup.find("h1").text if soup.find("h1") else "No H1"
  		}
  	except Exception as e:
  		return {"url": url, "error": str(e)}

  def process_batch(urls_batch, max_workers=5):
  	with ThreadPoolExecutor(max_workers=max_workers) as executor:
  		return list(executor.map(scrape_single_url, urls_batch))

  def scrape_with_batching(all_urls, batch_size=10, max_workers=5):
  	all_results = []
  	for i in range(0, len(all_urls), batch_size):
  		batch = all_urls[i:i + batch_size]
  		print(f"Processing batch {i//batch_size + 1} ({len(batch)} URLs)")
  		batch_results = process_batch(batch, max_workers)
  		all_results.extend(batch_results)
  		# Optional delay between batches
  		time.sleep(1)
  	return all_results

  if __name__ == "__main__":
  	urls = [
  		# ... your URLs here
  	]
  	results = scrape_with_batching(urls, batch_size=5, max_workers=3)
  	print(f"Total results: {len(results)}")
  ```

  ```javascript Node.js theme={null}
  const axios = require('axios');
  const cheerio = require('cheerio');

  const zenrowsApiBase = "https://api.zenrows.com/v1/";
  const apiKey = "YOUR_ZENROWS_API_KEY";

  async function scrapeSingleUrl(url) {
  	try {
  		const response = await axios.get(zenrowsApiBase, {
  			params: {
  				apikey: apiKey,
  				url: url
  			}
  		});
  		
  		const $ = cheerio.load(response.data);
  		return {
  			url: url,
  			title: $('title').text() || "No title",
  			h1: $('h1').first().text() || "No H1"
  		};
  	} catch (error) {
  		return { url, error: error.message };
  	}
  }

  async function processBatch(urlsBatch, concurrency = 5) {
  	const semaphore = new Array(concurrency).fill(null);
  	let index = 0;
  	
  	async function processNext() {
  		if (index >= urlsBatch.length) return null;
  		const currentIndex = index++;
  		return await scrapeSingleUrl(urlsBatch[currentIndex]);
  	}
  	
  	const results = await Promise.all(
  		semaphore.map(async () => {
  			const batchResults = [];
  			let result;
  			while ((result = await processNext()) !== null) {
  				batchResults.push(result);
  			}
  			return batchResults;
  		})
  	);
  	
  	return results.flat();
  }

  async function scrapeWithBatching(allUrls, batchSize = 10, concurrency = 5) {
  	const allResults = [];
  	
  	for (let i = 0; i < allUrls.length; i += batchSize) {
  		const batch = allUrls.slice(i, i + batchSize);
  		console.log(`Processing batch ${Math.floor(i/batchSize) + 1} (${batch.length} URLs)`);
  		
  		const batchResults = await processBatch(batch, concurrency);
  		allResults.push(...batchResults);
  		
  		// Optional delay between batches
  		await new Promise(resolve => setTimeout(resolve, 1000));
  	}
  	
  	return allResults;
  }

  // Example usage
  const urls = [
  	// ... your URLs here
  ];

  scrapeWithBatching(urls, 5, 3).then(results => {
  	console.log(`Total results: ${results.length}`);
  	console.log(results);
  });
  ```
</CodeGroup>

This batching approach processes URLs in smaller groups, giving you better control over server load and memory usage. You can adjust batch size and concurrency based on your specific needs.
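
The slicing logic used to form batches can be factored into a small generator. This is a minimal sketch; `chunked` is a hypothetical helper name, not something provided by ZenRows or the standard library.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    urls = [f"https://example.com/page/{n}" for n in range(7)]
    for number, batch in enumerate(chunked(urls, 3), start=1):
        print(f"Batch {number}: {len(batch)} URLs")
```

On Python 3.12 and later, `itertools.batched` offers similar behavior, although it yields tuples rather than lists.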

### Using ZenRows SDKs

ZenRows SDKs simplify parallel web scraping by providing built-in concurrency management, automatic error handling, and connection pooling. They take care of orchestrating the requests, so you can focus on extracting the data you need.

<Tabs>
  <Tab title="Python SDK">
    First, install the ZenRows Python SDK:

    ```bash theme={null}
    pip install zenrows
    ```

    ```python Python SDK theme={null}
    from zenrows import ZenRowsClient
    import asyncio
    from bs4 import BeautifulSoup

    client = ZenRowsClient("YOUR_ZENROWS_API_KEY", concurrency=5, retries=1)

    urls = [
    	# ... your URLs here
    ]

    async def scrape_url(url):
    	try:
    		response = await client.get_async(url)
    		if response.ok:
    			soup = BeautifulSoup(response.text, "html.parser")
    			return {
    				"url": url,
    				"title": soup.title.string if soup.title else "No title",
    				"h1": soup.find("h1").text if soup.find("h1") else "No H1"
    			}
    		else:
    			return {"url": url, "error": f"HTTP {response.status_code}"}
    	except Exception as e:
    		return {"url": url, "error": str(e)}

    async def main():
    	results = await asyncio.gather(*[scrape_url(url) for url in urls])
    	valid_results = [r for r in results if r is not None]
    	print(valid_results)

    if __name__ == "__main__":
    	asyncio.run(main())
    ```
  </Tab>

  <Tab title="Node.js SDK">
    First, install the ZenRows Node.js SDK:

    ```bash theme={null}
    npm install zenrows
    ```

    ```javascript Node.js SDK theme={null}
    const { ZenRowsClient } = require('zenrows');
    const cheerio = require('cheerio');

    const client = new ZenRowsClient('YOUR_ZENROWS_API_KEY', { concurrency: 5, retries: 1 });

    const urls = [
    	// ... your URLs here
    ];

    async function scrapeUrl(url) {
    	try {
    		const response = await client.get(url);
    		if (response.ok) {
    			const $ = cheerio.load(response.data);
    			return {
    				url: url,
    				title: $('title').text() || "No title",
    				h1: $('h1').first().text() || "No H1"
    			};
    		} else {
    			return { url, error: `HTTP ${response.status}` };
    		}
    	} catch (error) {
    		return { url, error: error.message };
    	}
    }

    async function main() {
    	const results = await Promise.all(urls.map(scrapeUrl));
    	const validResults = results.filter(r => r !== null);
    	console.log(validResults);
    }

    main();
    ```
  </Tab>
</Tabs>

The SDKs handle connection pooling, retries, and concurrency limits for you. The `concurrency=5` setting (`concurrency: 5` in Node.js) ensures that no more than 5 requests run simultaneously, even though all URLs are submitted at once.
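
To make the idea of a concurrency cap concrete, here is a minimal sketch of the same behavior built with `asyncio.Semaphore`. It illustrates the concept only and is not a description of how the SDK is implemented. `run_with_limit`, `demo`, and `fake_request` are hypothetical names, and `fake_request` simulates latency instead of calling ZenRows.

```python
import asyncio


async def run_with_limit(worker, items, limit=5):
    """Run worker(item) for every item with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        # Tasks beyond the limit wait here until a slot frees up.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as `items`.
    return await asyncio.gather(*(guarded(item) for item in items))


async def demo():
    active = 0
    peak = 0

    async def fake_request(url):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # simulated network latency
        active -= 1
        return url

    urls = [f"https://example.com/page/{n}" for n in range(20)]
    results = await run_with_limit(fake_request, urls, limit=5)
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(f"{len(results)} results, peak concurrency {peak}")
```

All 20 tasks are created up front, but the semaphore lets only `limit` of them past the `async with` line at any moment.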

### Processing large URL lists with SDKs

For very large URL lists, process them in batches to manage memory usage:

<CodeGroup>
  ```python Python SDK - Batching theme={null}
  from zenrows import ZenRowsClient
  import asyncio
  from bs4 import BeautifulSoup

  client = ZenRowsClient("YOUR_ZENROWS_API_KEY", concurrency=5, retries=1)

  async def scrape_url(url):
  	try:
  		response = await client.get_async(url)
  		if response.ok:
  			soup = BeautifulSoup(response.text, "html.parser")
  			return {
  				"url": url,
  				"title": soup.title.string if soup.title else "No title",
  				"h1": soup.find("h1").text if soup.find("h1") else "No H1"
  			}
  	except Exception as e:
  		return {"url": url, "error": str(e)}

  async def process_batch(urls_batch):
  	results = await asyncio.gather(*[scrape_url(url) for url in urls_batch])
  	return [r for r in results if r is not None]

  async def main():
  	# Your large list of URLs
  	all_urls = [
  		# ... hundreds or thousands of URLs
  	]
  	
  	batch_size = 50
  	all_results = []
  	
  	for i in range(0, len(all_urls), batch_size):
  		batch = all_urls[i:i + batch_size]
  		batch_results = await process_batch(batch)
  		all_results.extend(batch_results)
  		print(f"Processed batch {i//batch_size + 1}, got {len(batch_results)} results")
  	
  	print(f"Total results: {len(all_results)}")

  asyncio.run(main())
  ```

  ```javascript Node.js SDK - Batching theme={null}
  const { ZenRowsClient } = require('zenrows');
  const cheerio = require('cheerio');

  const client = new ZenRowsClient('YOUR_ZENROWS_API_KEY', { concurrency: 5, retries: 1 });

  async function scrapeUrl(url) {
  	try {
  		const response = await client.get(url);
  		if (response.ok) {
  			const $ = cheerio.load(response.data);
  			return {
  				url: url,
  				title: $('title').text() || "No title",
  				h1: $('h1').first().text() || "No H1"
  			};
  		}
  	} catch (error) {
  		return { url, error: error.message };
  	}
  }

  async function processBatch(urlsBatch) {
  	const results = await Promise.all(urlsBatch.map(scrapeUrl));
  	// scrapeUrl returns undefined for non-OK responses, so drop falsy values and errors
  	return results.filter(r => r && !r.error);
  }

  async function main() {
  	// Your large list of URLs
  	const allUrls = [
  		// ... hundreds or thousands of URLs
  	];
  	
  	const batchSize = 50;
  	const allResults = [];
  	
  	for (let i = 0; i < allUrls.length; i += batchSize) {
  		const batch = allUrls.slice(i, i + batchSize);
  		const batchResults = await processBatch(batch);
  		allResults.push(...batchResults);
  		console.log(`Processed batch ${Math.floor(i/batchSize) + 1}, got ${batchResults.length} results`);
  	}
  	
  	console.log(`Total results: ${allResults.length}`);
  }

  main();
  ```
</CodeGroup>

This approach processes URLs in batches of 50, reducing memory usage and providing progress updates for long-running scraping jobs.

## Best Practices

Follow these guidelines for efficient URL list scraping:

1. **Start small** - Test with a few URLs before scaling up
2. **Monitor performance** - Track success rates and response times
3. **Handle errors gracefully** - Always include error handling for production code
4. **Respect rate limits** - Don't exceed your plan's concurrency limits
5. **Save progress** - For large jobs, save results incrementally to avoid data loss
6. **Use appropriate concurrency** - Balance speed against the load on target servers and your own resource usage
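
As one way to apply the "save progress" guideline, the sketch below appends each result to a JSON Lines file as it arrives and, when a job restarts, skips URLs that already have a saved result. The file format, the file name `results.jsonl`, and the helper names are assumptions for illustration.

```python
import json
from pathlib import Path


def load_completed_urls(path):
    """Return the set of URLs already present in the results file."""
    file = Path(path)
    if not file.exists():
        return set()
    with file.open(encoding="utf-8") as handle:
        return {json.loads(line)["url"] for line in handle if line.strip()}


def append_result(path, result):
    """Append one result as a single JSON line."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(result) + "\n")


def pending_urls(urls, path):
    """Return the URLs that have no saved result yet, preserving order."""
    done = load_completed_urls(path)
    return [url for url in urls if url not in done]
```

A typical loop would iterate over `pending_urls(urls, "results.jsonl")` and call `append_result` after each scrape. Note that this sketch treats saved error results as completed too; filter them out in `load_completed_urls` if you want failed URLs retried on the next run.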

## Understanding Concurrency Limits

The concurrency level you choose affects both performance and resource usage:

* **Target server capacity** - Some servers handle more concurrent requests than others
* **Your ZenRows plan limits** - Higher plans support more concurrent requests
* **Data processing speed** - Don't set concurrency higher than your ability to process results
* **Memory usage** - Higher concurrency uses more memory for storing responses

For detailed concurrency management and rate limiting information, check our [Concurrency Guide](/universal-scraper-api/features/concurrency).
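
For a rough sense of how the concurrency level affects total run time, you can estimate it as the number of request "waves" multiplied by the average response time. This is a back-of-the-envelope sketch that assumes every request takes about the same time and ignores retries and delays between batches. `estimate_duration` is a hypothetical helper, and the figures in the usage note are made up for illustration.

```python
import math


def estimate_duration(url_count, concurrency, avg_seconds_per_request):
    """Estimate total seconds to scrape url_count URLs at a given concurrency."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # Each wave runs up to `concurrency` requests side by side.
    waves = math.ceil(url_count / concurrency)
    return waves * avg_seconds_per_request
```

Under these assumptions, 1,000 URLs at 2 seconds each come to about 2,000 seconds sequentially (`concurrency=1`) and about 400 seconds with `concurrency=5`. Real timings vary with the target site and the features you enable, so treat the estimate as a planning aid only.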

## Further Reading

* [Scrape and Crawl from a Seed URL](/zenrows-academy/scrape-and-crawl-from-a-seed-url)
* [Concurrency Management](/universal-scraper-api/features/concurrency)
* [Error Handling Best Practices](/universal-scraper-api/troubleshooting/troubleshooting-guide)
* <a href="https://github.com/ZenRows/zenrows-python-sdk" target="_blank" rel="noopener noreferrer nofollow">ZenRows Python SDK Repository</a>
* <a href="https://github.com/ZenRows/zenrows-node-sdk" target="_blank" rel="noopener noreferrer nofollow">ZenRows Node.js SDK Repository</a>
