Can I Get Cookies from the Responses?
Can I Log In/Register and Access Content Behind a Login?
- Send POST requests.
- Fill in and submit a form using JavaScript Instructions.
Can I Maintain Sessions/IPs Between Requests?
Can I Run the API/Proxy in Multiple Threads to Improve Speed?
Can I Send/Submit Forms?
CSS Selectors Do Not Work or 'Parser is Not Valid'
Common Issues with CSS Selectors
One of the most common issues users encounter when working with CSS Selectors in web scraping is improper encoding. CSS Selectors need to be correctly encoded to be recognized and processed by the API. You can use ZenRows’ Playground or an online tool to properly encode your CSS Selectors before sending them in a request.
Example of Using a CSS Selector
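As a sketch, encoding a selector with Python’s standard library looks like this (the css_extractor parameter name follows ZenRows’ Universal Scraper API; YOUR_API_KEY and the target URL are placeholders):

```python
from urllib.parse import urlencode

# The css_extractor value is a JSON object mapping output names to selectors.
extractor = '{"test": ".my-class"}'

# urlencode percent-encodes the selector so the API can parse it reliably.
query = urlencode({
    "apikey": "YOUR_API_KEY",
    "url": "https://www.example.com",
    "css_extractor": extractor,
})
print(f"https://api.zenrows.com/v1/?{query}")
```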
Let’s say you want to extract content from the .my-class CSS selector and store it in a property named test. You would encode the selector and include it in your request.
Troubleshooting CSS Selector Issues
If you’re still getting empty responses or the parser reports an error:
- Check the Raw HTML: Request the plain HTML to see if the content served by the website differs from what you see in your browser. Some websites serve different content based on the user’s location, device, or other factors.
- Verify the Selector: Ensure the selector you’re using is correct by testing it in your browser’s Developer Tools (e.g., using Chrome’s Console with document.querySelectorAll(".my-class")).
- Review the Documentation: Refer to the ZenRows documentation for detailed information on using CSS Selectors with the API.
See Also
For comprehensive examples of working with complex layouts and advanced selector techniques, check out our Advanced CSS Selector Examples guide.
Does session_id Remember Session Data?
session_id won’t store any request data, such as session cookies. You will get those back as usual and decide which ones to send on the next request.
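In other words, re-sending cookies is up to you. A minimal sketch of turning Set-Cookie response values into a Cookie header for the next request (the helper name and flow are illustrative, not part of the ZenRows API):

```python
def cookie_header(set_cookie_values):
    """Build a Cookie request header from Set-Cookie response values."""
    # Keep only the name=value pair, dropping attributes like Path or HttpOnly.
    pairs = [value.split(";", 1)[0].strip() for value in set_cookie_values]
    return "; ".join(pairs)

# Values captured from a previous response:
header = cookie_header([
    "sessionid=abc123; Path=/; HttpOnly",
    "theme=dark; Path=/",
])
print(header)  # sessionid=abc123; theme=dark
# Send it on the next request, e.g. headers={"Cookie": header}
```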
How do I Export Data to CSV using the Universal Scraper API?
From JSON using Python
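A minimal sketch of the conversion (the input structure and field names, including latLong, are illustrative; pandas is required):

```python
import pandas as pd

# Illustrative autoparse-style output with a nested latLong attribute.
data = [
    {"name": "Store A", "latLong": {"latitude": 1.23, "longitude": 4.56}},
    {"name": "Store B", "latLong": {"latitude": 7.89, "longitude": 0.12}},
]

# Flatten only one inner level of nesting.
df = pd.json_normalize(data, max_level=1)

# Remove the latLong prefix from the flattened column names.
df.columns = [col.replace("latLong.", "") for col in df.columns]

df.to_csv("output.csv", index=False)
print(df.columns.tolist())  # ['name', 'latitude', 'longitude']
```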
If you’ve obtained JSON output from ZenRows with the autoparse feature enabled, you can use Python to convert this data into a CSV file. The Pandas library will help us flatten nested JSON attributes and save the data as a CSV file. You can pass arguments to the json_normalize function to control how many nested levels to flatten and to rename fields; for instance, to flatten only one inner level and strip the latLong prefix from the latitude and longitude fields.
From HTML using Python
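The extraction can be sketched as follows (the HTML snippet and class names are assumptions modeled on the Scraping Course demo shop; bs4 and pandas are required):

```python
import pandas as pd
from bs4 import BeautifulSoup

# Stand-in for the HTML returned by the API.
html = """
<ul class="products">
  <li class="product">
    <a class="product-link" href="/product/widget">
      <span class="product-name">Widget</span>
      <span class="product-price">$9.99</span>
    </a>
  </li>
  <li class="product">
    <a class="product-link" href="/product/gadget">
      <span class="product-name">Gadget</span>
      <span class="product-price">$19.99</span>
    </a>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# One dictionary per product with its essential details.
products = [
    {
        "name": card.select_one(".product-name").get_text(strip=True),
        "price": card.select_one(".product-price").get_text(strip=True),
        "link": card.select_one(".product-link")["href"],
    }
    for card in soup.select(".product")
]

# Convert the list of dictionaries into a DataFrame and save as CSV.
pd.DataFrame(products).to_csv("products.csv", index=False)
```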
When dealing with HTML output without the autoparse feature, you can use BeautifulSoup to parse the HTML and extract data. We’ll use the example of an eCommerce site from Scraping Course: create a dictionary for each product with its essential details, then use Pandas to convert the list of dictionaries into a DataFrame and save it as a CSV file.
From JSON using JavaScript
For JavaScript and Node.js, you can use the json2csv library to handle the JSON to CSV conversion. After getting the data, parse it with a flatten transformer; as the name implies, it flattens the nested structures inside the JSON. Then save the file using writeFileSync.
From HTML using JavaScript
For extracting data from HTML without autoparse, you can use the cheerio library to parse the HTML and extract the relevant information. We’ll use the Scraping Course eCommerce example again: get the plain HTML result and load it into cheerio, which allows us to query elements as we would in the browser or with jQuery. Return an object with the essential data for each product in the list, then parse that list into CSV using json2csv (no flatten transform is needed this time), and lastly store the result. These last two steps are the same as in the autoparse case.
Extract Data from Lists, Tables, and Grids
Scraping from Lists
We will use the Wikipedia page on Web scraping for testing. A section at the bottom, “See also”, contains links in a list. We can get the content by using the CSS selector for the list items: {"items": ".div-col > ul li"}. That will get the text, but what about the links? To access attributes, we need a non-standard syntax in the selector: @href. It won’t work with the previous selector, since the last item is the li element, which does not have an href attribute, so we must target the link element instead: {"links": ".div-col > ul a @href"}. Note that in some languages CSS selectors must be encoded to avoid problems with URLs.
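The difference between the two selectors can be reproduced locally with BeautifulSoup (toy markup mirroring the “See also” list, not the live page):

```python
from bs4 import BeautifulSoup

# Toy markup mirroring the structure of the "See also" section.
html = """
<div class="div-col"><ul>
  <li><a href="/wiki/Data_scraping">Data scraping</a></li>
  <li><a href="/wiki/Web_crawler">Web crawler</a></li>
</ul></div>
"""
soup = BeautifulSoup(html, "html.parser")

# ".div-col > ul li" yields the text of each list item...
items = [li.get_text(strip=True) for li in soup.select(".div-col > ul li")]

# ...but for the href attribute we must target the a elements instead,
# because li has no href of its own.
links = [a["href"] for a in soup.select(".div-col > ul a")]

print(items)  # ['Data scraping', 'Web crawler']
print(links)  # ['/wiki/Data_scraping', '/wiki/Web_crawler']
```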
Scraping from Tables
Assuming regular tables (no empty cells, no rows with fewer items, and so on), we can extract table data with CSS selectors. We’ll use a list of countries: the first table on the page, the one with the class wikitable. To extract the rank, which is the first column, we can use "table.wikitable tr > :first-child". It will return an array with 243 items: 2 header lines and 241 ranks. For the country name, the second column, we use something similar but add an a to avoid capturing the flags: "table.wikitable tr > :nth-child(2) a". In this case, the array will have one item less, since the second heading has no link. That might be a problem if we want to match items by array index.
Scraping from Product Grids
As with the tables, non-regular grids might cause problems. We’ll scrape the price, product name, and link from an online store. By manually searching the page’s content, we arrive at cards with the class .product, which contain all the data we want. It is essential to avoid duplicates, so we have to use precise selectors. For example, ".product-item .product-link @href" for the links: we added the .product-link class because it is unique to the product cards. The same goes for the name and price, which also have unique classes.
All in all, the final selector combines the link, name, and price fields. Many HTTP clients encode query parameters automatically, as requests.get does with its params. Remember to encode the URL and the CSS extractor yourself in scenarios where that is not available.
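To see that automatic encoding in action without sending anything, you can inspect a prepared request (the css_extractor parameter name and the name/price classes are assumptions; requests is required):

```python
import requests

# Combined extractor; the name and price class names are assumed.
extractor = (
    '{"links": ".product-item .product-link @href",'
    ' "names": ".product-item .product-name",'
    ' "prices": ".product-item .product-price"}'
)

# Preparing the request shows the final URL without sending it.
request = requests.Request(
    "GET",
    "https://api.zenrows.com/v1/",
    params={
        "apikey": "YOUR_API_KEY",
        "url": "https://www.scrapingcourse.com/ecommerce/",
        "css_extractor": extractor,
    },
).prepare()
print(request.url)  # braces, quotes, and spaces arrive percent-encoded
```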
How Can I Set Specific Headers?
How Do I Send POST Requests with JSON Data?
By default, POST request data is sent as application/x-www-form-urlencoded. To send JSON data, you need to add the Content-Type: application/json header manually, though some software/tools may do this automatically. Before trying it on your target site, we recommend using a testing site like httpbin.io to verify that the parameters are sent correctly. Ensure that the parameters arrive and the format is correct; if in doubt, switch between both modes to confirm that the changes are applied correctly. For more info on POST requests, see How do I send POST requests?.
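With Python’s requests, the difference between the two modes is easy to inspect without hitting the network (httpbin.io as in the text; nothing here is actually sent):

```python
import requests

# json= serializes the body and sets Content-Type: application/json.
json_request = requests.Request(
    "POST", "https://httpbin.io/post", json={"key": "value"}
).prepare()

# data= sends a form-encoded body instead.
form_request = requests.Request(
    "POST", "https://httpbin.io/post", data={"key": "value"}
).prepare()

print(json_request.headers["Content-Type"])  # application/json
print(form_request.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_request.body)  # key=value
```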
How do I Send POST Requests?
By default, POST data is sent as application/x-www-form-urlencoded, but many sites expect JSON content, requiring the Content-Type: application/json header.
How to encode URLs?
Suppose your target URL contains query parameters of its own:

https://www.scrapingcourse.com/ecommerce/?course=web-scraping&section=advanced

If you were to send this URL directly as part of your API request without encoding, and you also include the premium_proxy parameter, the API would treat the course and section parameters as part of the API’s query string rather than the target URL. This could lead to errors or unintended behavior.

To avoid such issues, encode your target URL before including it in the API request. URL encoding replaces special characters (like &, ? and =) with a format that can be safely transmitted over the internet.

Most HTTP clients, such as axios (JavaScript) and requests (Python), automatically encode URLs for you. However, if you are manually constructing requests or using a client that doesn’t handle encoding, you can use programming-language functions or online tools to encode your URLs. For quick manual encoding, an online tool works, but remember that this method is not scalable for automated processes.
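In Python, urllib.parse.quote does the job; a sketch using the example URL above:

```python
from urllib.parse import quote

target = "https://www.scrapingcourse.com/ecommerce/?course=web-scraping&section=advanced"

# safe="" also encodes "/" and ":", so the whole URL becomes a single value.
encoded = quote(target, safe="")
print(encoded)
```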
Using Premium Proxies + JS Render and still blocked
What are Residential IPs?
Understanding Proxy Types: Data Center vs. Residential IPs
When it comes to web scraping proxies, there are two main types of IPs you can use: data center and residential.
- Data Center IPs: These are IP addresses provided by cloud service providers or hosting companies. They are typically fast and reliable, but because they are easily recognizable as belonging to data centers, they are more likely to be blocked by websites that have anti-scraping measures in place.
- Residential IPs: These IP addresses are assigned by Internet Service Providers (ISPs) to real residential users. Since they appear as regular users browsing the web, they are much harder to detect and block. This makes residential IPs particularly valuable when scraping sites with strong anti-bot protections, like Google or other heavily guarded domains.
How ZenRows Uses Residential IPs
By default, ZenRows uses data center connections for your requests. However, if you’re facing blocks or need to scrape highly protected websites, you can opt for residential IPs by setting the premium_proxy parameter to true. This will route your request through a residential IP, significantly increasing your chances of success. It’s important to note that using residential IPs comes with an additional cost due to the higher value and lower detection rate of these proxies.
Example of a Request with Residential IPs
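A sketch of such a request with Python’s requests (the request is prepared but not sent; the API key and target are placeholders):

```python
import requests

request = requests.Request(
    "GET",
    "https://api.zenrows.com/v1/",
    params={
        "apikey": "YOUR_API_KEY",
        "url": "https://www.example.com",
        # Route the request through a residential IP.
        "premium_proxy": "true",
    },
).prepare()

# With a valid key: response = requests.Session().send(request)
print(request.url)
```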
Troubleshooting Blocks
If you continue to experience blocks even with residential IPs, feel free to contact us, and we’ll work with you to find a solution.
What is Autoparse?
Simplifying Data Extraction with Autoparse
ZenRows offers a powerful feature called Autoparse, designed to simplify the process of extracting structured data from websites. This feature leverages custom parsers, allowing you to easily retrieve data in a structured JSON format rather than raw HTML.
How It Works
By default, when you call the ZenRows API, the response is plain HTML. However, when you activate the autoparse parameter, the API will automatically parse the content of supported websites and return the data as a JSON object. This makes it much easier to work with the data, especially when dealing with complex websites that would otherwise require extensive parsing logic.
Example of a Request with Autoparse
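A sketch of such a call (only the URL is built here; the API key and target are placeholders):

```python
from urllib.parse import urlencode

params = {
    "apikey": "YOUR_API_KEY",
    "url": "https://www.example.com",
    "autoparse": "true",  # return parsed JSON instead of plain HTML
}
api_url = f"https://api.zenrows.com/v1/?{urlencode(params)}"
print(api_url)
```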
Limitations and Troubleshooting
- Supported Domains: The Autoparse feature is in an experimental phase and doesn’t work on all domains. You can view some of the supported domains on the ZenRows Scraper page. If the website you’re trying to scrape isn’t supported, the response will be empty, incomplete, or an error.
- Fallback to HTML: If you find that Autoparse doesn’t return the desired results, simply remove the autoparse parameter and try the request again. This will return the plain HTML response, allowing you to manually parse the data as needed.
What Are the Benefits of JavaScript Rendering?
JavaScript rendering lets you capture content that is only generated in the browser, such as data loaded dynamically after the initial page load. You can also use the wait_for parameter to delay scraping until a specific element is present on the page, ensuring you capture the content you need.
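Combining both parameters can be sketched like this (prepared but not sent; the selector .price-loaded is an illustrative placeholder):

```python
import requests

request = requests.Request(
    "GET",
    "https://api.zenrows.com/v1/",
    params={
        "apikey": "YOUR_API_KEY",
        "url": "https://www.example.com",
        "js_render": "true",          # render the page in a headless browser
        "wait_for": ".price-loaded",  # wait until this element appears
    },
).prepare()
print(request.url)
```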
Why Some Headers are Managed by ZenRows?
Example of Sending Custom Headers
Here’s an example using cURL to send custom headers that are permitted along with your ZenRows request:
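A sketch of such a call (the custom_headers parameter follows the ZenRows docs; the API key and target are placeholders, and the command is printed rather than executed here):

```shell
APIKEY="YOUR_API_KEY"                        # placeholder
TARGET="https%3A%2F%2Fhttpbin.io%2Fheaders"  # percent-encoded target URL

# custom_headers=true tells the API to forward your own headers.
CMD="curl \"https://api.zenrows.com/v1/?apikey=${APIKEY}&url=${TARGET}&custom_headers=true\" -H \"Referer: https://www.google.com\""

# Print the command; with a real key, run it directly.
echo "$CMD"
```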