Can I Get Cookies from the Responses?
Can I Log In/Register and Access Content Behind a Login?
Can I Maintain Sessions/IPs Between Requests?
Can I Run the API/Proxy in Multiple Threads to Improve Speed?
Can I Send/Submit Forms?
CSS Selectors Do Not Work or 'Parser is Not Valid'
Suppose you want to extract the elements matching the .my-class CSS selector and store them in a property named test. You would encode the selector and include it in your request like this:
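A minimal sketch with Python's requests (the target URL and API key are placeholders; requests handles the encoding of the URL and the extractor for you):

    import requests

    params = {
        "url": "https://httpbin.io/html",  # placeholder target page
        "apikey": "YOUR_ZENROWS_API_KEY",
        # Map the output property "test" to the CSS selector
        "css_extractor": '{"test": ".my-class"}',
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    print(response.json())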
If the selector does not match anything, verify it in the browser console first (e.g., document.querySelectorAll(".my-class")).
Does session_id Remember Session Data?
No. session_id won't store any request data, such as session cookies. You will get those back as usual and decide which ones to send on the next request.
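For example, a sketch in Python (the numeric session_id value is an assumption; cookies from the first response must be forwarded manually if the next request needs them):

    import requests

    params = {
        "url": "https://httpbin.io/cookies",
        "apikey": "YOUR_ZENROWS_API_KEY",
        "session_id": "12345",  # same value keeps the same IP between requests
    }

    # First request: inspect the response for any cookies the site sets
    first = requests.get("https://api.zenrows.com/v1/", params=params)

    # Second request: the same session_id reuses the IP, but cookies are not replayed
    second = requests.get("https://api.zenrows.com/v1/", params=params)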
How do I Export Data to CSV using the Universal Scraper API?
With the autoparse feature enabled, you can use Python to convert this data into a CSV file. Use pandas' json_normalize function to control how many nested levels to flatten and to rename fields. For instance, to flatten only one inner level and remove the latLong prefix from the latitude and longitude fields:
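A sketch of that step; the nested latLong structure below is a stand-in for whatever the autoparse output of your target looks like:

    import pandas as pd

    # Stand-in for the JSON returned with autoparse enabled
    data = [
        {"name": "Place A", "latLong": {"latitude": 40.42, "longitude": -3.70}},
        {"name": "Place B", "latLong": {"latitude": 51.51, "longitude": -0.13}},
    ]

    # Flatten only one inner level
    df = pd.json_normalize(data, max_level=1)

    # Drop the "latLong." prefix added during flattening
    df = df.rename(columns={
        "latLong.latitude": "latitude",
        "latLong.longitude": "longitude",
    })

    df.to_csv("output.csv", index=False)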
Without the autoparse feature, you can use BeautifulSoup to parse the HTML and extract data. We'll use the example of an eCommerce site from Scraping Course. Create a dictionary for each product with essential details, then use pandas to convert this list of dictionaries into a DataFrame and save it as a CSV file. Here's how to do it:
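A sketch of that flow; the .product, .product-name, and .product-price selectors are assumptions about the demo store's markup, and the API key is a placeholder:

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    params = {
        "url": "https://www.scrapingcourse.com/ecommerce/",
        "apikey": "YOUR_ZENROWS_API_KEY",
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    soup = BeautifulSoup(response.text, "html.parser")

    # One dictionary per product with the essential details
    products = []
    for card in soup.select(".product"):
        link = card.select_one("a")
        products.append({
            "name": card.select_one(".product-name").get_text(strip=True),
            "price": card.select_one(".product-price").get_text(strip=True),
            "link": link["href"] if link else None,
        })

    # List of dictionaries -> DataFrame -> CSV
    pd.DataFrame(products).to_csv("products.csv", index=False)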
For Node.js, we will use the json2csv library to handle the JSON-to-CSV conversion. After getting the data from the ZenRows Universal Scraper API, we will parse it with a flatten transformer. As the name implies, it will flatten the nested structures inside the JSON. Then, save the file using writeFileSync.
Without autoparse, you can use the cheerio library to parse the HTML and extract the relevant information. We'll use the Scraping Course eCommerce example again. As with the Python example, we get the plain result and load it into cheerio, which lets us query elements as we would in the browser or with jQuery. Build an object with the essential data for each product in the list, then parse that list into CSV using json2csv; no flatten transformer is needed this time. Lastly, store the result. These last two steps are similar to the autoparse case.
Extract Data from Lists, Tables, and Grids
{"items": ".div-col > ul li"}
.That will get the text, but what of the links? To access attributes, we need a non-standard syntax for the selector: @href
. It won’t work with the previous selector since the last item is the li
element, which does not have an href
attribute. So we must change it for the link element: {"links": ".div-col > ul a @href"}
.CSS selectors, in some languages, must be encoded to avoid problems with URLs.wikitable
.To extract the rank, which is the first column, we can use "table.wikitable tr > :first-child"
. It will return an array with 243 items, 2 header lines, and 241 ranks. For the country name, second column, something similar but adding an a
to avoid capturing the flags: "table.wikitable tr > :nth-child(2) a"
. In this case, the array will have one less item since the second heading has no link. That might be a problem if we want to match items by array index..product
. Those contain all the data we want.It is essential to avoid duplicates, so we have to use some precise selectors. For example, ".product-item .product-link @href"
for the links. We added the .product-link
class because it is unique to the product cards. The same goes for name and price, which also have unique classes.
All in all, the final selector would be:requests.get
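A sketch of the full request in Python; the name and price class names are assumptions following the pattern above, and the API key is a placeholder:

    import requests

    # @href pulls the attribute; the other entries return the element text
    css_extractor = (
        '{"links": ".product-item .product-link @href", '
        '"names": ".product-item .product-name", '
        '"prices": ".product-item .product-price"}'
    )

    params = {
        "url": "https://www.scrapingcourse.com/ecommerce/",
        "apikey": "YOUR_ZENROWS_API_KEY",
        "css_extractor": css_extractor,
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    print(response.json())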
Libraries usually encode query strings for you, as requests.get does to parameters. Remember to encode the URL and the CSS extractor in scenarios where that is not available.
How Can I Set Specific Headers?
How Do I Send POST Requests with JSON Data?
By default, POST data is sent as application/x-www-form-urlencoded. To send JSON data, you need to add the Content-Type: application/json header manually, though some software/tools may do this automatically. Before trying on your target site, we recommend using a testing site like httpbin.io to verify that the parameters are sent correctly. Ensure that the parameters are sent and the format is correct. If in doubt, switch between both modes to confirm that the changes are applied correctly. For more info on POST requests, see How do I Send POST Requests? below.
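A sketch with Python's requests, posting to httpbin.io through the API; json= serializes the body and sets the Content-Type: application/json header for you:

    import requests

    params = {
        "url": "https://httpbin.io/post",  # echoes the request for verification
        "apikey": "YOUR_ZENROWS_API_KEY",
    }

    # json= sends the body as JSON and adds Content-Type: application/json
    response = requests.post(
        "https://api.zenrows.com/v1/",
        params=params,
        json={"key": "value"},
    )
    print(response.text)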
How do I Send POST Requests?
By default, POST requests are sent as application/x-www-form-urlencoded, but many sites expect JSON content, requiring the Content-Type: application/json header.
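For example, a minimal sketch in Python; data= sends the body form-encoded by default:

    import requests

    params = {
        "url": "https://httpbin.io/post",
        "apikey": "YOUR_ZENROWS_API_KEY",
    }

    # data= sends the body as application/x-www-form-urlencoded
    response = requests.post(
        "https://api.zenrows.com/v1/",
        params=params,
        data={"key": "value"},
    )
    print(response.text)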
How to encode URLs?
Take this target URL as an example:
https://www.scrapingcourse.com/ecommerce/?course=web-scraping&section=advanced
If you were to send this URL directly as part of your API request without encoding, and you also include the premium_proxy parameter, the request might look something like this:
https://api.zenrows.com/v1/?apikey=YOUR_API_KEY&url=https://www.scrapingcourse.com/ecommerce/?course=web-scraping&section=advanced&premium_proxy=true
The API would interpret the course and section parameters as part of the API's query string rather than the target URL. This could lead to errors or unintended behavior. To avoid such issues, you should encode your target URL before including it in the API request. URL encoding replaces special characters (like &, ?, and =) with a format that can be safely transmitted over the internet. Here's how you can encode the URL in Python:
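A sketch using the standard library:

    from urllib.parse import quote

    target_url = "https://www.scrapingcourse.com/ecommerce/?course=web-scraping&section=advanced"

    # safe="" encodes every reserved character, including "/", "?", "&", and "="
    encoded_url = quote(target_url, safe="")
    print(encoded_url)
    # https%3A%2F%2Fwww.scrapingcourse.com%2Fecommerce%2F%3Fcourse%3Dweb-scraping%26section%3Dadvanced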
HTTP clients such as axios (JavaScript) and requests (Python) automatically encode URLs for you. However, if you are manually constructing requests or using a client that doesn't handle encoding, you can use programming language functions or online tools to encode your URLs. For quick manual encoding, you can use an online tool, but remember that this method is not scalable for automated processes.
Using Premium Proxies + JS Render and Still Blocked
What Are Residential IPs?
Residential IPs are addresses that Internet Service Providers assign to home connections, so traffic coming from them looks like regular user traffic. To use them, set the premium_proxy parameter to true. This will route your request through a residential IP, significantly increasing your chances of success. It's important to note that using residential IPs comes with an additional cost due to the higher value and lower detection rate of these proxies.
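For instance, a minimal sketch in Python (the target URL and API key are placeholders):

    import requests

    params = {
        "url": "https://www.scrapingcourse.com/ecommerce/",  # placeholder protected page
        "apikey": "YOUR_ZENROWS_API_KEY",
        "premium_proxy": "true",  # route the request through a residential IP
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    print(response.status_code)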
What is Autoparse?
When you enable the autoparse parameter, the API will automatically parse the content of supported websites and return the data as a JSON object. This makes it much easier to work with the data, especially when dealing with complex websites that require extensive parsing logic. If the parsed output is missing something you need, remove the autoparse parameter and try the request again. This will return the plain HTML response, allowing you to manually parse the data as needed.
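For example, a sketch in Python; the target URL is a placeholder and must be one of the supported websites:

    import requests

    params = {
        "url": "https://www.example.com/product/123",  # replace with a supported site
        "apikey": "YOUR_ZENROWS_API_KEY",
        "autoparse": "true",  # return parsed JSON instead of raw HTML
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    print(response.json())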
What Are the Benefits of JavaScript Rendering?
JavaScript rendering lets you capture content that websites load dynamically after the initial HTML, such as data fetched by frontend scripts. You can also use the wait_for parameter to delay scraping until a specific element is present on the page, ensuring you capture the content you need.
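For example, a sketch combining js_render with wait_for; the target URL and the selector are assumptions about a JS-rendered page:

    import requests

    params = {
        "url": "https://www.scrapingcourse.com/ecommerce/",  # placeholder JS-heavy page
        "apikey": "YOUR_ZENROWS_API_KEY",
        "js_render": "true",          # run the page in a headless browser
        "wait_for": ".product-name",  # wait until this element is present
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    print(response.text)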
Why Are Some Headers Managed by ZenRows?
ZenRows manages most request headers (such as the User-Agent) automatically to maximize the chances of avoiding blocks, so only certain headers can be overridden. Here's an example using cURL to send custom headers that are permitted along with your ZenRows request:
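A sketch, assuming custom header forwarding is switched on with the custom_headers parameter:

    # custom_headers=true is assumed to enable forwarding your own permitted headers
    curl "https://api.zenrows.com/v1/?apikey=YOUR_ZENROWS_API_KEY&custom_headers=true&url=https%3A%2F%2Fhttpbin.io%2Fheaders" \
         -H "Referer: https://www.google.com" \
         -H "Accept-Language: en-US"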