`{"items": ".div-col > ul li"}`.
That will get the text, but what about the links? To access attributes, we need a non-standard addition to the selector: `@href`. It won't work with the previous selector, since the last element it matches is the `li`, which has no `href` attribute. So we must target the link element instead: `{"links": ".div-col > ul a @href"}`.
Depending on the programming language and HTTP client, CSS selectors may need to be URL-encoded by hand to avoid breaking the request URL.
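To see why encoding matters, here is a small sketch using only Python's standard library. The endpoint `https://api.example.com/v1/`, the `css_extractor` parameter name and the `#content` selector are hypothetical. The sketch shows that a `#` pasted into a URL unencoded is read as the start of the fragment, so the selector never reaches the query string. Encoding it with `urllib.parse.quote` keeps it intact.

```python
from urllib.parse import quote, urlsplit

# Hypothetical endpoint and parameter name, used only to show the effect on the URL.
BASE = "https://api.example.com/v1/?css_extractor="

# A hypothetical selector that starts with an id, so it contains "#".
selector = "#content .div-col > ul a @href"

# Naive concatenation: "#" starts the URL fragment, so the query is left empty.
naive = urlsplit(BASE + selector)
print(repr(naive.query))     # 'css_extractor='
print(repr(naive.fragment))  # 'content .div-col > ul a @href'

# Percent-encoding the selector (safe="" encodes every reserved character)
# keeps the whole selector inside the query string.
encoded = urlsplit(BASE + quote(selector, safe=""))
print(repr(encoded.query))     # 'css_extractor=%23content%20.div-col%20%3E%20ul%20a%20%40href'
print(repr(encoded.fragment))  # ''
```

Spaces, `>` and `@` get encoded as well, so the same step applies to selectors without an id.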
The table we want has the `wikitable` class.
To extract the rank, which is the first column, we can use `table.wikitable tr > :first-child`. It returns an array of 243 items: 2 header entries and 241 ranks. For the country name in the second column, we use something similar but add an `a` to avoid capturing the flags: `table.wikitable tr > :nth-child(2) a`. In this case, the array has one item fewer, since the second heading has no link. That can be a problem if we want to match items by array index.
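One way around the index mismatch is to drop each array's header entries before pairing them. The following Python sketch takes that approach. The function name `pair_by_index` is hypothetical. The default header counts (2 for ranks, 1 for names) are inferred from the counts described above and should be checked against the real response. The arrays are made-up placeholders, not real API output.

```python
def pair_by_index(ranks, names, rank_headers=2, name_headers=1):
    """Pair two parallel lists after dropping each one's header entries.

    Assumed defaults: the rank list starts with 2 header entries, the name list
    with only 1 (the second heading has no link).
    """
    body_ranks = ranks[rank_headers:]
    body_names = names[name_headers:]
    # Refuse to pair lists of different lengths instead of silently truncating.
    if len(body_ranks) != len(body_names):
        raise ValueError(
            f"cannot pair {len(body_ranks)} ranks with {len(body_names)} names"
        )
    return list(zip(body_ranks, body_names))


# Made-up, shortened arrays shaped like the ones described above.
ranks = ["Rank", "-", "1", "2", "3"]
names = ["Country", "Country A", "Country B", "Country C"]

print(list(zip(ranks, names))[2])  # ('1', 'Country B'): a naive zip is off by one
print(pair_by_index(ranks, names))
# [('1', 'Country A'), ('2', 'Country B'), ('3', 'Country C')]
```

Raising on a length mismatch is a deliberate choice: `zip` alone would quietly drop the extra items and hide the misalignment.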
`.product`. Those contain all the data we want.
Avoiding duplicates is essential, so we need precise selectors. For example, `.product-item .product-link @href` for the links. We added the `.product-link` class because it is unique to the product cards. The same goes for the name and price, which also have unique classes.
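A quick way to check whether a selector is precise enough is to look for repeated values in what it returns. The helper `duplicated` below is hypothetical and uses `collections.Counter`. The link lists are invented. The idea that a broad selector matches two links per card (for example, one on the image and one on the title) is only an assumption about how such cards are often built.

```python
from collections import Counter


def duplicated(values):
    """Return every value that occurs more than once, in order of first appearance."""
    counts = Counter(values)  # Counter keeps first-insertion order (Python 3.7+)
    return [value for value, count in counts.items() if count > 1]


# Hypothetical results: a broad selector that matches two links per card
# versus a precise one that matches a single link per card.
broad = ["/product/1", "/product/1", "/product/2", "/product/2"]
precise = ["/product/1", "/product/2"]

print(duplicated(broad))    # ['/product/1', '/product/2']
print(duplicated(precise))  # []
```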
All in all, the final selector would be:
`requests.get` does to parameters. Remember to encode the URL and the CSS extractor yourself in scenarios where that automatic encoding is not available.
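When a client that encodes parameters for you, like `requests.get(..., params=...)`, is not available, the query string can be built by hand. The following minimal sketch uses Python's standard library. The endpoint and the `apikey`, `url` and `css_extractor` parameter names are hypothetical placeholders for whatever your scraping API documents. So is the function name `build_request_url`.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint and parameter names: use the ones your scraping API documents.
API_ENDPOINT = "https://api.example.com/v1/"


def build_request_url(api_key, target_url, extractor):
    """Assemble a GET URL by hand, encoding every parameter value.

    urlencode() runs each key and value through quote_plus(), so the "?", "&"
    and "=" in the target URL and the spaces, ">" and "@" in the selectors
    cannot break the outer query string.
    """
    params = {
        "apikey": api_key,
        "url": target_url,
        "css_extractor": json.dumps(extractor),  # the extractor is sent as JSON text
    }
    return API_ENDPOINT + "?" + urlencode(params)


url = build_request_url(
    "YOUR_API_KEY",
    "https://example.com/list?page=2&sort=asc",
    {"links": ".div-col > ul a @href"},
)
print(url)

# Decoding the query string gives back exactly the values that went in.
print(parse_qs(urlsplit(url).query))
```

`urlencode` turns spaces into `+` by default. If an API expects `%20` instead, `urlencode(params, quote_via=quote)` (with `quote` imported from `urllib.parse`) is an option.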