@scrapehero-code
Last active April 21, 2020 14:16
Sitemap for the Web Scraper Chrome extension (run in Google Chrome) that extracts H&M product details: product name, price, review count, description, and details.
{
  "_id": "h_and_m",
  "startUrl": [
    "https://www2.hm.com/en_us/women/products/shoes.html?product-type=ladies_shoes&sort=stock&productTypes=shoes&sizes=15_6_6_footwear&colorWithNames=black_000000&image-size=small&image=model&offset=0&page-size=36"
  ],
  "selectors": [
    {
      "id": "listing",
      "type": "SelectorElementClick",
      "parentSelectors": [
        "_root"
      ],
      "selector": "div.item-details",
      "multiple": true,
      "delay": "2000",
      "clickElementSelector": "button.js-load-more",
      "clickType": "clickMore",
      "discardInitialElements": "do-not-discard",
      "clickElementUniquenessType": "uniqueHTMLText"
    },
    {
      "id": "link",
      "type": "SelectorLink",
      "parentSelectors": [
        "listing"
      ],
      "selector": "a.link",
      "multiple": false,
      "delay": 0
    },
    {
      "id": "name",
      "type": "SelectorText",
      "parentSelectors": [
        "link"
      ],
      "selector": "h1.primary",
      "multiple": false,
      "regex": "",
      "delay": 0
    },
    {
      "id": "price",
      "type": "SelectorText",
      "parentSelectors": [
        "link"
      ],
      "selector": "span.price-value",
      "multiple": false,
      "regex": "",
      "delay": 0
    },
    {
      "id": "description",
      "type": "SelectorText",
      "parentSelectors": [
        "link"
      ],
      "selector": "div.pdp-text",
      "multiple": false,
      "regex": "",
      "delay": 0
    },
    {
      "id": "review_count",
      "type": "SelectorText",
      "parentSelectors": [
        "link"
      ],
      "selector": "span.reviews-number",
      "multiple": false,
      "regex": "\\(.*?\\)",
      "delay": 0
    },
    {
      "id": "details",
      "type": "SelectorText",
      "parentSelectors": [
        "link"
      ],
      "selector": "div.product-details-details",
      "multiple": false,
      "regex": "",
      "delay": 0
    }
  ]
}
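
The sitemap above is a tree: "listing" hangs off "_root", "link" hangs off "listing", and the five text selectors hang off "link". A minimal Python sketch of how one might check that structure before importing the sitemap is shown below. It is not part of the Web Scraper extension. The helper names (selector_tree, dangling_parents, apply_regex) and the sample review text are hypothetical, the embedded JSON is an abbreviated copy that keeps only the fields the sketch reads, and the regex handling (keep the first match) is an assumption about how the extension applies the "regex" field.

```python
import json
import re
from collections import defaultdict

# Abbreviated copy of the sitemap above: only the fields this sketch reads.
SITEMAP_JSON = r'''
{
  "_id": "h_and_m",
  "selectors": [
    {"id": "listing", "type": "SelectorElementClick", "parentSelectors": ["_root"]},
    {"id": "link", "type": "SelectorLink", "parentSelectors": ["listing"]},
    {"id": "name", "type": "SelectorText", "parentSelectors": ["link"], "regex": ""},
    {"id": "price", "type": "SelectorText", "parentSelectors": ["link"], "regex": ""},
    {"id": "description", "type": "SelectorText", "parentSelectors": ["link"], "regex": ""},
    {"id": "review_count", "type": "SelectorText", "parentSelectors": ["link"], "regex": "\\(.*?\\)"},
    {"id": "details", "type": "SelectorText", "parentSelectors": ["link"], "regex": ""}
  ]
}
'''


def selector_tree(sitemap):
    """Map each parent id to the ids of its child selectors, in file order."""
    children = defaultdict(list)
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            children[parent].append(sel["id"])
    return dict(children)


def dangling_parents(sitemap):
    """Return parent ids that are neither "_root" nor a defined selector id."""
    known = {"_root"} | {sel["id"] for sel in sitemap["selectors"]}
    used = {p for sel in sitemap["selectors"] for p in sel["parentSelectors"]}
    return sorted(used - known)


def apply_regex(pattern, text):
    """Assumed behaviour: an empty pattern keeps the text, otherwise keep the first match."""
    if not pattern:
        return text
    match = re.search(pattern, text)
    return match.group(0) if match else None


sitemap = json.loads(SITEMAP_JSON)
tree = selector_tree(sitemap)
review_regex = next(s["regex"] for s in sitemap["selectors"] if s["id"] == "review_count")

for parent, kids in tree.items():
    print(parent, "->", ", ".join(kids))
print("dangling parents:", dangling_parents(sitemap))
# Hypothetical review text; the real markup on the product page may differ.
print(apply_regex(review_regex, "4.5 (12) reviews"))
```

Note that the "review_count" pattern keeps the surrounding parentheses, so a consumer of the scraped data would likely need to strip them before converting the count to a number.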