@LeeMeng2020
LeeMeng2020 / Amazon UK bestsellers WS.json
Created January 4, 2021 11:41
Amazon UK bestsellers with pagination
{
"_id": "amazon-uk-bestsellers-paginate",
"startUrl": ["https://www.amazon.co.uk/gp/bestsellers/books/"],
"selectors": [{
"id": "Item wrappers",
"type": "SelectorElement",
"parentSelectors": ["_root", "Next page"],
"selector": "ol li div.aok-relative",
"multiple": true
}, {
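In this sitemap, "Item wrappers" lists both `_root` and "Next page" as parents, so it runs on the start page and again on every page reached through "Next page". The minimal Python sketch below shows that parent/child wiring. The "Next page" entry (its type, and the fact that it lists itself as a parent, a common Web Scraper pagination pattern) is a hypothetical stand-in, because its real definition is cut off in this preview.

```python
from collections import defaultdict
from typing import Dict, List

# "Item wrappers" is copied from the gist. The "Next page" entry is a
# HYPOTHETICAL stand-in: its real definition is cut off in the preview.
SITEMAP = {
    "_id": "amazon-uk-bestsellers-paginate",
    "selectors": [
        {"id": "Item wrappers", "type": "SelectorElement",
         "parentSelectors": ["_root", "Next page"]},
        {"id": "Next page", "type": "SelectorLink",
         "parentSelectors": ["_root", "Next page"]},
    ],
}


def children_by_parent(sitemap: dict) -> Dict[str, List[str]]:
    """Map each parent id to the ids of the selectors that list it as a parent."""
    tree: Dict[str, List[str]] = defaultdict(list)
    for selector in sitemap["selectors"]:
        for parent in selector["parentSelectors"]:
            tree[parent].append(selector["id"])
    return dict(tree)


if __name__ == "__main__":
    for parent, children in children_by_parent(SITEMAP).items():
        print(f"{parent} -> {children}")
```

Because "Next page" appears among its own children, every page it opens is both scraped for item wrappers and checked for another "Next page" link, which is what chains the pages together.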
LeeMeng2020 / CBC-Canada-limited-load-more.json
Created October 4, 2020 13:35
This one was interesting: I wanted to find a way to limit the number of Load More clicks. The sitemap below stops at 200 results. More details are in the attached text file.
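The attached text file with the actual technique isn't part of this preview, so the following is only a generic Python simulation of a capped Load More loop, not how Web Scraper or this sitemap implements it. The page sizes used in the example (10 results shown initially, 10 more per click) are hypothetical; the 200-result cap is the one stated above.

```python
from typing import Tuple


def load_until_limit(initial: int, per_click: int, total_available: int,
                     limit: int) -> Tuple[int, int]:
    """Simulate pressing "Load More" until `limit` results are visible or the
    site has nothing left to load. Returns (results_kept, clicks).

    All inputs are hypothetical; this models the idea of a capped Load More
    loop, not Web Scraper's internals."""
    shown = min(initial, total_available)
    clicks = 0
    while shown < limit and shown < total_available:
        shown = min(shown + per_click, total_available)  # one more batch loads
        clicks += 1
    # A batch can overshoot the cap, so keep only the first `limit` results.
    return min(shown, limit), clicks


if __name__ == "__main__":
    print(load_until_limit(10, 10, 1000, 200))  # (200, 19): stopped by the cap
    print(load_until_limit(10, 10, 55, 200))    # (55, 5): site ran out first
```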
{
"_id": "cbc-load-more",
"startUrl": ["https://www.cbc.ca/search?q=quebec%20tourism&section=all&sortOrder=relevance&media=all"],
"selectors": [{
"id": "Separate Load More",
"type": "SelectorElementClick",
"parentSelectors": ["_root"],
"selector": " div.contentListCards",
"multiple": false,
"delay": "3700",
LeeMeng2020 / adac-germany.json
Last active October 3, 2020 03:46
This site uses random class names like sc-hkbPbT, so more robust selectors are needed. The sitemap below expands all the accordions and gets the links. To make it click through all the links, you'll need to add data scrapers under "Get links" (currently it only gets the URLs and does not click through). Originally posted at: https://forum.webscraper.io…
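The selectors in this sitemap avoid the generated names by matching the stable prefix with `div[class^='sc']` (the `sc-` prefix looks like styled-components output, though that is an inference). One subtlety: `^=` tests the start of the *whole* `class` attribute string, not each class separately, and the prefix `sc` (without a hyphen) also matches names like `scroll-box`. This small Python sketch emulates both readings so the difference is visible; the function names are hypothetical helpers, not part of Web Scraper.

```python
def class_attr_starts_with(class_attr: str, prefix: str) -> bool:
    """Emulate the CSS selector [class^='prefix']: true only when the
    whole class attribute string begins with `prefix`."""
    return class_attr.startswith(prefix)


def any_class_starts_with(class_attr: str, prefix: str) -> bool:
    """Looser per-class check: true when any single class token begins
    with `prefix`, wherever it sits in the attribute."""
    return any(token.startswith(prefix) for token in class_attr.split())


if __name__ == "__main__":
    for attr in ["sc-hkbPbT", "sc-hkbPbT kXyZ", "kXyZ sc-hkbPbT", "scroll-box"]:
        print(attr, class_attr_starts_with(attr, "sc"), any_class_starts_with(attr, "sc"))
```

If a generated class can appear after another class, the `^=` form silently misses it, so it is worth checking the live markup before relying on it.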
{
"_id": "adac-test",
"startUrl": ["https://www.adac.de/rund-ums-fahrzeug/autokatalog/marken-modelle/citroen/berlingo/2generation-facelift-2/"],
"selectors": [{
"id": "Open accordians",
"type": "SelectorElementClick",
"parentSelectors": ["_root"],
"selector": "main div[class^='sc']:contains('Fahrzeuge') div[role='button'] ~ div[class^='sc']",
"multiple": true,
"delay": "2100",
LeeMeng2020 / yellowpages-south-africa.json
Created September 10, 2020 13:41
This clicks all the Show Email buttons and scrapes the emails. There is a 750 ms (0.75 s) delay between clicks, so it takes about 15 s to complete for the example search. I set "Page load delay (ms)" to 6000.
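The timing above is simple arithmetic: the click phase costs one delay per button, so 15 s at 0.75 s per click suggests roughly 20 Show Email buttons in the example search (an inference from the stated numbers, not a count taken from the page). A small, hypothetical Python helper makes the estimate reusable; it optionally adds the 6000 ms page load delay once for the single results page and ignores network time.

```python
def estimate_seconds(buttons: int, click_delay_ms: int, page_load_ms: int = 0) -> float:
    """Rough duration: one page load delay plus one click delay per button.
    Ignores network latency and the time each click itself takes."""
    return (page_load_ms + buttons * click_delay_ms) / 1000


def buttons_in(seconds: float, click_delay_ms: int) -> float:
    """Invert the click-only estimate: how many click delays fit into `seconds`."""
    return seconds * 1000 / click_delay_ms


if __name__ == "__main__":
    print(buttons_in(15, 750))              # 20.0 clicks for the 15 s figure
    print(estimate_seconds(20, 750))        # 15.0 s of clicking
    print(estimate_seconds(20, 750, 6000))  # 21.0 s including the page load delay
```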
{
"_id": "yellowpages-co-za",
"startUrl": ["https://www.yellowpages.co.za/search?what=accounting+services&where=pinetown"],
"selectors": [{
"id": "listing wrappers",
"type": "SelectorElement",
"parentSelectors": ["_root"],
"selector": "div.yp-object-result-item",
"multiple": true,
"delay": 0
LeeMeng2020 / square-enix-store
Last active May 28, 2020 02:10
Web Scraper sitemap for the Square Enix Store. Here I've made the scroller separate from the data scraper (usually the data scraper is a child of the scroller).
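The difference between the two layouts is just how `parentSelectors` is wired. The Python sketch below prints each layout as an indented tree. Only "Separate scroller" comes from this sitemap; "Scroller", "Product wrapper" and "Product name" are hypothetical ids, and hanging the data selectors off `_root` in the separate layout is an assumption about how the rest of the (truncated) sitemap is wired.

```python
from typing import List, Tuple

# Usual layout: the data selector is a child of the scroller.
NESTED = [
    {"id": "Scroller", "parentSelectors": ["_root"]},
    {"id": "Product name", "parentSelectors": ["Scroller"]},
]

# Separate layout (assumed): scroller and data selectors are siblings under _root.
SEPARATE = [
    {"id": "Separate scroller", "parentSelectors": ["_root"]},
    {"id": "Product wrapper", "parentSelectors": ["_root"]},
    {"id": "Product name", "parentSelectors": ["Product wrapper"]},
]


def selector_tree(selectors: List[dict], parent: str = "_root",
                  depth: int = 0, path: Tuple[str, ...] = ()) -> List[str]:
    """Return one indented line per selector reachable from `parent`.
    `path` guards against self-parenting selectors (e.g. pagination)."""
    lines: List[str] = []
    for sel in selectors:
        if parent in sel["parentSelectors"] and sel["id"] not in path:
            lines.append("  " * depth + sel["id"])
            lines += selector_tree(selectors, sel["id"], depth + 1, path + (sel["id"],))
    return lines


if __name__ == "__main__":
    print("\n".join(selector_tree(NESTED)))
    print("---")
    print("\n".join(selector_tree(SEPARATE)))
```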
{
"_id": "forum-square-enix-store",
"startUrl": ["https://store.na.square-enix-games.com/en_US/merchandise/all-merchandise"],
"selectors": [{
"id": "Separate scroller",
"type": "SelectorElementScroll",
"parentSelectors": ["_root"],
"selector": "a.product-link-box",
"multiple": true,
"delay": "2100"
LeeMeng2020 / amazon-reviews-scraper-2020.json
Last active October 11, 2020 20:05 — forked from scrapehero/amazon-reviews.json
Amazon reviews scraper updated for 2020. This is a sitemap to extract review listings for a single product on Amazon.com using the Web Scraper Chrome extension. It handles pagination and now includes the ability to limit the number of pages. Please read the instructions and updates in the comments section below.
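How this sitemap limits pages is cut off in the preview. One alternative, not necessarily the one used here, is Web Scraper's numeric range syntax in the start URL (for example `[1-5]`), which expands into one URL per number, so the range end caps the page count. The Python sketch below mimics that expansion, including an optional `:step`, and ignores zero-padded ranges. The `pageNumber` query parameter and the example.com URLs are assumptions for illustration, not taken from the gist.

```python
import re
from typing import List

# Matches a [start-end] or [start-end:step] token, e.g. "[1-5]" or "[0-40:20]".
RANGE_TOKEN = re.compile(r"\[(\d+)-(\d+)(?::(\d+))?\]")


def expand_range_url(url: str) -> List[str]:
    """Expand the first range token in `url` into one URL per value.
    URLs without a token are returned unchanged in a one-item list."""
    match = RANGE_TOKEN.search(url)
    if match is None:
        return [url]
    start, end = int(match.group(1)), int(match.group(2))
    step = int(match.group(3) or 1)
    prefix, suffix = url[:match.start()], url[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1, step)]


if __name__ == "__main__":
    # Hypothetical review-page URL capped at 3 pages.
    for u in expand_range_url("https://example.com/reviews?pageNumber=[1-3]"):
        print(u)
```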
{
"_id": "amazon-reviews-scraper-2020",
"startUrl": ["https://www.amazon.com/Ovente-Dual-Sided-Magnification-Electrical-MPWD3185BZ1X7X/product-reviews/B074GCRS9D",
"https://www.amazon.com/Columbia-Redmond-Waterproof-Cordovan-Regular/product-reviews/B07JH35P96",
"https://www.amazon.com/Merrell-Mens-Moab-Waterproof-Hiking/product-reviews/B01HF9ZN7I",
"https://www.amazon.com/Screen-Protector-SPARIN-Tempered-Glass/product-reviews/B013JZCAZK"
],
"selectors": [{
"id": "Product name",
"type": "SelectorText",