ScrapeHero Code scrapehero-code

from lxml import html
import requests
import json
import argparse
from collections import OrderedDict
def get_headers():
    return {"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
            "accept-encoding": "gzip, deflate, br",
scrapehero-code / tripadvisor-restaurant.py
Last active August 6, 2019 12:57
Scraper to extract restaurant data from tripadvisor.com using Python and SelectorLib
import argparse
import csv
import requests
from selectorlib import Extractor
from formatter_classes import formatters
def write_to_file(response):
    # Writes the HTML response to a file for debugging purposes
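The preview above is cut off after the comment, so the body of `write_to_file` is not shown. As a rough sketch only, a debugging helper of this kind might look like the following. It assumes the response object exposes a `.text` attribute (as `requests.Response` does). The `output_path` parameter and its default file name are hypothetical and do not come from the gist.

```python
from pathlib import Path


def write_to_file(response, output_path="debug_response.html"):
    """Save the body of an HTTP response to disk for later inspection.

    `output_path` is a hypothetical parameter, not part of the original gist.
    `response` is assumed to expose a `.text` attribute, as requests.Response does.
    """
    path = Path(output_path)
    # Write as UTF-8 so non-ASCII page content survives the round trip.
    path.write_text(response.text, encoding="utf-8")
    return path
```

Returning the path makes it easy to log where the snapshot was saved when a selector unexpectedly matches nothing.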
scrapehero-code / h_and_m.json
Last active April 21, 2020 14:16
Sitemap to extract product details from H&M such as product name, price, reviews, description and details using Web Scraper Chrome Extension and Google Chrome
{
  "_id": "h_and_m",
  "startUrl": [
    "https://www2.hm.com/en_us/women/products/shoes.html?product-type=ladies_shoes&sort=stock&productTypes=shoes&sizes=15_6_6_footwear&colorWithNames=black_000000&image-size=small&image=model&offset=0&page-size=36"
  ],
  "selectors": [
    {
      "id": "listing",
      "type": "SelectorElementClick",
      "parentSelectors": [
scrapehero-code / gist:e666dcda7594e0a88abeb873cda2fc75
Created July 1, 2019 07:06
Scraper to extract details from Wayfair.com such as product name, seller, rating, reviews, price and more using Web Scraper Chrome Extension and Google Chrome
{
  "_id": "wayfair",
  "startUrl": [
    "https://www.wayfair.com/outdoor/sb0/hammocks-with-stands-c1864031.html"
  ],
  "selectors": [
    {
      "id": "links",
      "type": "SelectorLink",
      "parentSelectors": [
scrapehero-code / tripadvisor.py
Created June 24, 2019 11:48
Python code to extract restaurant details from Tripadvisor.com using Scrapy
# -*- coding: utf-8 -*-
import scrapy
from csv import DictReader
from os import path
from tripadvisor_restaurants.items import TripadvisorRestaurantsItem
from urllib.parse import urljoin
class TripadvisorRestaurantsSpiderSpider(scrapy.Spider):
    name = 'tripadvisor_restaurants_spider'
    allowed_domains = ['tripadvisor.com']
scrapehero-code / overstock.json
Last active June 18, 2019 12:47
Extract product details such as product name, pricing, rating, reviews and more from Overstock.com using Web Scraper Chrome Extension and Google Chrome
{
  "_id": "overstock_new",
  "startUrl": [
    "https://www.overstock.com/Home-Garden/Casual-Dinnerware/Gibson,/brand,/6451/subcat.html"
  ],
  "selectors": [
    {
      "id": "product",
      "type": "SelectorLink",
      "parentSelectors": [