{
  "nodes": [
    {
      "parameters": {
        "rule": {
          "interval": [
            {
              "field": "weeks",
              "triggerAtDay": [1],
              "triggerAtHour": 9

"""Glassdoor company profile scraper using the internal BFF API with Decodo proxies.
Extracts company overview data: size, industry, headquarters, ratings,
CEO approval, and more. Combines two BFF endpoints:
- /bff/employer-profile-mono/employer-data (company info)
- /bff/employer-profile-mono/employer-ratings (full ratings breakdown)
Data quality focus: raw JSON dumps, field-by-field validation, ghost field
detection across multiple companies.

"""Glassdoor interview experiences scraper using the internal BFF API with Decodo proxies.
Handles Cloudflare challenge detection, session validation, retry logic,
CSRF token extraction, and Decodo proxy management.
Uses curl_cffi's advanced fingerprinting features:
- Pinned browser version (TLS fingerprint matches User-Agent exactly)
- TLS extension randomization (real Chrome permutes on every request)
- TLS GREASE values (Chrome adds random values for robustness)
- Brotli certificate compression (matches Chrome default)

"""Glassdoor job listings scraper using the internal BFF API with Decodo proxies."""
import argparse
import csv
import json
import logging
import os
import random
import re
import sys

"""Glassdoor reviews scraper using the internal BFF API with Decodo proxies.
Handles Cloudflare challenge detection, session validation, retry logic,
CSRF token extraction, and Decodo proxy management.
Uses curl_cffi's advanced fingerprinting features:
- Pinned browser version (TLS fingerprint matches User-Agent exactly)
- TLS extension randomization (real Chrome permutes on every request)
- TLS GREASE values (Chrome adds random values for robustness)
- Brotli certificate compression (matches Chrome default)

"""
Yelp Business Details Scraper
Usage:
    python3 yelp_business.py "https://www.yelp.com/biz/raw-sugar-factory-san-francisco"
    python3 yelp_business.py "https://www.yelp.com/biz/raw-sugar-factory-san-francisco" -o details.json
Proxy: Set PROXY_URL in .env file (e.g., PROXY_URL=http://user:pass@host:port)
"""

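The preview above stops at the docstring, so the script's actual argument handling is not visible. As a rough sketch only, the documented usage (one positional URL, an optional -o output path, and a PROXY_URL setting) could be wired up as follows. The names build_parser, load_proxies, and the long option --output are hypothetical and do not come from the original file, and the sketch assumes the .env file has already been loaded into the process environment by some other means.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the usage lines in the docstring (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Yelp Business Details Scraper")
    # Positional argument: the Yelp business page URL.
    parser.add_argument("url", help="Yelp business page URL")
    # Optional output path; the long form --output is an assumption, only -o is documented.
    parser.add_argument("-o", "--output", default=None,
                        help="write the details as JSON to this file")
    return parser


def load_proxies(env=None):
    """Return a requests-style proxies dict from PROXY_URL, or None if it is unset.

    Assumes PROXY_URL is already present in the environment mapping passed in
    (os.environ by default); loading the .env file itself is out of scope here.
    """
    env = os.environ if env is None else env
    proxy_url = env.get("PROXY_URL")
    if not proxy_url:
        return None
    # Route both plain and TLS traffic through the same proxy.
    return {"http": proxy_url, "https": proxy_url}
```

Calling build_parser().parse_args() with the second usage line's arguments would yield the URL in args.url and "details.json" in args.output; load_proxies() returns None when no proxy is configured, so a caller can fall back to a direct connection.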
"""
Glassdoor Reviews Scraper using curl_cffi
Scrapes company reviews from Glassdoor using their internal BFF API.
Uses curl_cffi for browser TLS fingerprint impersonation to bypass Cloudflare.
Supports multiple Glassdoor regional sites (co.in, sg, com, co.uk, etc.).
Usage:
    python glassdoor_reviews_scraper.py --company Amazon --pages 3
    python glassdoor_reviews_scraper.py --company Google --site com --sort date --rating 4

"""
Glassdoor Job Scraper using curl_cffi
Scrapes job listings and details from Glassdoor using their internal BFF API.
Uses curl_cffi for browser TLS fingerprint impersonation to bypass Cloudflare.
Supports multiple Glassdoor regional sites (co.in, sg, com, co.uk, etc.).
Usage:
    python glassdoor_scraper.py --keyword "machine learning engineer" --location-id 2671300 --pages 3
    python glassdoor_scraper.py --site sg -k "data scientist" -l 2671300 --sort date

{
  "nodes": [
    {
      "parameters": {
        "rule": {
          "interval": [
            {}
          ]
        }
      },

{
  "nodes": [
    {
      "parameters": {
        "rule": {
          "interval": [
            {}
          ]
        }
      },