@athap
Created October 14, 2015 15:45
scraper options
./script/scraper --help
Usage: scraper [options] [sources ...]

Specific Options:
    -l, --limit N                 Limit operations to N listings.
    -t, --throttle N              Random amount of time to throttle between URL gets.
    -c, --command [COMMAND]       Which type of scraping command you want to run (all harvest collect scrape rescrape all_agents new_agents fix_agents quick_harvest_and_close quick_close validate)
    -s, --sourceid id             Scrapes a single listing using its source id.
    -o, --office office_key(s)    Scrapes an office's listings using its office key (for MRIS and ListHub scrapers). Can be comma-delimited.
    -e, --env [ENVIRONMENT]       Which environment you want to run in (prod gamma beta dev)
    -p, --proxy_list [DATE]       Specify a file with a list of proxies to use.
    -a, --date [date]             Use cached data from this date. Must be in a format that Chronic.parse can understand (ex. '2007-10-21')
    -n, --no_validation           Don't validate
        --hourly                  Run the hourly scraper
        --rpt [timeframe]         Only runs if realplus listings found updated in given timeframe, defaulting to '48 hours ago' - Chronic.parse-able format required
        --skip-images             Skip image downloading this run
        --populate                Run SourceGroup.populate this time
        --touch                   Touch untouched listings anyway
    -y, --yaml                    Dump changes to YAML
    -d, --debug                   Debug mode
    -v, --verbose                 Verbose mode
    -q, --quiet                   Turn off verbose mode
        --skip-writing            Skip the writing step (validate only)
    -x, --do_not_close_listings   Don't close any listings
    -h, --help                    Show this message
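
The help text above has the shape of output from Ruby's standard OptionParser. The sketch below shows how a handful of these flags could be declared and parsed that way. It is not the actual script/scraper source: the parse_scraper_args helper, the option hash keys, and the defaults are all assumptions made for illustration, and the -c and -e arguments are declared as required here for simplicity even though the help text lists them as optional.

```ruby
require 'optparse'

# Values copied from the help text; everything else in this file is hypothetical.
COMMANDS = %w[all harvest collect scrape rescrape all_agents new_agents
              fix_agents quick_harvest_and_close quick_close validate].freeze
ENVIRONMENTS = %w[prod gamma beta dev].freeze

# Hypothetical helper: parses a subset of the scraper flags and returns
# [options_hash, remaining_sources]. Defaults are assumed, not documented.
def parse_scraper_args(argv)
  options = { command: 'all', env: 'dev', offices: [], verbose: false }

  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: scraper [options] [sources ...]'

    # Integer coerces the argument and rejects non-numeric input.
    opts.on('-l', '--limit N', Integer, 'Limit operations to N listings.') do |n|
      options[:limit] = n
    end

    # Passing an Array restricts the argument to the listed values.
    opts.on('-c', '--command COMMAND', COMMANDS, 'Scraping command to run') do |c|
      options[:command] = c
    end

    opts.on('-e', '--env ENVIRONMENT', ENVIRONMENTS, 'Environment to run in') do |e|
      options[:env] = e
    end

    # The Array type splits a comma-delimited argument into a list.
    opts.on('-o', '--office KEYS', Array, 'Comma-delimited office keys') do |keys|
      options[:offices] = keys
    end

    opts.on('--skip-images', 'Skip image downloading this run') do
      options[:skip_images] = true
    end

    opts.on('-v', '--verbose', 'Verbose mode') { options[:verbose] = true }
    opts.on('-q', '--quiet', 'Turn off verbose mode') { options[:verbose] = false }
  end

  # parse returns the arguments that were not consumed as options,
  # which corresponds to the trailing [sources ...] in the usage line.
  sources = parser.parse(argv)
  [options, sources]
end
```

Under these assumptions, calling parse_scraper_args(%w[-c harvest -l 10 -o A1,B2 some_source]) would yield an options hash with the command, limit, and office keys filled in, plus a one-element sources list. The source name and office keys here are placeholders, not real identifiers.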