@cederigo
Created November 5, 2013 10:40
crawl.js_config_experiment1
{
  "fetcher": {
    "instances": 10,
    "request": {
      "jar": true,
      "timeout": 30000,
      "followRedirect": true,
      "maxRedirects": 3,
      "headers": {
        "User-Agent": "crawl.js v0.0.1"
      }
    }
  },
  "extractor": "parser",
  "queues": {
    "local": {
      "type": "memory",
      "options": {
        "limit": 10000
      }
    },
    "remote": {
      "type": "redis",
      "options": {
        "flushInterval": 30000,
        "host": "redis1",
        "port": "6379"
      }
    }
  },
  "seen": {
    "host": "redis1",
    "port": "6379"
  },
  "dispatcher": {
    "acceptPattern": "http://(ar|af|be).wikipedia.org/articles"
  },
  "store": {
    "type": "dummy",
    "options": {}
  },
  "url": {
    "blocks": 2
  },
  "robo": {
    "limit": 100
  }
}
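
The "acceptPattern" value above reads like a regular expression. The sketch below shows how such a pattern would behave if it were compiled with JavaScript's RegExp. This is an assumption: the gist does not show how crawl.js actually applies the pattern, and the names isAccepted, strictPattern and isAcceptedStrict are hypothetical, used only for illustration.

```javascript
// Assumption: the dispatcher compiles acceptPattern with new RegExp() and
// tests each candidate URL against it. crawl.js itself may do this differently.
const acceptPattern = "http://(ar|af|be).wikipedia.org/articles";
const accept = new RegExp(acceptPattern);

// Hypothetical helper: true when the URL matches the configured pattern.
function isAccepted(url) {
  return accept.test(url);
}

// Hypothetical stricter variant: dots are escaped and the match is anchored
// at the start of the URL, so only the three listed hosts can match.
const strictPattern = "^http://(ar|af|be)\\.wikipedia\\.org/articles";
const acceptStrict = new RegExp(strictPattern);

function isAcceptedStrict(url) {
  return acceptStrict.test(url);
}

const samples = [
  "http://ar.wikipedia.org/articles/foo",
  "http://de.wikipedia.org/articles/foo",
  "http://arXwikipediaYorg/articles/foo"
];

for (const url of samples) {
  console.log(url, isAccepted(url), isAcceptedStrict(url));
}
```

Under this assumption, the unescaped dots in the configured pattern match any character, so the third sample URL is accepted by the pattern as written but rejected by the stricter variant. Whether that matters depends on which URLs the crawler actually encounters.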