Crawl.js - Experiment 3

  • Place: OpenNebula cluster, unine.ch
  • Date: 11.5.2013

Setup

  • Wikipedia-languages: ar,af,be
  • worker-vms: 2
  • crawlers (url-blocks): 4
  • hashing: simple (MD5 of the whole URL)
  • virtual latency: none
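
To illustrate the "simple" hashing setting, here is a minimal sketch of how a URL could be assigned to one of the 4 url-blocks by taking the MD5 of the whole URL. The helper name `urlBlock`, the use of the first four digest bytes, the modulo step, and the sample URLs are assumptions made for illustration; they are not taken from the crawl.js source.

```javascript
const crypto = require("crypto");

// Hypothetical helper (not from crawl.js): map a URL to one of
// `blockCount` url-blocks by hashing the whole URL string with MD5.
function urlBlock(url, blockCount) {
  const digest = crypto.createHash("md5").update(url, "utf8").digest();
  // Interpret the first 4 digest bytes as an unsigned 32-bit integer
  // and reduce it to a block index in the range [0, blockCount).
  return digest.readUInt32BE(0) % blockCount;
}

// Illustrative URLs for the three Wikipedia languages in this setup.
const urls = [
  "http://ar.wikipedia.org/wiki/Example",
  "http://af.wikipedia.org/wiki/Example",
  "http://be.wikipedia.org/wiki/Example",
];

for (const url of urls) {
  console.log(urlBlock(url, 4), url);
}
```

Because the whole URL is hashed, pages from the same host generally land in different blocks, so a single crawler does not end up owning an entire Wikipedia language.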
@cederigo
cederigo / config.json
Created November 5, 2013 10:40
crawl.js_config_experiment1
{
  "fetcher": {
    "instances": 10,
    "request": {
      "jar": true,
      "timeout": 30000,
      "followRedirect": true,
      "maxRedirects": 3,
      "headers": {
        "User-Agent": "crawl.js v0.0.1"
cederigo / plates.expected.html
Created March 12, 2012 22:18
plates nested objects
<div id="content">
  <div class="date">Mon Mar 12 2012 23:08:16 GMT+0100 (CET)</div>
  <div class="host">
    <div class="name">Poland</div>
  </div>
  <div class="guest">
    <div class="name">Greece</div>
  </div>
</div>