@whatvn
Created August 19, 2014 07:23
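// Seed URLs in a single space-separated string; they are spliced into crawlArgs below
String seedUrl = "www.google.com www.blogger.com";
_logger.info("Processing url [{}]", seedUrl);
// Random batch id for this crawl run (nextInt() covers the full int range, so it may be negative)
Random random = new Random();
String batchId = String.valueOf(random.nextInt());
Configuration nutchConfiguration = NutchConfiguration.create();
nutchConfiguration.set(BATCH_ID, batchId);
// Read from the configuration here, but not used by the crawl call below
String solrUrl = nutchConfiguration.get(SOLR_URL);
String crawlArgs = String.format("-seedurl %s -depth 5 -topN 10", seedUrl);
// Run the Crawl tool; tokenize (not shown in this snippet) must return the String[] that ToolRunner.run expects
ToolRunner.run(nutchConfiguration, new Crawler(),
        tokenize(crawlArgs));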
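
The snippet calls `tokenize` without showing it. A minimal sketch of what such a helper might look like, assuming it only needs to split the argument string on whitespace (the class name `CrawlArgs` and the method body are hypothetical, not the author's code):

```java
import java.util.Arrays;

public class CrawlArgs {

    // Hypothetical stand-in for the tokenize helper used in the snippet:
    // splits a command-line style string on runs of whitespace.
    // Assumption: no quoting or escaping is needed for these arguments.
    public static String[] tokenize(String args) {
        String trimmed = args.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] unused) {
        String seedUrl = "www.google.com www.blogger.com";
        String crawlArgs = String.format("-seedurl %s -depth 5 -topN 10", seedUrl);
        // prints [-seedurl, www.google.com, www.blogger.com, -depth, 5, -topN, 10]
        System.out.println(Arrays.toString(tokenize(crawlArgs)));
    }
}
```

With a whitespace split, the second seed URL arrives as its own token after `-seedurl`, so whatever parses the arguments has to accept more than one value for that flag.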