Sergey Areks

Areks / dabblet.css
Created Feb 6, 2015
The first commented line is your dabblet’s title
/**
* The first commented line is your dabblet’s title
*/
background: #000;
Areks / dabblet.css
Created Feb 6, 2015 — forked from anonymous/dabblet.css
The first commented line is your dabblet’s title
/**
* The first commented line is your dabblet’s title
*/
background: #000;
View setup.md

Info

This guide sets up a non-clustered Nutch crawler that stores its data in HBase. It does not cover setting up Hadoop and related tools, only the bare minimum needed to crawl and index websites on a single machine.

Terms

  • Nutch - the crawler (fetches and parses websites)
  • HBase - the database Nutch stores its data in (part of the Hadoop ecosystem)