xrstf / setup.md
Last active Feb 27, 2021
Nutch 2.3 + ElasticSearch 1.4 + HBase 0.94 Setup

Info

This guide sets up a non-clustered Nutch crawler that stores its data in HBase. We will not cover how to set up Hadoop et al., just the bare minimum needed to crawl and index websites on a single machine.

Terms

  • Nutch - the crawler (fetches and parses websites)
  • HBase - the database Nutch stores its data in (basically a Hadoop component)