@karanth
Last active April 26, 2021 16:47
Notes on installing Hortonworks Hadoop Sandbox - I

Installing a single-node Hadoop cluster is not a straightforward task. It involves a number of steps, from creating users and groups to enabling password-less ssh. Thanks to virtualization technology and Hortonworks' pre-configured OS images, which bundle Hadoop and a few of its ecosystem components, the task has been greatly simplified. Though this does not help a first-time Hadoop user learn about the system-level complexities of Hadoop, it simplifies administration and deployment, letting the user focus on data management and analysis.

Downloads

The 2.4GB image for the Hortonworks Hadoop sandbox can be downloaded from [here](http://hortonworks.com/products/hortonworks-sandbox/#install). I have chosen Oracle's VirtualBox as the virtualization technology; it can be downloaded from [here](https://www.virtualbox.org/wiki/Downloads).

Configuration

I have tried installing VirtualBox on my Windows 8 PC, which has 4GB of RAM. The documentation clearly states that if Ambari and/or HBase are to be enabled and used, a machine with at least 8GB of RAM is required.

When the sandbox image is imported into VirtualBox, the guest OS is allocated 2GB of RAM. The more RAM the guest OS is given, the better it performs; Hortonworks recommends at least 4GB for the guest. The guest OS image is a 64-bit CentOS operating system.
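If the host has memory to spare, the allocation can be raised from the command line before the VM is booted. A minimal sketch using VirtualBox's VBoxManage tool; the VM name "Hortonworks Sandbox" is an assumption here, so check what name the import actually created:

```shell
# List registered VMs to find the sandbox's exact name
VBoxManage list vms

# Give the guest 4096 MB of RAM (the VM must be powered off first)
VBoxManage modifyvm "Hortonworks Sandbox" --memory 4096
```

The same setting is available in the VirtualBox GUI under the VM's System settings.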

Importing the image gave me an error: VT-x is disabled in the BIOS. The error message suggests that virtualization support is absent or disabled on the processor. The SecurAble [tool](https://www.grc.com/securable.htm) can be downloaded and run to check whether your processor supports virtualization.

If virtualization is supported, the PC's BIOS will have an option to turn it on. It has to be enabled and the settings saved.

Start

When the sandbox boots up, a number of services are started, including, but not limited to, the Hadoop NameNode and DataNode, Hive, Pig, Oozie, and supporting database servers.

The sandbox also supports an advanced UI for Hadoop called HUE, which can be accessed from a browser on the host machine at http://127.0.0.1:8888.

HUE allows a user to run Hive or Pig scripts, import and export data, perform user administration for Hadoop, and so on.

A user can log into the sandbox either at the VM console (press Alt + F5) or over ssh.
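Assuming the usual VirtualBox NAT setup for the sandbox, which forwards a host port (commonly 2222) to the guest's ssh port 22, the ssh login looks like the sketch below; verify the actual rule in the VM's port-forwarding settings:

```shell
# Host port 2222 -> guest port 22 is assumed here; adjust -p to match your VM's rule
ssh root@127.0.0.1 -p 2222
```

When prompted, enter the password given below.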

The username and password for login are root and hadoop.

Once logged in, the running processes can be examined with the ps command. The Hadoop configuration directory is /etc/hadoop/conf. Examining hdfs-site.xml in that directory gives a clue as to where HDFS lives on the sandbox's disk: the HDFS data directory is /hadoop/hdfs/data.
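To illustrate, the sketch below recreates a stripped-down hdfs-site.xml (the real file on the sandbox is /etc/hadoop/conf/hdfs-site.xml; dfs.datanode.data.dir is the standard Hadoop key for the DataNode storage path) and pulls the data directory out of it:

```shell
# A stripped-down stand-in for /etc/hadoop/conf/hdfs-site.xml on the sandbox
cat > /tmp/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop/hdfs/data</value>
  </property>
</configuration>
EOF

# Print the <value> that follows the data-dir property name
grep -A1 'dfs.datanode.data.dir' /tmp/hdfs-site.xml \
  | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
```

On the sandbox itself, pointing the same grep/sed pipeline at the real file shows the configured HDFS location.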

I will be dissecting the sandbox a little more in the next post.
