@karanth
Last active August 29, 2015 13:55
Notes on running Hadoop jobs on Hortonworks sandbox.

The last [gist](https://gist.github.com/karanth/8736340) was about installing the Hortonworks sandbox and getting to know its entry points. A natural next step for most people starting to learn Hadoop is either to run the example MapReduce (MR) jobs that come with the Hadoop distribution or to write a simple MR job such as word count.

The HUE web page at http://localhost:8888 does not allow Hadoop MR jobs written in Java to be executed directly; it requires higher-level abstractions like Hive (SQL-like) or Pig. Such MR jobs can instead be run by logging into the sandbox (recall Alt + F5 or ssh) and executing them from the sandbox's terminal.

####Running Hadoop Example Programs

The sandbox keeps the Hadoop MR examples in the directory /usr/lib/hadoop-mapreduce. The file name has the form hadoop-mapreduce-examples-*.jar, where the * (asterisk) is a wildcard standing in for the version details of the jar file.

To run an example (the pi estimation program in this case), the commands are:

cd /usr/lib/hadoop-mapreduce
hadoop jar hadoop-mapreduce-examples-*.jar pi 10 10    

The first argument is the number of map tasks and the second is the number of samples per map task. The sandbox terminal already has the hadoop program on its path.
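Because the version part of the jar name differs between sandbox releases, it can help to resolve the file name before calling hadoop. The following is only a minimal sketch: the function name `find_examples_jar` is made up for illustration, and it assumes the examples jar sits in the directory mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the first examples jar found in a directory.
# The default directory is the one used on the sandbox in these notes.
find_examples_jar() {
    dir="${1:-/usr/lib/hadoop-mapreduce}"
    for jar in "$dir"/hadoop-mapreduce-examples-*.jar; do
        # If the glob matches nothing, the pattern stays unexpanded, so check the file exists.
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    echo "no examples jar found in $dir" >&2
    return 1
}

# Usage on the sandbox (10 map tasks, 10 samples per map task):
#   hadoop jar "$(find_examples_jar)" pi 10 10
```

Resolving the name first also gives a clear error message when the jar is missing, instead of a less obvious failure from hadoop itself.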

####Running External Hadoop Programs

If an external Hadoop program that is already available as a jar file is to be executed in the sandbox:

  1. The jar file has to be placed in the sandbox just like the examples jar.
  2. The data files needed for the program have to be imported to HDFS.

The second requirement can be met in multiple ways and is covered in many tutorials online. The first requirement is a little tedious, particularly on a Windows client machine. Of course, the first requirement is moot if development happens in the sandbox itself. In most cases, though, development happens on the host machine or elsewhere, and the sandbox is used as a test bed.
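Once both requirements are met, running the job comes down to a few hadoop commands. The sketch below is one possible way to do it, not the only one. The helper name `run_external_job`, the jar, class, and file names in the usage line, and the input/output layout under the HDFS directory are all hypothetical. It also assumes the job takes an input path and an output path as its arguments, and that the main class has to be named on the command line (it can be left out when the jar's manifest already declares one).

```shell
#!/bin/sh
# HADOOP can be overridden (for example HADOOP=echo) to preview the commands without running them.
HADOOP="${HADOOP:-hadoop}"

# Hypothetical helper: upload one local file to HDFS and run a job jar on it.
# Arguments: jar file, main class, local input file, HDFS working directory.
run_external_job() {
    jar="$1"; main_class="$2"; local_input="$3"; hdfs_dir="$4"
    # Create the input directory in HDFS (-p also creates missing parents).
    "$HADOOP" fs -mkdir -p "$hdfs_dir/input" || return 1
    # Copy the local data file into HDFS.
    "$HADOOP" fs -put "$local_input" "$hdfs_dir/input/" || return 1
    # Jobs using the standard file output formats fail if the output directory already exists.
    "$HADOOP" jar "$jar" "$main_class" "$hdfs_dir/input" "$hdfs_dir/output"
}

# Usage (all names are placeholders):
#   run_external_job wordcount.jar WordCount books.txt /user/demo
```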

In a *nix environment, the jar file can be copied into the sandbox using tools like sftp, ftp, or scp. Windows client machines do not ship with these tools, so Windows versions of them or a *nix emulator like Cygwin needs to be installed.
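As an illustration of the scp route, here is a minimal sketch. The helper name `copy_to_sandbox` is made up, and the host, port, and user defaults (127.0.0.1, 2222, root) are assumptions about how the sandbox VM forwards SSH; check the VM's network settings and adjust them as needed.

```shell
#!/bin/sh
# SCP can be overridden (for example SCP=echo) to preview the command without running it.
SCP="${SCP:-scp}"

# Hypothetical helper: copy a local file into the sandbox user's home directory over SSH.
# Host, port, and user are assumptions; override them through the SANDBOX_* variables.
copy_to_sandbox() {
    file="$1"
    host="${SANDBOX_HOST:-127.0.0.1}"
    port="${SANDBOX_PORT:-2222}"
    user="${SANDBOX_USER:-root}"
    # scp takes an upper-case -P for the port; the trailing colon means the remote home directory.
    "$SCP" -P "$port" "$file" "$user@$host:"
}

# Usage (the jar name is a placeholder):
#   copy_to_sandbox wordcount.jar
```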

The VirtualBox virtualization environment provides a shared folders facility that can be used to share resources between the host (Win8 client) and the guest (sandbox). The host OS can set up a shared folder that is then mounted from within the sandbox. The jar file or other files can be placed in this shared folder on the host and accessed from the sandbox.

#####Setting up Shared Folders

######On the host:

  1. Go to the Devices menu item in the VirtualBox window.
  2. Click on the Shared Folder Settings menu item.
  3. Click on the Add folder + icon on the right-hand side of the dialog.
  4. Choose Other in the dropdown.
  5. Choose an appropriate path. For example, on my Win8 machine I created a folder C:\Users\<Username>\Documents\Shared and chose it.
  6. Give the shared folder a name and remember it, because it is needed when mounting. By default, the name of the shared folder is the folder name on the host.
  7. Check the Auto Mount and Make Permanent checkboxes (optional).

######On the guest/sandbox:

  1. Make a directory where the folder will be mounted. For example: mkdir ~/host
  2. Mount the shared folder using the name that was specified on the host. For example: sudo mount -t vboxsf Shared ~/host
  3. Navigating to the mounted directory (~/host here) should give access to all the shared resources in the folder.

The external jar files can be placed in this shared folder and accessed within the guest/sandbox.
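The guest-side steps can also be wrapped in a small script so that repeating them is harmless. This is a sketch under stated assumptions: the helper name `mount_shared` is made up, the guest is Linux (it reads /proc/mounts), and the VirtualBox Guest Additions that provide the vboxsf file system are already installed in the guest.

```shell
#!/bin/sh
# MOUNT can be overridden (for example MOUNT=echo) to preview the command without running it.
MOUNT="${MOUNT:-sudo mount}"

# Hypothetical helper: mount a VirtualBox shared folder unless it is already mounted.
# Arguments: share name chosen on the host, mount point inside the guest.
mount_shared() {
    share="$1"; target="$2"
    # Create the mount point if it does not exist yet.
    mkdir -p "$target" || return 1
    # /proc/mounts lists active mounts as "device mountpoint fstype ..."; skip if already mounted.
    if grep -qF " $target vboxsf " /proc/mounts 2>/dev/null; then
        echo "$target is already mounted"
        return 0
    fi
    # $MOUNT is left unquoted on purpose so that "sudo mount" splits into two words.
    $MOUNT -t vboxsf "$share" "$target"
}

# Usage with the names from the steps above:
#   mount_shared Shared ~/host
```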

#####Notes:

  1. There are other ways of sharing files, such as USB drives or transfer programs; this is just one of them.
  2. Information about troubleshooting shared folders on VirtualBox can be found [here](https://forums.virtualbox.org/viewtopic.php?t=15868).
  3. *nix guests are assumed here.