@vshankar
Created August 30, 2011 08:44
Hadoop Mountbroker Integration

Current working of Hadoop with GlusterFS

Currently, to use Hadoop with GlusterFS, the Hadoop Map/Reduce daemons, viz. the TaskTracker and the JobTracker, need to run as the super-user. This is required to be able to mount/unmount the GlusterFS volume and to access/modify the data in it. In contrast, using Hadoop with HDFS has no such limitation: the daemons can run as any user and still have full permission over the FS.

Mountbroker

The above limitation is addressed by using Mountbroker. A detailed explanation of its working is available here: https://gist.github.com/71ff8faa041425662185 In short, it allows an unprivileged process to own a GlusterFS mount. This is done by registering a label (and DSL options) with glusterd (via the glusterd volfile). A mount request can then be sent to glusterd from the CLI, which returns an alias (a symlink) to the mounted volume. This alias is later handed back to glusterd to unmount the volume.
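From the unprivileged user's point of view, the mount/unmount round trip might look roughly like the following sketch. The label, user, and volume names here are illustrative, and the exact `umount` syntax should be checked against the glusterd mountbroker implementation:

```
# Request a mount via glusterd; the reply is an alias (symlink) to the mount.
gluster> system:: mount hadoop-label user-map-root=mapred volfile-id=hadoopvol
/mnt/mbr/hadoop-label/mntXXXXXX

# Hand the alias back to glusterd to unmount the volume.
gluster> system:: umount /mnt/mbr/hadoop-label/mntXXXXXX
```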

Hadoop Specific DSL

Mountbroker has a pre-defined DSL for geo-replication. Pre-defining a DSL for Hadoop-specific options would similarly be a good option.

Config DSL for Hadoop:

"SUP("
     "volfile-server=%s "
     "volfile-id=%s "
     "user-map-root=%s "
)"
"SUB+("
     "log-file="DEFAULT_LOG_FILE_DIRECTORY"/"GHADOOP"*/* "
     "log-level=* "
")"

glusterd options:

option mountbroker-root <path>
option mountbroker-glusterfs-hadoop.foo <volume>:<user>:<volfile-server>     # excluding <volfile-server> will use localhost as --volfile-server arg
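As a concrete (hypothetical) illustration, a glusterd volfile carrying these options might look like the sketch below, with `mb_hive` as the mountbroker label, `hadoopvol` as the volume, `mapred` as the unprivileged user, and `host1` as the volfile server; none of these names come from the proposal above:

```
volume management
    type mgmt/glusterd
    option working-directory /etc/glusterd
    option mountbroker-root /var/mountbroker-root
    option mountbroker-glusterfs-hadoop.mb_hive hadoopvol:mapred:host1
end-volume
```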

This would require some additions to this patch: http://review.gluster.com/#change,128

Integration

Once the above configuration is done, the user mounts the volume using:

gluster>system:: mount foo user-map-root=<user> volfile-id=<volume> volfile-server=<volfile-server>
/mnt/mbr/mb_hive/mntUKSQlK

This mount alias then goes into the Hadoop configuration file, and the plugin performs all I/O through it.
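For illustration only, the alias might be wired into Hadoop's `core-site.xml` roughly as below; the property name `fs.glusterfs.mount` is hypothetical, and the actual keys depend on the glusterfs-hadoop plugin:

```
<!-- core-site.xml: hypothetical property name, not taken from the plugin -->
<property>
  <name>fs.glusterfs.mount</name>
  <value>/mnt/mbr/mb_hive/mntUKSQlK</value>
</property>
```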

Patch

https://gist.github.com/1180603
