The HttpFS gateway is a proxy that lets users read and write HDFS data over the WebHDFS REST API. The benefit is that HttpFS acts as a choke point: clients talk only to the gateway, so direct access to the NameNode and DataNode ports can be blocked. While looking into load issues on a single HttpFS gateway, I started wondering: why can't we bring up multiple HttpFS gateways and put a layer 7 load balancer in front of them?
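To make the proxy idea concrete, here is a minimal sketch of how a client might build a WebHDFS request URL aimed at an HttpFS gateway rather than at the NameNode. The helper name `httpfs_url`, the host names, and the user `hdfs` are made up for illustration; port 14000 is assumed because it is the HttpFS default in Hadoop 2.x.

```python
from urllib.parse import quote, urlencode


def httpfs_url(host, path, op, user, port=14000, **params):
    """Build a WebHDFS REST URL addressed to an HttpFS gateway.

    host/port are assumptions: point them at a single gateway or,
    in the load-balanced setup, at the balancer's address.
    """
    # WebHDFS takes the operation and the (pseudo-auth) user as query parameters.
    query = urlencode({"op": op, "user.name": user, **params})
    # Every WebHDFS path lives under the /webhdfs/v1 prefix.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    # Hypothetical example: list /tmp through a gateway on localhost.
    print(httpfs_url("localhost", "/tmp", "LISTSTATUS", "hdfs"))
```

Because the client only ever sees this one host and port, swapping a single gateway for a load balancer in front of several gateways should, in principle, only change the `host` and `port` arguments.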
To experiment, I downloaded the Apache Hadoop 2.7.1 binary package and followed the single-node, pseudo-distributed setup instructions to get a cluster running on my Mac. My next step was to add the Hadoop proxy user settings for the HttpFS gateway to core-site.xml. For example:
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>