This environment is based on a nice set of Docker images from https://bitbucket.org/uhopper/hadoop-docker. They provide an older version of Hadoop (2.7.2), but the images could be updated fairly easily if needed.
In order to run this compose file, you first need to create a network called hadoop:
% docker network create hadoop
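If you want, you can inspect the newly created network to see which subnet Docker has assigned to it; the DNS proxy container described below will get its address from this subnet:
% docker network inspect hadoop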
This compose file also has a DNS proxy container which can be used to access the nodes in the Docker network by their names. The respective Dockerfile is attached; run docker build -t netvl/dnsproxy:latest . in the directory with that Dockerfile. The DNS proxy can be used by adding its bridge network address to the resolv.conf file, e.g. with resolvconf (on Linux systems):
% resolvconf -a dnsproxy <<<'nameserver 172.19.0.2' # or whatever address the container gets
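To check that the proxy actually resolves container names, you can query it directly, e.g. with dig (namenode.hadoop here is just an example; use whatever hostnames your compose file declares):
% dig @172.19.0.2 namenode.hadoop +short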
If you do not need the DNS proxy, just remove the respective section from the compose file, although it may be difficult to access the web UIs without it. If you keep the DNS proxy and configure your host system to use it, you will be able to access the Hadoop cluster nodes by the hostnames declared in the compose file, e.g. http://resourcemanager.hadoop:8088.
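With name resolution in place, the other web UIs should be reachable the same way; assuming the default Hadoop 2.x ports and hostnames like the ones above, the most useful ones would look roughly like this:
http://namenode.hadoop:50070        # HDFS NameNode UI
http://resourcemanager.hadoop:8088  # YARN ResourceManager UI
http://historyserver.hadoop:19888   # MapReduce JobHistory UI, if the compose file declares one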
You may want to run this compose file on a proper Linux machine rather than with Docker for Mac/Windows, because the Hadoop processes take a large amount of memory and because networking in non-native environments works rather poorly.
When you run the compose file with docker-compose up, make sure that you have the hadoop.env file in the same directory. This file contains various important configuration options which are written into the Hadoop configuration files.
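If your hadoop.env follows the usual convention for these images, its entries map environment variables onto *-site.xml properties by prefix, so they should look roughly like the following (the names and values here are only an illustration, not the actual contents of the file):
CORE_CONF_fs_defaultFS=hdfs://namenode.hadoop:8020
YARN_CONF_yarn_resourcemanager_hostname=resourcemanager.hadoop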
You can add more datanodes to the cluster by copying the respective section of the compose file and adjusting the service and host names so they do not clash.
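If you would rather not edit the compose file, an extra datanode can also be started by hand in the same network; a rough sketch, assuming the datanode service uses the uhopper/hadoop-datanode image and the same hostname pattern as above:
% docker run -d --network hadoop --env-file hadoop.env \
      --name datanode2 --hostname datanode2.hadoop uhopper/hadoop-datanode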
To access the cluster, run the uhopper/hadoop image in the same network and with the same environment file:
% docker run --rm -it --network hadoop --env-file hadoop.env uhopper/hadoop bash
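From inside this container you can then talk to the cluster with the usual Hadoop command-line tools, assuming the environment file points the clients at the namenode and resourcemanager, for example:
% hdfs dfs -ls /
% hdfs dfsadmin -report
% yarn node -list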