@full-of-foo
Created February 7, 2016 15:42
Stand-alone Hadoop Container
# Single cluster Hadoop tutorial
# - https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
FROM ubuntu:14.04
MAINTAINER Anthony Troy
# Linux deps
ENV DEBIAN_FRONTEND noninteractive
RUN sed -i 's/# \(.*multiverse$\)/\1/g' /etc/apt/sources.list
RUN apt-get update && apt-get install -y \
    build-essential \
    software-properties-common \
    wget rsync ssh
# Sshd config
RUN ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa
RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
RUN mkdir -p /var/run/sshd
RUN echo "Host localhost\n UserKnownHostsFile=/dev/null\n StrictHostKeyChecking=no" >> /root/.ssh/config
# Java dep
RUN echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | debconf-set-selections
RUN add-apt-repository -y ppa:webupd8team/java
RUN apt-get update && apt-get install -y oracle-java8-installer
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle
# Hadoop
RUN wget http://ftp.heanet.ie/mirrors/www.apache.org/dist/hadoop/common/stable/hadoop-2.7.2.tar.gz
RUN tar xzf hadoop-2.7.2.tar.gz
RUN mv hadoop-2.7.2 /usr/local/hadoop
ENV HADOOP_HOME /usr/local/hadoop
ENV PATH $PATH:$HADOOP_HOME/bin
# Cleanup
RUN rm -rf hadoop-2.7.2.tar.gz
RUN rm -rf /var/lib/apt/lists/*
RUN rm -rf /var/cache/oracle-jdk8-installer
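# Note: `service ssh start` returns as soon as sshd is daemonized, so a plain `docker run` exits right away.
CMD service ssh start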
#!/bin/bash
##************************************************************#
# test.sh #
# #
# Build and run the container #
# docker build -t hadoop_test . && docker run hadoop_test #
# #
# Run this script on the container #
# docker run hadoop_test bash -c "`cat test.sh`" #
# #
#*************************************************************#
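# Input files to count: defaults to every LICENSE.txt found on the image.
COUNT_FILES="${COUNT_FILES:-$(find / -name LICENSE.txt 2>/dev/null)}"
# Output directory; the job fails if it already exists, so clear it first.
OUTPUT_DIR="${OUTPUT_DIR:-test_output}"
rm -fr "$OUTPUT_DIR"
# COUNT_FILES is left unquoted on purpose so each path becomes its own argument.
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop*examples*.jar wordcount $COUNT_FILES "$OUTPUT_DIR"
cat "$OUTPUT_DIR"/part-r-00000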
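
test.sh prints the whole wordcount result, which can be long. As a rough sketch of a follow-up step, the snippet below lists only the most frequent words. `top_words` is a hypothetical helper that is not part of Hadoop or of the original script. It assumes the job wrote its usual `_SUCCESS` marker and a single `part-r-00000` file of tab-separated `word<TAB>count` lines.

```shell
#!/bin/bash
# Sketch only: top_words is a hypothetical helper, not a Hadoop command.
# Usage: top_words <result_file> [limit]
top_words() {
  local result_file="$1"
  local limit="${2:-10}"
  # wordcount emits "word<TAB>count"; sort on the count (descending),
  # breaking ties alphabetically by word.
  sort -t "$(printf '\t')" -k2,2nr -k1,1 "$result_file" | head -n "$limit"
}

# Assumption: same default output directory as test.sh.
OUTPUT_DIR="${OUTPUT_DIR:-test_output}"

# Only summarize when the job reported success.
if [ -f "$OUTPUT_DIR/_SUCCESS" ]; then
  top_words "$OUTPUT_DIR/part-r-00000" 10
fi
```

Appended to test.sh, this would print up to ten lines after the full listing. Passing a different second argument to `top_words` changes how many words are shown.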