@isaacarnault
Last active September 23, 2020 09:13
Deploying a Hadoop cluster for Test purposes using AWS EC2, Docker and Cloudera
Hortonworks, BigInsights, MapR.
Cloudera server installation.


Project Status: Concept – Minimal or no implementation has been done yet, or the repository is only intended to be a limited example, demo, or proof-of-concept.

What you need to complete this installation

A. Cloud platform: 1 AWS account

B. Tools used: 1 EC2 instance on AWS (Ubuntu 18.04 LTS)

C. Containerization: 1 Docker image (Cloudera Quickstart)

D. Programming language: Bash (Bourne-Again Shell)


Some of you asked me to make a gist that helps beginners with Hadoop.

Is Hadoop going to die as many claim?

If yes, then let's run a Hadoop cluster before it's too late :)!

This gist will help you launch a Hadoop cluster easily.

We'll use AWS as our compute and storage platform.

We'll also use Docker to launch Cloudera QuickStart.

At the end of this gist, you'll have a single-node Hadoop cluster up and running for basic purposes.

For development and production, I recommend using a regular or enterprise edition of Cloudera instead.


Before you start

Create an account on AWS and log in to the AWS Management Console.
Complete the PREREQUISITES section of this gist.


Complete this section before moving on to the README.md section of this gist.

We'll first set up a security group, create a user and a group, and assign an IAM role before proceeding to the actual installation of Hadoop.

1. PREREQUISITES

Steps to be covered: 3

Setting up a Security Group

Creating a User and a Group

Assigning an IAM role

Go to Services > EC2, in NETWORK AND SECURITY, click on Security Groups > Create Security Group

Security group name: Hadoop

Description: Hadoop-Admins-SG

VPC: select default VPC

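Security Group Rules: add inbound rules allowing SSH, HTTP and HTTPS from anywhere (outbound traffic is allowed by default). To open the Cloudera web interfaces later in this gist, you'll also need an inbound Custom TCP rule for the host ports Docker publishes (32768 and 32769 in this example; yours may differ).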

Click on Create.

🔴 See configuration

isaac-arnault-AWS.png
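If you prefer the AWS CLI to the console, the security group above could be created roughly as follows. This is a sketch that assumes:

- the AWS CLI is installed and configured for your account;
- you use the default VPC, which lets --group-name identify the group.

The helpers run and create_hadoop_sg and the DRY_RUN switch are hypothetical names introduced here. With DRY_RUN=1, the helpers only print the commands. A new security group already allows all outbound traffic, so the sketch only adds inbound rules.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical function: create the "Hadoop" security group in the default VPC
# and open SSH (22), HTTP (80) and HTTPS (443) to anywhere, as in the console steps.
create_hadoop_sg() {
  local name="${1:-Hadoop}" port
  run aws ec2 create-security-group \
    --group-name "$name" --description "Hadoop-Admins-SG" || return 1
  for port in 22 80 443; do
    run aws ec2 authorize-security-group-ingress \
      --group-name "$name" --protocol tcp --port "$port" --cidr 0.0.0.0/0 || return 1
  done
}
```

Run DRY_RUN=1 create_hadoop_sg to preview the calls, then create_hadoop_sg to apply them.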

Go to Services and, in the Security, Identity & Compliance section, click on IAM.

Click on Users > Add user and configure as follows:

🔴 See configuration

isaac-arnault-aws-19.png

Click on Next: Permissions > Add user to group > Create group > Group Name: hadoop_admins

Search for EC2 and select AmazonEC2FullAccess, then search for IAM and select IAMFullAccess.

🔴 See configuration

isaac-arnault-AWS-20.png

🔴 See configuration

isaac-arnault-aws-21.png
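The user and group steps map onto a handful of IAM CLI calls. This is a sketch that assumes a configured AWS CLI.

- hadoop-admin is a placeholder user name, because the text does not give the one shown in the screenshot.
- run and setup_hadoop_admins are hypothetical helpers.
- The sketch does not create a console password or access keys. Add those yourself, depending on the access type you chose in the Add user screen.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical function: create the hadoop_admins group, attach the two AWS
# managed policies, then create a user and add it to the group.
setup_hadoop_admins() {
  local user="${1:-hadoop-admin}"   # placeholder user name
  local group="hadoop_admins" policy
  run aws iam create-group --group-name "$group" || return 1
  for policy in AmazonEC2FullAccess IAMFullAccess; do
    run aws iam attach-group-policy --group-name "$group" \
      --policy-arn "arn:aws:iam::aws:policy/$policy" || return 1
  done
  run aws iam create-user --user-name "$user" || return 1
  run aws iam add-user-to-group --user-name "$user" --group-name "$group"
}
```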

In IAM go to Roles > Create role > click on EC2 > Next: Permissions > select AdministratorAccess

🔴 See configuration

isaac-arnault-aws-22.png

Next: Tags > Key: name, Value: hadoop-cluster > Next: Review > Role name: AdminAccess > Create role. Back on the IAM dashboard, you can see a summary of the role you've created.

🔴 See configuration

isaac-arnault-AWS-23.png
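A CLI version of the role step might look like this sketch. It assumes a configured AWS CLI, and run and create_admin_role are hypothetical helpers.

- The trust policy lets EC2 instances assume the role.
- When you create an EC2 role in the console, AWS also creates an instance profile with the same name, and the launch wizard's IAM role field lists that profile.
- With the CLI, you create that profile yourself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical function: create the AdminAccess role for EC2, tag it, attach
# AdministratorAccess and wrap it in an instance profile of the same name.
create_admin_role() {
  local role="AdminAccess"
  # Trust policy: allow the EC2 service to assume this role.
  local trust='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
  run aws iam create-role --role-name "$role" \
    --assume-role-policy-document "$trust" \
    --tags Key=name,Value=hadoop-cluster || return 1
  run aws iam attach-role-policy --role-name "$role" \
    --policy-arn arn:aws:iam::aws:policy/AdministratorAccess || return 1
  run aws iam create-instance-profile --instance-profile-name "$role" || return 1
  run aws iam add-role-to-instance-profile \
    --instance-profile-name "$role" --role-name "$role"
}
```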

At this stage you should have a user, a group and a role attached to your AWS account before proceeding to step 2.
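To double-check the prerequisites from a terminal, you could query each resource by the names used in this gist (Hadoop, hadoop_admins, AdminAccess). This is a sketch that assumes a configured AWS CLI; run and check_prereqs are hypothetical helpers. Each AWS call fails if its resource is missing, so the function stops at the first gap.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical function: query each prerequisite and fail on the first missing one.
check_prereqs() {
  run aws ec2 describe-security-groups --group-names Hadoop || return 1
  run aws iam get-group --group-name hadoop_admins || return 1
  run aws iam list-attached-group-policies --group-name hadoop_admins || return 1
  run aws iam get-role --role-name AdminAccess || return 1
  run aws iam get-instance-profile --instance-profile-name AdminAccess
}
```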


Please note: having all the security status check marks on the IAM dashboard green is nice, but AWS does not require it.

🔴 See configuration

isaac-arnault-AWS-18.png

2. INSTALLATION

Steps to be covered: 3

Setting up our EC2 instance

Pulling a Cloudera Quickstart Docker Image

Starting the services

Go to Services > EC2, click on Launch Instance.

Select Ubuntu Server 18.04 LTS as the AMI.

🔴 See configuration

isaac-arnault-AWS-hadoop.png

Choose the t2.xlarge instance type (4 vCPUs, 16 GiB of memory). A smaller instance type may leave the services slow to start or unresponsive.

🔴 See configuration

isaac-arnault-hadoop-2.png

Click on Configure Instance Details and tune as follows:

Number of instances: 1 > IAM role: AdminAccess > Next: Add Storage > set the storage size to 30 GiB.

🔴 See configuration

isaac-arnault-aws-24.png

Next: Add Tags > Key: name, Value: hadoop-cluster > Next: Configure Security Group > Select an existing security group, and choose the Hadoop group you created in the prerequisites. You can also select your default security group, provided it allows the same inbound traffic.

Review and Launch > Launch.
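The launch wizard settings above correspond roughly to a single run-instances call. This sketch assumes:

- a configured AWS CLI;
- the Hadoop security group in the default VPC;
- the AdminAccess instance profile;
- an existing key pair.

This gist does not give an AMI ID, so you pass the ID of an Ubuntu Server 18.04 LTS AMI for your region yourself. /dev/sda1 is the root device name used by Ubuntu AMIs. run and launch_hadoop_instance are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical function: launch one t2.xlarge instance with a 30 GiB root volume,
# the AdminAccess instance profile and the name=hadoop-cluster tag.
launch_hadoop_instance() {
  local ami_id="$1" key_name="$2"
  if [ -z "$ami_id" ] || [ -z "$key_name" ]; then
    echo "usage: launch_hadoop_instance <ubuntu-18.04-ami-id> <key-pair-name>" >&2
    return 1
  fi
  run aws ec2 run-instances \
    --image-id "$ami_id" \
    --count 1 \
    --instance-type t2.xlarge \
    --key-name "$key_name" \
    --security-groups Hadoop \
    --iam-instance-profile Name=AdminAccess \
    --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=30}' \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=name,Value=hadoop-cluster}]'
}
```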

AWS will prompt you to select or create a key pair: create a new key pair and download its .pem file.

Save it in a directory called hadoop:

mkdir hadoop
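A small sketch of this step, assuming your browser saved the key pair file somewhere like ~/Downloads (adjust the path). store_key is a hypothetical helper that moves the file into the hadoop directory. It also applies the chmod 400 that the SSH step requires.

```bash
#!/usr/bin/env bash
# Hypothetical helper: move the downloaded key into DEST_DIR (default: hadoop)
# and make it readable by its owner only, as ssh requires.
store_key() {
  local key_file="$1" dest_dir="${2:-hadoop}"
  if [ ! -f "$key_file" ]; then
    echo "key file not found: $key_file" >&2
    return 1
  fi
  mkdir -p "$dest_dir" || return 1
  mv "$key_file" "$dest_dir/" || return 1
  chmod 400 "$dest_dir/$(basename "$key_file")"
}
```

For example: store_key ~/Downloads/MyKeyPairFile.pem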

Go to Services > EC2, wait for your instance to be running and for the health checks to pass.
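From your own machine, the AWS CLI can do this waiting for you and then print the public DNS name you need for the ssh command. This is a sketch that assumes a configured AWS CLI. The instance ID (i-...) is shown in the EC2 console, and run and wait_and_get_dns are hypothetical helpers. aws ec2 wait instance-status-ok polls until the status checks pass, and gives up after a fixed number of attempts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical function: block until the instance's status checks pass,
# then print its public DNS name.
wait_and_get_dns() {
  local instance_id="$1"
  if [ -z "$instance_id" ]; then
    echo "usage: wait_and_get_dns <instance-id>" >&2
    return 1
  fi
  run aws ec2 wait instance-status-ok --instance-ids "$instance_id" || return 1
  run aws ec2 describe-instances --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].PublicDnsName' --output text
}
```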

When your instance is running, select your instance name, and click "Connect".

Copy the SSH command shown in the Connect dialog. It looks like this:

ssh -i "MyKeyPairFile.pem" ubuntu@ec2-*-*-*-*.compute-1.amazonaws.com

Open your Terminal and go to the hadoop directory where you stored the key pair file:

cd hadoop

Restrict the key file's permissions, because ssh refuses to use a private key that other users can read:

chmod 400 MyKeyPairFile.pem

Now run the ssh command given by your EC2 instance, for example:

ssh -i "MyKeyPairFile.pem" ubuntu@ec2-3-90-136-245.compute-1.amazonaws.com
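Since ssh rejects a private key that other users can read, a small wrapper can check the key's mode before connecting. This is a sketch with a hypothetical helper, connect_ec2, assuming Ubuntu's default ubuntu login as in the command above. It reads the mode with GNU stat -c and falls back to BSD/macOS stat -f. The SSH variable only lets you swap the command (for example for echo) to preview it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check the key's permissions, then ssh in as "ubuntu".
connect_ec2() {
  local key="$1" host="$2" mode
  if [ ! -f "$key" ] || [ -z "$host" ]; then
    echo "usage: connect_ec2 <key.pem> <public-dns>" >&2
    return 1
  fi
  # GNU stat (Linux) first, then BSD stat (macOS).
  mode="$(stat -c %a "$key" 2>/dev/null || stat -f %Lp "$key")"
  case "$mode" in
    400|600) ;;
    *) echo "key mode is $mode; run: chmod 400 $key" >&2; return 1 ;;
  esac
  ${SSH:-ssh} -i "$key" "ubuntu@$host"
}
```

For example: connect_ec2 MyKeyPairFile.pem ec2-3-90-136-245.compute-1.amazonaws.com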

You are now logged into your EC2 instance's terminal and ready to install Docker and Cloudera Quickstart.

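# Remove older Docker packages (it's fine if none are installed)
sudo apt-get remove docker docker-engine docker.io
# Packages needed to use an HTTPS apt repository
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
# Add Docker's official GPG key and check its fingerprint
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
# Add Docker's stable repository for this Ubuntu release (bionic on 18.04)
sudo add-apt-repository \
  "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
# List the docker-ce versions available, then install Docker CE from that repository
# (Ubuntu's own docker.io package also works if you skip the repository steps)
apt-cache madison docker-ce
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
sudo systemctl start docker
sudo systemctl enable docker
docker --version
# The docker commands below need root (or membership in the docker group)
sudo docker images
sudo docker ps
sudo docker pull cloudera/quickstart:latest
# 4G memory limit (2G soft reservation), 8G memory + swap in total, current directory
# mounted at /CDH, ports 8888 (Hue) and 8088 (YARN) published on host ports chosen by Docker
sudo docker run -m 4G --memory-reservation 2G --memory-swap 8G \
  --hostname=quickstart.cloudera --privileged=true -t -i \
  -v "$(pwd)":/CDH --publish-all=true -p 8888 -p 8088 \
  cloudera/quickstart /usr/bin/docker-quickstart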
🔵 See output

isaac-arnault-AWS-24.png
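The container runs in the foreground, so one way to find the host ports Docker assigned is from a second SSH session. This is a sketch: host_port and parse_host_port are hypothetical helpers.

- docker port prints mappings such as 0.0.0.0:32768, and newer Docker versions add an IPv6 line like [::]:32768.
- The parser keeps the first port number it finds.
- By default, host_port looks at the most recently created container (docker ps -lq).

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep the port number from the first "address:port" line.
parse_host_port() {
  sed -n 's/.*:\([0-9][0-9]*\)$/\1/p' | head -n 1
}

# Hypothetical helper: host port mapped to a container port (e.g. 8888 or 8088).
# Uses the most recently created container unless a container ID or name is given.
host_port() {
  local container_port="$1" container="${2:-}"
  if [ -z "$container" ]; then
    container="$(sudo docker ps -lq)"
  fi
  sudo docker port "$container" "$container_port" | parse_host_port
}
```

For example, host_port 8888 prints the host port to use for Hue in your browser.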

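If all services have started in your EC2 terminal, open your web browser and go to:

my-EC2-instance-DNS:32768

Docker chooses these host ports when the container starts, so yours may differ; run sudo docker ps from a second SSH session to see the actual mappings.

You should land on the Hue login form; use cloudera / cloudera as username and password.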

Here you go! You can now start using Hadoop for testing purposes.

🔵 See output

isaac-arnault-AWS-cloudera.png
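The services inside the container can take several minutes to come up, so the login page may not answer right away. Here is a minimal polling sketch with curl, assuming curl is installed on the machine you run it from. wait_for_url is a hypothetical helper. It treats any status from 200 to 399 as up, because the login page may answer with a redirect.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll URL until it answers 2xx/3xx; give up after TRIES attempts.
wait_for_url() {
  local url="$1" tries="${2:-30}" delay="${3:-10}" i=0 code
  while [ "$i" -lt "$tries" ]; do
    # curl reports 000 as the status when it cannot connect at all.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    case "$code" in
      2??|3??) return 0 ;;
    esac
    i=$((i + 1))
    if [ "$i" -lt "$tries" ]; then sleep "$delay"; fi
  done
  return 1
}
```

For example: wait_for_url "http://my-EC2-instance-DNS:32768" && echo "Hue is up"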

Go to my-EC2-instance-DNS:32769 for the cluster overview.

🔵 See output

isaac-arnault-hadoop-cloudera.png

You can install other applications directly from the panel and have your cluster ready for action!


isaac-arnault-cloudera-CDH.png


Author

  • Isaac Arnault - Helping devs install Hadoop in a more effective way, cheaply, effortlessly and timelessly.
MIT License
Copyright (c) 2019 Isaac Arnault
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.