@sbolel
Last active August 6, 2019 02:53
Guide: Set up a DC/OS Mesos cluster on Google Compute Engine

Guide: Deploy a DC/OS Mesos Cluster on Google Compute Engine

DC/OS sets up a cluster and deploys the pre-configured components and services needed for the task at hand. You don't have to fully understand the complexity of the infrastructure or how to set it up; DC/OS creates the necessary abstractions for you. Once complete, you will have a running cluster with an interactive research notebook (a Jupyter Python Notebook container with Apache Spark) and a distributed file system (HDFS), ready to tackle large-scale data processing tasks.

Getting Started

This guide explains the process for setting up a DC/OS Mesos cluster on Google Cloud Platform.

Steps:

  • Create dcos-bootstrap CE instance to manage DC/OS installations
  • Create a Mesos cluster consisting of 3 master nodes and 6 agent nodes
  • Enable HDFS on the cluster
  • Install Spark
  • Install Spark Notebook

Installing DC/OS on Google Compute Engine

Guide: https://dcos.io/docs/1.8/administration/installing/cloud/gce/

See the example configuration at the end of this document.

After you finish setting up the cluster from the bootstrap instance, the generated master and agent nodes appear in the Google Cloud Console. To interact with DC/OS via the UI or CLI, you need to allow HTTP/HTTPS traffic to the leader master node. You also need to assign a static IP address to that node in order to access and use the DC/OS dashboard UI.

Enable HTTP/HTTPS

In the Google Cloud Console for Compute Engine, navigate to the settings for the leader master. Click "Edit" and check the boxes for HTTP and HTTPS under the "Firewall" section, then save your changes.
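The same change can probably be made from the command line. The following is a minimal sketch, assuming that the console checkboxes correspond to the `http-server` and `https-server` network tags and that the matching firewall rules already exist in the network. The instance name `master0` and zone `us-central1-c` come from the example configuration at the end of this guide, and `web_tags_cmd` is a hypothetical helper that only prints the command so it can be reviewed before running.

```shell
# Hypothetical helper: print (not run) the gcloud command that adds the
# HTTP/HTTPS network tags to an instance.
web_tags_cmd() {
  # $1 = instance name, $2 = zone
  printf 'gcloud compute instances add-tags %s --zone %s --tags http-server,https-server\n' \
    "$1" "$2"
}

# Assumed values from the example configuration in this guide.
web_tags_cmd master0 us-central1-c
```

Printing the command first keeps the sketch safe to try; paste the output into a shell on a machine with an authenticated gcloud once the instance name and zone look right.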

Add SSH keys to master instances

  • the bootstrap node's SSH public key
  • your machine's SSH public key

Set Static IP

Next, assign a static IP address to the master leader node by creating a static IP in the networking settings of Google Cloud Platform for the project.
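As a command-line alternative, an existing external IP can be reserved with `gcloud compute addresses create`. The sketch below is illustrative only: the address name `dcos-master-ip` is made up, the IP is the example address used elsewhere in this guide, and both helper functions are hypothetical. It assumes the region is the zone name without its trailing letter suffix.

```shell
# Hypothetical helper: derive the region from a zone name by dropping
# the trailing "-<letter>" suffix (us-central1-c -> us-central1).
region_from_zone() {
  printf '%s\n' "${1%-*}"
}

# Hypothetical helper: print (not run) the command that reserves an
# existing external IP as a static address.
reserve_ip_cmd() {
  # $1 = address name, $2 = external IP, $3 = zone of the instance
  printf 'gcloud compute addresses create %s --addresses %s --region %s\n' \
    "$1" "$2" "$(region_from_zone "$3")"
}

# Assumed example values; replace them with your own.
reserve_ip_cmd dcos-master-ip 150.155.55.250 us-central1-c
```

The command is printed rather than executed so that the name, address, and region can be checked before anything is reserved in the project.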

Installing the DC/OS CLI on your local machine

Guide: https://docs.mesosphere.com/1.8/usage/cli/install/

Bootstrap repo: https://github.com/sbolel/dcos-gce

After setting up the DC/OS cluster on Google, install the DC/OS CLI on your local machine. Then point the CLI at the cluster and log in. The dcos_url corresponds to the public IP address of the master leader node, and the CLI needs it before it can authenticate.

# set `dcos_url` to public external IP of master leader node
dcos config set core.dcos_url http://150.155.55.250

# set `mesos_master_url` to internal IP of master leader node
dcos config set core.mesos_master_url http://10.128.0.3:5050

# login to dcos using Google OAuth
dcos auth login

Adding DC/OS Services

https://docs.mesosphere.com/1.8/usage/service-guides/spark/install/

  • dcos package install hdfs
  • dcos package install spark

Resources

  • DC/OS
  • Mesos
  • Compute Engine

Example DC/OS Configuration

This sets up a cluster with the following nodes:

  • 3 master nodes, each with a 4-core CPU (n1-standard-4) and a 64 GB SSD
  • 6 agent nodes, each with a 2-core CPU (n1-standard-2) and a 64 GB standard disk

~/dcos-gce/hosts
[masters]
master0 ip=10.128.0.3
master1 ip=10.128.0.4
master2 ip=10.128.0.5

[agents]
agent[0000:9999]

[bootstrap]
dcos-bootstrap

~/dcos-gce/group_vars/all
project: <project-name>
subnet: default
login_name: <your-login-name>
bootstrap_public_ip: 10.128.0.2
zone: us-central1-c

master_boot_disk_size: 64
master_machine_type: n1-standard-4
master_boot_disk_type: pd-ssd

agent_boot_disk_size: 64
agent_machine_type: n1-standard-2
agent_boot_disk_type: pd-standard
agent_instance_type: "MIGRATE"
agent_type: private
start_id: 0001
end_id: 0006

gcloudbin: /usr/bin/gcloud
image: '/centos-cloud/centos-7-v20161027'
bootstrap_public_port: 8080
cluster_name: <new_cluster_name>
scopes: "default=https://www.googleapis.com/auth/cloud-platform"
dcos_installer_filename: dcos_generate_config.sh
dcos_installer_download_path: "https://downloads.dcos.io/dcos/stable/{{ dcos_installer_filename }}"
home_directory: "/home/{{ login_name }}"
downloads_from_bootstrap: 2
dcos_bootstrap_container: dcosinstaller

DCOS on Google Compute Engine

Original source: dcos-labs: DCOS on Google Compute Engine

This repository contains scripts to configure a DC/OS cluster on Google Compute Engine.

A bootstrap node is required to run the scripts and to bootstrap the DC/OS cluster.

PLEASE READ THE ENTIRE DOCUMENT. YOU MUST MAKE CHANGES FOR THE SCRIPTS TO WORK IN YOUR GCE ENVIRONMENT.

Bootstrap node configuration

YOU MUST CREATE A PROJECT using the Google Cloud Console. The author created a project called trek-trackr.

You can create the bootstrap node using the Google Cloud Console. The author used an n1-standard-1 instance running CentOS 7 with a 10 GB persistent disk in zone europe-west1-c. The bootstrap node must have "Allow full access to all Cloud APIs" selected in the Identity and API access section. Also enable "Block project-wide SSH keys" in the SSH Keys section. Then create the instance.

After creating the bootstrap instance, run the following from the shell.

Note: if yum reports "Delta RPMs disabled because /usr/bin/applydeltarpm not installed", install deltarpm first: sudo yum install deltarpm


sudo yum update google-cloud-sdk &&
sudo yum update &&
sudo yum install epel-release &&
sudo yum install python-pip &&
sudo pip install -U pip &&
sudo pip install 'apache-libcloud==1.2.1' &&
sudo pip install 'docker-py==1.9.0' &&
sudo yum install git ansible

You need to create an RSA public/private key pair to allow passwordless logins via SSH to the nodes of the DC/OS cluster. Ansible requires this to create the cluster nodes and install DC/OS on them.

Run the following to generate the keys:

ssh-keygen -t rsa -f ~/.ssh/id_rsa -C ajazam

PLEASE REPLACE ajazam with your username. Do not enter a passphrase when prompted.

Make a backup copy of id_rsa.pub.

Open the RSA public key:

sudo vi ~/.ssh/id_rsa.pub

The file contains a single line such as:

ssh-rsa abcdefghijklmaasnsknsdjfsdfjs;dfj;sdflkjsd ajazam

Prefix the line with your username followed by a colon, and replace ajazam at the end with your username:

ajazam:ssh-rsa abcdefghijklmaasnsknsdjfsdfjs;dfj;sdflkjsd ajazam

Save id_rsa.pub, with every occurrence of ajazam replaced by your username.
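The manual edit can also be scripted. This is a minimal sketch, not part of the original scripts: `gce_key_line` is a hypothetical helper that adds the `<login>:` prefix to a public key line and leaves an already prefixed line unchanged. The key body below is a placeholder.

```shell
# Hypothetical helper: prefix "<login>:" to a public key line, leaving
# an already-prefixed line unchanged.
gce_key_line() {
  # $1 = login name, $2 = contents of the public key file (one line)
  case $2 in
    "$1":*) printf '%s\n' "$2" ;;
    *)      printf '%s:%s\n' "$1" "$2" ;;
  esac
}

# Example with a placeholder key body.
gce_key_line ajazam 'ssh-rsa AAAAexample ajazam'
```

To apply it to the real key, one option is to write the result of `gce_key_line "$USER" "$(cat ~/.ssh/id_rsa.pub)"` to a new file, check it, and then move it over `~/.ssh/id_rsa.pub`.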

Add the rsa public key to your project

chmod 400 ~/.ssh/id_rsa
gcloud compute project-info add-metadata --metadata-from-file sshKeys=~/.ssh/id_rsa.pub

Disable SELinux so that Docker can work.

Make the following change to /etc/selinux/config:

SELINUX=disabled
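If you prefer not to edit the file by hand, the change can be made with sed. The sketch below is an assumption-laden example rather than part of the original scripts: `set_selinux_disabled` is a hypothetical helper, and it is demonstrated on a scratch file so that nothing under /etc is touched.

```shell
# Hypothetical helper: set SELINUX=disabled in the given config file.
# The pattern requires "=" right after SELINUX, so SELINUXTYPE= lines
# are left alone.
set_selinux_disabled() {
  # $1 = path to an SELinux config file
  tmp=$(mktemp) || return 1
  sed 's/^SELINUX=.*/SELINUX=disabled/' "$1" > "$tmp" &&
    cat "$tmp" > "$1"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Demonstrate on a scratch copy rather than the real file.
demo=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$demo"
set_selinux_disabled "$demo"
cat "$demo"
rm -f "$demo"
```

On the bootstrap node the same sed expression would be run against /etc/selinux/config with root privileges.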

Reboot the host.

To install Docker, add the yum repo:

sudo tee /etc/yum.repos.d/docker.repo <<-'EOF'
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7/
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF

Install the Docker package:

sudo yum install docker-engine-1.11.2

Set the following ExecStart line in /usr/lib/systemd/system/docker.service:

ExecStart=/usr/bin/docker daemon --storage-driver=overlay

Reload systemd:

sudo systemctl daemon-reload

Start Docker:

sudo systemctl start docker.service

Verify that Docker works:

sudo docker run hello-world

Download the dcos-gce scripts:

git clone https://github.com/dcos-labs/dcos-gce

Change into the repository directory:

cd dcos-gce

Please make the appropriate changes to dcos-gce/group_vars/all. You need to review project, subnet, login_name, bootstrap_public_ip and zone.

Insert the following into ~/.ansible.cfg to disable host key checking:
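A quick check can catch values that were left unset before the playbooks run. The following is a rough sketch under stated assumptions: `check_group_vars` is a hypothetical helper, it treats the file as simple `key: value` lines, and it flags a key when it is missing or still contains a `<...>` placeholder. It is shown here on a scratch file.

```shell
# Hypothetical helper: report required keys that are missing or still
# hold a "<placeholder>" value. Returns non-zero if any key needs review.
check_group_vars() {
  # $1 = path to group_vars/all
  rc=0
  for key in project subnet login_name bootstrap_public_ip zone; do
    value=$(sed -n "s/^$key:[[:space:]]*//p" "$1")
    case $value in
      ''|*'<'*) echo "review $key (current value: '$value')"; rc=1 ;;
    esac
  done
  return "$rc"
}

# Demonstrate on a scratch file with one placeholder left in.
demo=$(mktemp)
printf 'project: <project-name>\nsubnet: default\nlogin_name: alice\nbootstrap_public_ip: 10.128.0.2\nzone: us-central1-c\n' > "$demo"
check_group_vars "$demo" || echo "group_vars needs attention"
rm -f "$demo"
```

From inside the dcos-gce directory the call would be `check_group_vars group_vars/all`.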

[defaults]
host_key_checking = False

[paramiko_connection]
record_host_keys = False

[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null

Ensure the IP address for master0 in ./hosts is the next consecutive IP from bootstrap_public_ip.
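To double-check the expected master0 address, the next consecutive IP can be computed in the shell. This is an illustrative sketch: `next_ip` is a hypothetical helper that handles plain dotted-quad IPv4 addresses only and assumes a shell with 64-bit arithmetic.

```shell
# Hypothetical helper: print the IPv4 address that follows the given one.
next_ip() {
  # Split the dotted quad into four positional parameters.
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  # Convert to an integer, add one, and convert back.
  n=$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 + 1 ))
  printf '%d.%d.%d.%d\n' \
    $(( (n >> 24) & 255 )) $(( (n >> 16) & 255 )) \
    $(( (n >> 8) & 255 )) $(( n & 255 ))
}

# bootstrap_public_ip from the example configuration in this guide.
next_ip 10.128.0.2   # prints 10.128.0.3
```

With the example configuration, the result matches the master0 entry in the example hosts file.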

To create and configure the master nodes run

ansible-playbook -i hosts install.yml

To create and configure the private nodes run

ansible-playbook -i hosts add_agents.yml --extra-vars "start_id=0001 end_id=0002 agent_type=private"

start_id=0001 and end_id=0002 specify the range of IDs that are appended to the hostname prefix "agent" to create unique agent names. If start_id is not specified, a default of 0001 is used. If end_id is not specified, a default of 0001 is used.

When specifying start_id or end_id via the CLI, drop the leading zeroes for any agent ID higher than 7, or Ansible will throw a format error.

ansible-playbook -i hosts add_agents.yml --extra-vars "start_id=0006 end_id=10 agent_type=private"
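The format error for IDs above 7 is presumably an octal-parsing issue: a value such as 0008 starts with a zero but is not a valid octal number. That explanation is an assumption, not something the scripts document. The sketch below shows how an ID range maps to agent hostnames while sidestepping the problem by stripping leading zeroes first; `to_decimal` and `agent_names` are hypothetical helpers.

```shell
# Hypothetical helper: strip leading zeroes so the value is treated as
# decimal (an all-zero input becomes 0).
to_decimal() {
  n=$(printf '%s' "$1" | sed 's/^0*//')
  printf '%s\n' "${n:-0}"
}

# Hypothetical helper: print the agent hostnames for an inclusive range.
agent_names() {
  # $1 = start_id, $2 = end_id (with or without leading zeroes)
  i=$(to_decimal "$1")
  last=$(to_decimal "$2")
  while [ "$i" -le "$last" ]; do
    printf 'agent%04d\n' "$i"
    i=$((i + 1))
  done
}

# Same range as the command above: agent0006 through agent0010.
agent_names 0006 10
```

The hostnames are always zero-padded to four digits, regardless of how the IDs were written on the command line.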

The values for agent_type are either private or public. If an agent_type is not specified then it is assumed agent_type is private.

To create public nodes type

ansible-playbook -i hosts add_agents.yml --extra-vars "start_id=0003 end_id=0004 agent_type=public"

Configurable parameters

The file './hosts' is an Ansible inventory file. Text wrapped in [] is a group name, and the entries after the group name are the hosts in that group. The [masters] group contains the node names and IP addresses of the master nodes. In the supplied file the host name is master0, and the IP address 10.132.0.3 is assigned to it. YOU MUST CHANGE the IP address for master0 to suit your network. You can create multiple entries, e.g. master1, master2, etc. Each node must have a unique IP address.

The [agents] group has one entry, which specifies the names of all the agents the DC/OS cluster can have. The value allows agent0000 to agent9999, a total of 10,000 agents. This limit is artificial and can easily be changed.

The [bootstrap] group has the name of the bootstrap node.

The file './group_vars/all' contains miscellaneous parameters that change the behaviour of the installation scripts. The parameters are split into two groups. Group 1 parameters must be changed to reflect your environment. Group 2 parameters can optionally be changed to alter the behaviour of the scripts.

Group 1 parameters YOU MUST CHANGE for your environment

project

Your project id. Default: trek-trackr

subnet

Your network. Default: default

login_name

The login name used for accessing each GCE instance. Default: ajazam

bootstrap_public_ip

The bootstrap node's public IP. Default: 10.132.0.2

zone

You may change this to your preferred zone. Default: europe-west1-c

Group 2 parameters which optionally change the behaviour of the installation scripts

master_boot_disk_size

The size of the master node boot disk. Default: 10 GB

master_machine_type

The GCE instance type used for the master nodes. Default: n1-standard-2

master_boot_disk_type

The master boot disk type. Default: pd-standard

agent_boot_disk_size

The size of the agent boot disk. Default: 10 GB

agent_machine_type

The GCE instance type used for the agent nodes. Default: n1-standard-2

agent_boot_disk_type

The agent boot disk type. Default: pd-standard

agent_instance_type

Allows agents to be preemptible. If the value is "MIGRATE" then they are not preemptible. If the value is '"TERMINATE" --preemptible' then the instance is preemptible. Default: "MIGRATE"

agent_type

Can specify whether an agent is "public" or "private". Default: "private"

start_id

The number appended to the text "agent" to form the hostname of the first agent, e.g. agent0001. Intermediate agents between start_id and end_id will be created if required. Default: 0001

end_id

The number appended to the text "agent" to form the hostname of the last agent, e.g. agent0001. Intermediate agents between start_id and end_id will be created if required. Default: 0001

gcloudbin

The location of the gcloud binary. Default: /usr/local/bin/gcloud

image

The disk image used on the master and agent. Default: /centos-cloud/centos-7-v20161027

bootstrap_public_port

The port on the bootstrap node which is used to fetch the dcos installer from each of the master and agent nodes. Default: 8080

cluster_name

The name of the DC/OS cluster. Default: cluster_name

scopes

Don't change this. It is required by the Google Cloud SDK.

dcos_installer_filename

The filename for the DC/OS installer. Default: dcos_generate_config.sh

dcos_installer_download_path

The location from which the DC/OS installer is available on dcos.io. Default: https://downloads.dcos.io/dcos/stable/{{ dcos_installer_filename }} The value of {{ dcos_installer_filename }} is described above.

home_directory

The home directory for your logins. Default: /home/{{ login_name }} The value of {{ login_name }} is described above.

downloads_from_bootstrap

The number of concurrent downloads of the DC/OS installer to the master and agent nodes. You may need to experiment with this to get the best performance, which depends on the machine type used for the bootstrap node. Default: 2

dcos_bootstrap_container

Holds the name of the dcos bootstrap container running on the bootstrap node. Default: dcosinstaller
