
Install DC/OS on bare metal

Assume 5 physical nodes, or VMs, that will be used with ScaleIO storage

  • All nodes have 2 CPU cores and 64GB of disk storage. The hardware or VM type needs to support CentOS 7.3.
  • The install (bootstrap) node needs 4GB of memory. Other node types might get by with 2GB, though the spec calls for more (32GB on the Master and Boot nodes, 16GB on the others), and 2GB is definitely cutting it close on the Master node.
  • Assume a single NIC on each node, all on a common subnet, though other configurations may work.
  • Each node must have a hostname in DNS, with forward and reverse lookup working; DHCP is OK. A quick check is shown after this list.
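
A minimal sanity check of forward and reverse lookup, run from any node (the hostname below is an example; substitute your own names and IPs):

getent hosts dcos-master.example.local
getent hosts 192.168.1.21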

This process is suitable for training and testing, but not suitable for heavy workloads or enterprise-grade production deployments. It is specifically intended for "on premises" deployment to "bare metal" or a hypervisor. Easier deployment processes are available for running DC/OS in many of the popular public clouds. For production deployments, contacting Mesosphere Inc. for a subscription version, which includes security features and support, is recommended.

ScaleIO is a software-defined storage solution that provides block-based storage (what you want for high-performance stateful containerized apps such as databases) from commodity x86 servers. It can be deployed with DC/OS in a converged infrastructure, where ScaleIO is installed on the same nodes as the DC/OS agents that run containers. However, the process described below assumes a non-converged ScaleIO deployment is already in place. ScaleIO binaries are available for free download here. You will use only the client package (EMC-ScaleIO-sdc-2.0-5014.0.el7.x86_64.rpm) in the process described.

Install Centos 7.3 on all nodes (BOOT, MASTER, 2 AGENTS, 1 PUBLIC AGENT)

  1. Use the default CentOS disk format (xfs)
  2. Enable IPv4
  3. Set the timezone, with NTP enabled (the default); a quick check is shown after this list
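
To confirm the timezone and that time synchronization is active on a node, timedatectl can be used:

timedatectl status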

On Mesos Boot Node (installer)

generate ssh key for root

ssh-keygen -t rsa

copy the public key to the targets (all masters and agents) - substitute your actual IPs

cat ~/.ssh/id_rsa.pub | ssh root@192.168.1.21 "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
cat ~/.ssh/id_rsa.pub | ssh root@192.168.1.22 "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
cat ~/.ssh/id_rsa.pub | ssh root@192.168.1.23 "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
cat ~/.ssh/id_rsa.pub | ssh root@192.168.1.24 "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"

On all nodes (BOOT, MASTER, 2 AGENTS, 1 PUBLIC AGENT)

As an option, a tool that supports multiple concurrent console sessions, such as tmux, can be useful for efficiently performing the steps that are common to multiple nodes.

Login as root

visudo
  1. uncomment # %wheel ALL=(ALL) NOPASSWD: ALL
  2. comment out the other existing activated %wheel line (the result should look like the example below)
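
After the edit, the %wheel section of the sudoers file should look roughly like this (exact comments vary; the passworded line is now commented out and the NOPASSWD line is active):

## Allows people in group wheel to run all commands
# %wheel        ALL=(ALL)       ALL

## Same thing without a password
%wheel  ALL=(ALL)       NOPASSWD: ALL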

Add a non-root user

adduser centos
passwd centos
usermod -aG wheel centos
groupadd docker
usermod -aG docker centos

Optionally, log in as this user (centos) and generate an SSH key pair (ssh-keygen -t rsa) for convenience

copy the public key to the targets (all masters and agents) - substitute your actual IPs and repeat for each node

cat ~/.ssh/id_rsa.pub | ssh centos@192.168.1.21 "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"

Address some Docker related items

vi /etc/default/grub

add ipv6.disable=1 to the GRUB_CMDLINE_LINUX definition, then regenerate the GRUB configuration (see below) so the change takes effect on the next reboot
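
For example, the edited line might look like the following (the other kernel arguments on your system will differ):

GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet ipv6.disable=1"

On a BIOS-based CentOS 7 system the configuration is then regenerated with:

sudo grub2-mkconfig -o /boot/grub2/grub.cfg

(On a UEFI system the output path is /boot/efi/EFI/centos/grub.cfg instead.)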

stop firewall

sudo systemctl stop firewalld && sudo systemctl disable firewalld

enable OverlayFS

sudo tee /etc/modules-load.d/overlay.conf <<-'EOF'
overlay
EOF

set SELinux to permissive (disable enforcement)

sudo sed -i s/SELINUX=enforcing/SELINUX=permissive/g /etc/selinux/config &&
  sudo groupadd nogroup

reload kernel modules

reboot

yum install -y nano ntp tar xz unzip curl ipset open-vm-tools nfs-utils yum-versionlock
chkconfig ntpd on
service ntpd restart
systemctl enable ntpd
yum -y update

define Docker's repo

sudo tee /etc/yum.repos.d/docker.repo <<-'EOF'
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/$releasever/
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF

Configure systemd to run the Docker Daemon with OverlayFS:

sudo mkdir -p /etc/systemd/system/docker.service.d && sudo tee /etc/systemd/system/docker.service.d/override.conf <<- EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --storage-driver=overlay
EOF

install Docker 1.13.1 from Docker's repo (latest version supported with DC/OS 1.9)

sudo yum install -y docker-engine-1.13.1 docker-engine-selinux-1.13.1
yum versionlock docker-engine docker-engine-selinux
sudo systemctl start docker
sudo systemctl enable docker

test docker

sudo docker ps
docker run hello-world
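
To confirm that the override.conf drop-in took effect, the storage driver reported by Docker should be overlay:

sudo docker info | grep -i 'storage driver'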

On Mesos Boot Node (installer)

docker pull nginx:alpine
docker run -d --restart=unless-stopped -p 8081:80 -v /opt/dcos-setup/genconf/serve:/usr/share/nginx/html:ro --name=dcos-bootstrap-nginx nginx:alpine

download DC/OS installer

curl -O https://downloads.dcos.io/dcos/stable/dcos_generate_config.sh
mkdir -p genconf
sudo bash dcos_generate_config.sh --web

Create an ip detect script named /root/genconf/ip-detect:

cat << 'EOF' > /root/genconf/ip-detect
#!/usr/bin/env bash
set -o nounset -o errexit -o pipefail
export PATH=/sbin:/usr/sbin:/bin:/usr/bin:$PATH
MASTER_IP=${MASTER_IP:-8.8.8.8}
INTERFACE_IP=$(ip r g ${MASTER_IP} | \
awk -v master_ip=${MASTER_IP} '
BEGIN { ec = 1 }
{
  if($1 == master_ip) {
    print $7
    ec = 0
  } else if($1 == "local") {
    print $6
    ec = 0
  }
  if (ec == 0) exit;
}
END { exit ec }
')
echo $INTERFACE_IP
EOF
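
As a sanity check, running the script on the boot node should print that node's primary IP address:

bash /root/genconf/ip-detect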

(Option #1) Launch the DC/OS web installer in your browser at: http://<bootstrap-node-public-ip>:9000

  1. Run bash dcos_generate_config.sh --web -v
  2. In browser, open http://<bootstrap-node-public-ip>:9000

The web installer will compose a /root/genconf/config.yaml file, which drives the install process

Example of the auto-generated content of /root/genconf/config.yaml:

---
agent_list:
- 192.168.1.22
- 192.168.1.23
bootstrap_url: file:///opt/dcos_install_tmp
cluster_name: DC/OS
exhibitor_storage_backend: static
master_discovery: static
master_list:
- 192.168.1.21
oauth_enabled: false
process_timeout: 10000
public_agent_list:
- 192.168.1.24
resolvers:
- 192.168.1.1
ssh_port: 22
ssh_user: root
telemetry_enabled: false

Or (Option #2) manually compose a yaml file, like the example above, and invoke the CLI installer (recommended because issues are easier to troubleshoot)

Compose /root/genconf/config.yaml. See example above

Compose /root/genconf/rexray.yaml

example content of /root/genconf/rexray.yaml

The DC/OS installer will install a supported version of REX-Ray and "push" this configuration file to all cluster nodes. Substitute the actual ip of your ScaleIO gateway, your ScaleIO systemID and name, and your ScaleIO username and password.

rexray:
  loglevel: info
  modules:
    default-admin:
      host: tcp://127.0.0.1:61003
    default-docker:
      disabled: false
  storageDrivers:
  - scaleio
scaleio:
  endpoint: https://192.168.1.14/api
  insecure: true
  userName: admin
  password: Scaleio123!
  systemID: 5ecccbed13f5b
  systemName: tenantName
  protectionDomainName: default
  storagePoolName: default

Invoke multi-step DC/OS install:

bash dcos_generate_config.sh --genconf
bash dcos_generate_config.sh --install-prereqs
bash dcos_generate_config.sh -v --preflight
bash dcos_generate_config.sh --deploy
bash dcos_generate_config.sh --postflight

If a failure occurs during any step, you must do an uninstall and start over from the beginning.

Uninstall (run on each cluster node): /opt/mesosphere/bin/pkgpanda uninstall && rm -fr /opt/mesosphere

On Cluster Nodes (agents that run containerized tasks)

Install the ScaleIO Client binary package

yum install -y numactl libaio
yum localinstall -y EMC-ScaleIO-sdc-2.0-5014.0.el7.x86_64.rpm
/opt/emc/scaleio/sdc/bin/drv_cfg --add_mdm --ip 192.168.1.11,192.168.1.12 --file /bin/emc/scaleio/drv_cfg.txt
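
As a check (assuming the drv_cfg tool in your ScaleIO version supports this query), the MDM IPs just configured can be listed back:

/opt/emc/scaleio/sdc/bin/drv_cfg --query_mdms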

On the ScaleIO MDM (the first one, for example), activate the client node - substitute the actual IP of the agent node

scli --add_sdc --sdc_ip 192.168.1.x

Returning to the client (agent) node

Verify that the REX-Ray configuration has been installed: cat /etc/rexray/config.yml:

rexray:
  loglevel: info
  modules:
    default-admin:
      host: tcp://127.0.0.1:61003
    default-docker:
      disabled: false
  storageDrivers:
  - scaleio
scaleio:
  endpoint: https://192.168.1.14/api
  insecure: true
  userName: admin
  password: Scaleio123!
  systemID: 5ecccbed13f5b
  systemName: tenantName
  protectionDomainName: default
  storagePoolName: default

Test operation of REX-Ray with ScaleIO

/opt/emc/scaleio/sdc/bin/drv_cfg --rescan
/opt/mesosphere/bin/rexray version
/opt/mesosphere/bin/rexray env
/opt/mesosphere/bin/rexray volume ls
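
Optionally, end-to-end operation can also be exercised from Docker. The volume name rexray-test and the 16GB size below are arbitrary examples, and this assumes the REX-Ray Docker volume driver is active (the default-docker module in the configuration above):

docker volume create --driver=rexray --opt=size=16 rexray-test
docker volume ls
docker run --rm --volume-driver=rexray -v rexray-test:/data busybox df -h /data
docker volume rm rexray-test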

Open DC/OS UI

Substitute the ip of your DC/OS Master node and open this link in a browser:

http://192.168.1.21/#/dashboard