DenisIzmaylov/NOTES.md

## NOTES.md

      
    Step By Step Guide to Configure a CoreOS Cluster From Scratch

This guide describes how to bootstrap a new production CoreOS cluster as a high-availability service in about 15 minutes, using etcd2, fleet, flannel, confd, an Nginx load balancer and Docker.
Contents


Introduction

Tools Used


Basic Configuration

Connect Your Servers as a Cluster
Create Fleet Units
Configure Firewall Rules
Load Balancers and Service Discovery


Troubleshooting
Update CoreOS
Install Deis v1
Appendix 1 - Info and Tutorials
Appendix 2 - Tools and Services

Introduction

CoreOS is a powerful Linux distribution built to make large, scalable deployments on varied infrastructure simple to manage.
CoreOS is designed for security, consistency, and reliability. Instead of installing packages via yum or apt, CoreOS uses Linux containers to manage your services at a higher level of abstraction. A single service's code and all dependencies are packaged within a container that can be run on one or many CoreOS machines.
The main building blocks of CoreOS are etcd, Docker and systemd.
See: 7 reasons why you should be using CoreOS with Docker.
Tools Used


etcd: key-value store for service registration and discovery
fleet: schedules Docker containers across the CoreOS cluster and handles failover
flannel: gives each Docker container a unique IP address, so you can reach a service on its internal port (e.g. port 80 rather than 32679)
confd: watches etcd for nodes arriving or leaving, regenerates the nginx configuration from a specified template and reloads nginx

Basic Configuration

Connect your servers as a cluster


Find the location of your Cloud Config file. The examples below use:

/var/lib/coreos-install/user_data

Open your config to edit:

sudo vi /var/lib/coreos-install/user_data

Generate a new token for your cluster: https://discovery.etcd.io/new?size=X, where X is the number of servers.
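A minimal sketch of requesting the token from the command line. The helper name `discovery_new_url` is hypothetical; it only builds the URL given above, and the commented `curl` call assumes the machine has network access.

```shell
# discovery_new_url SIZE: print the discovery URL for a cluster of SIZE servers.
# (Hypothetical helper, not part of CoreOS or etcd.)
discovery_new_url() {
  size="$1"
  case "$size" in
    ''|*[!0-9]*) echo "size must be a number" >&2; return 1 ;;
  esac
  printf 'https://discovery.etcd.io/new?size=%s\n' "$size"
}

# Request a token URL for a three-server cluster (needs network access):
# curl -fsS "$(discovery_new_url 3)"
```

The response is the full discovery URL to put into the `discovery:` key of your Cloud Config. Use the same URL on every server of one cluster, and a fresh one for each new cluster.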
Merge the following lines into your Cloud Config:

coreos:
  etcd2:
    # Generate a new token for each unique cluster from https://discovery.etcd.io/new
    # discovery: https://discovery.etcd.io/<token>
    discovery: https://discovery.etcd.io/9c19239271bcd6be78d4e8acfb393551
    
    # Multi-region and multi-cloud deployments need to use $public_ipv4
    advertise-client-urls: http://$private_ipv4:2379,http://$private_ipv4:4001
    initial-advertise-peer-urls: http://$private_ipv4:2380
    
    # Listen on both the official ports and the legacy ports
    # Legacy ports can be omitted if your application doesn't depend on them
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://$private_ipv4:2380
  
  fleet:
    public-ip: $private_ipv4
    metadata: region=europe,public_ip=$public_ipv4
  
  flannel:
    interface: $private_ipv4
  
  units:
    - name: etcd2.service
      command: start
      # See issue: https://github.com/coreos/etcd/issues/3600#issuecomment-165266437
      drop-ins:
        - name: "timeout.conf"
          content: |
            [Service]
            TimeoutStartSec=0
            
    - name: fleet.service
      command: start
      
    # Network configuration should be here, e.g:
    # - name: 00-eno1.network
    #   content: "[Match]\nName=eno1\n\n[Network]\nDHCP=yes\n\n[DHCP]\nUseMTU=9000\n"
    # - name: 00-eno2.network
    #   runtime: true
    #   content: "[Match]\nName=eno2\n\n[Network]\nDHCP=yes\n\n[DHCP]\nUseMTU=9000\n"
    
    - name: flanneld.service
      command: start
      drop-ins:
      - name: 50-network-config.conf
        content: |
          [Service]
          ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{ "Network": "10.1.0.0/16" }'
          
    - name: docker.service
      command: start
      drop-ins:
      - name: 60-docker-wait-for-flannel-config.conf
        content: |
          [Unit]
          After=flanneld.service
          Requires=flanneld.service

          [Service]
          Restart=always
          
    - name: docker-tcp.socket
      command: start
      enable: true
      content: |
        [Unit]
        Description=Docker Socket for the API

        [Socket]
        ListenStream=2375
        Service=docker.service
        BindIPv6Only=both

        [Install]
        WantedBy=sockets.target

Online.net has a specific configuration preset. It requires an additional step: add these lines to your Cloud Config to get the private network working:

units:
  # ...
  - name: 00-eno2.network
    runtime: true
    content: "[Match]\nName=eno2\n\n[Network]\nDHCP=yes\n\n[DHCP]\nUseMTU=9000\n"

Validate your changes:

sudo coreos-cloudinit -validate --from-file /var/lib/coreos-install/user_data

Reboot the system:

sudo reboot

Check status for etcd2:

sudo systemctl status -r etcd2
The output should contain the following line:
 Active: active (running)
This can take a while. Don't panic, just wait a few minutes.


Repeat these steps for each server in your cluster.


Check your cluster health and fleet status:


# should be healthy
sudo etcdctl cluster-health
# should display all servers
sudo fleetctl list-machines
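If you bootstrap several servers, the wait for etcd2 described above can be scripted instead of checked by hand. This is a minimal sketch for a POSIX shell; `retry` is a hypothetical helper name, not a CoreOS tool, and the attempt count and delay are arbitrary.

```shell
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds,
# sleeping DELAY seconds between tries; give up after ATTEMPTS tries.
retry() {
  attempts="$1"; delay="$2"; shift 2
  n=1
  while :; do
    "$@" && return 0
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example, to be run on a CoreOS node: wait up to about five minutes
# for etcd2 to become active, then check the cluster.
# retry 30 10 systemctl is-active --quiet etcd2 && sudo etcdctl cluster-health
```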
Create Fleet Units


See: Launching Containers with fleet
Application Unit


Go to your home directory:

cd ~

Create a new Application Template Unit. For example, run vi test-app@.service and add the following lines:

[Unit]
Description=test-app%i
After=docker.service

[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill test-app%i
ExecStartPre=-/usr/bin/docker rm test-app%i
ExecStartPre=/usr/bin/docker pull willrstern/node-sample
ExecStart=/usr/bin/docker run -e APPNAME=test-app%i --name test-app%i -P willrstern/node-sample
ExecStop=/usr/bin/docker stop test-app%i

Submit Application Template Unit to Fleet:

fleetctl submit test-app@.service

Start new instances from Application Template Unit:

fleetctl start test-app@1
fleetctl start test-app@2
fleetctl start test-app@3

Check that all instances have started and are active. This can take a few minutes. Example command and its output:

$ fleetctl list-units
UNIT			MACHINE				ACTIVE	SUB
test-app@1.service	e1512f34.../10.1.9.17	active	running
test-app@2.service	a78a3229.../10.1.9.18	active	running
test-app@3.service	081c8a1e.../10.1.9.19	active	running
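The same check can be automated by filtering the listing. A minimal sketch, assuming the four-column layout shown above (UNIT, MACHINE, ACTIVE, SUB) with a header on the first line; `units_not_running` is a hypothetical helper name.

```shell
# units_not_running: read `fleetctl list-units` output on stdin, print every
# unit whose last two columns are not "active running", and return
# non-zero if at least one such unit was found.
units_not_running() {
  awk '
    BEGIN { bad = 0 }
    # Skip the header line; ACTIVE and SUB are the last two columns.
    NR > 1 && NF >= 4 && !($(NF - 1) == "active" && $NF == "running") {
      print $1
      bad = 1
    }
    END { exit bad }
  '
}

# Example, to be run where fleetctl is available:
# fleetctl list-units | units_not_running || echo "some units are not running yet"
```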
Configure Firewall Rules

Run custom-firewall.sh from Deis v1 on your local machine:
curl -O https://raw.githubusercontent.com/deis/deis/master/contrib/util/custom-firewall.sh
# run the following line for each server
ssh core@<host1> 'bash -s' < custom-firewall.sh
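With more than a couple of servers, the per-server command above can be wrapped in a loop. A minimal sketch; `apply_firewall` and the `SSH_CMD` override are hypothetical names introduced here, and the host names in the example are placeholders.

```shell
# apply_firewall SCRIPT HOST...: feed SCRIPT to `bash -s` on every HOST
# as the core user, stopping at the first host that fails.
# SSH_CMD is a hypothetical override (defaults to ssh), handy for a dry run.
apply_firewall() {
  script="$1"; shift
  for host in "$@"; do
    ${SSH_CMD:-ssh} "core@$host" 'bash -s' < "$script" || return 1
  done
}

# Example with placeholder host names:
# apply_firewall custom-firewall.sh node1.example.com node2.example.com
```

Setting `SSH_CMD=echo` prints the commands that would be run without connecting anywhere.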
Load Balancers and Service Discovery


Download someapp@.service, someapp-discovery@.service and someapp-lb@.service.
Modify these unit templates to match your application config.
Submit the modified files to fleet:

fleetctl submit someapp@.service
fleetctl submit someapp-discovery@.service
fleetctl submit someapp-lb@.service

Start Unit instances from templates:

fleetctl start someapp@{1..6}
fleetctl start someapp-discovery@{1..6}
fleetctl start someapp-lb@{1..2}

Verify that everything is working:

fleetctl list-units
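The `{1..6}` ranges in the start commands above are Bash brace expansion, so they are expanded by the shell before fleetctl sees them. In a shell without brace expansion the instance names can be generated explicitly. A minimal sketch; `instances` is a hypothetical helper name.

```shell
# instances NAME FIRST LAST: print NAME@FIRST ... NAME@LAST, one per line.
instances() {
  name="$1"; i="$2"; last="$3"
  while [ "$i" -le "$last" ]; do
    printf '%s@%s\n' "$name" "$i"
    i=$((i + 1))
  done
}

# Equivalent to `fleetctl start someapp@{1..6}`; the unquoted substitution
# is intentional so that each name becomes a separate argument.
# fleetctl start $(instances someapp 1 6)
```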
Troubleshooting


Something goes wrong and a service doesn't work
Use these commands to debug:
# the same commands also work for fleet, etcd and flanneld
sudo systemctl start etcd2
sudo systemctl status etcd2
sudo journalctl -xe
sudo journalctl -xep3
sudo journalctl -ru etcd2


fleetctl list-units shows a failed state for some units
For local units:
sudo fleetctl journal someapp@1
For remote units:
fleetctl journal someapp@1


fleetctl responds with: Error running remote command: SSH_AUTH_SOCK environment variable is not set. Verify ssh-agent is running. See https://github.com/coreos/fleet/blob/master/Documentation/using-the-client.md for help.

Check that you connected with ssh -A.
Check that you are not using sudo for remote machines. A process running under sudo can't access your SSH_AUTH_SOCK.


Error response from daemon: Conflict. The name "someapp1" is already in use by container c4acbb70c654. You have to delete (or rename) that container to be able to reuse that name.
fleetctl stop someapp@1
docker rm someapp1
fleetctl start someapp@1


The fleetctl ssh command doesn't work

Ensure your public key has been added to user_data on each server.
Connect to your server with an SSH agent:

eval `ssh-agent -s`
ssh-add ~/.ssh/id_rsa
ssh -A <your-host>


Update CoreOS

sudo update_engine_client -update
sudo reboot

See more details here.
Install Deis v1

Attention! Deis v1 does not seem to work correctly on Online.net and other bare-metal setups, because Ceph, which v1 relies on, is unstable and unpredictable there. If you would still like to experiment, proceed as follows:

Create a backup copy of your original config:

sudo cp /var/lib/coreos-install/user_data /var/lib/coreos-install/user_data.without-deis1


Merge your Cloud Config with Deis Cloud Config example.


You can configure the Deis platform from your workstation by following this instruction. The next steps are adapted for a server environment.


Download deisctl:


curl -sSL http://deis.io/deisctl/install.sh | sudo sh -s 1.12.3

Set your configuration:

deisctl config platform set domain=<your-domain>

Run platform installation:

deisctl install platform

Boot up Deis:

deisctl start platform
If you run into problems, check the Docker containers:
docker ps -a
You can also use the journal and status commands of deisctl to debug.

Once you see “Deis started.”, your Deis platform is running on the cluster. Verify that all Deis units are loaded by running:

deisctl list
All Deis units should be active. Otherwise you can destroy them all; don't forget to remove unused Docker volumes afterwards.
Appendix 1 - Info and Tutorials


Building Microservices with CoreOS & etcd [video talk]
How To Set Up a CoreOS Cluster on DigitalOcean
How To Secure Your CoreOS Cluster with TLS/SSL and Firewall Rules on DigitalOcean [article]
Automated Nginx Reverse Proxy for Docker [article]
High Availability Apps via Fleet & CoreOS – Start to Finish: Provisioning on Azure [article]
Tips for Deploying NGINX (Official Image) with Docker [article]
CoreOS Continued: Fleet and Docker [article]
Nginx Load Balancer Service For Core OS
ServerFault: Nginx proxy to many container running on different CoreOS nodes
Gist: Running a High Availability Service on CoreOS using Docker, Fleet, Flannel, Etcd, Confd & Nginx
Docker Fleet Starter [github repo]
Do Not Use Public Discovery Service For Runtime Reconfiguration [tips]

Appendix 2 - Tools and Services


docker-nginx-https-redirect
Monitor CoreOS at scale with DataDog
CoreGI - WebUI for monitoring CoreOS clusters including fleet and etcd


## someapp-discovery@.service
[Unit]
Description=Announce Someapp%i

# Requirements
BindsTo=someapp@%i.service
Requires=etcd2.service
Requires=docker.service

# Dependency ordering
After=someapp@%i.service
After=etcd2.service
After=docker.service

[Service]
ExecStart=/bin/sh -c "while true; do etcdctl set /services/someapp/upstream/someapp%i \"$(sleep 5 && docker inspect -f '{{.NetworkSettings.IPAddress}}' someapp%i):3000\" --ttl 60;sleep 45;done"
ExecStop=/usr/bin/etcdctl rm /services/someapp/upstream/someapp%i

[X-Fleet]
MachineOf=someapp@%i.service

## someapp-lb@.service
[Unit]
Description=someapp-lb%i

# Requirements
Requires=docker.service
Requires=etcd2.service

# Dependency ordering
After=docker.service
After=etcd2.service

[Service]
# Let the process take awhile to start up (for first run Docker containers)
TimeoutStartSec=0

# Change killmode from "control-group" to "none" to let Docker remove
# work correctly.
KillMode=none

# Get CoreOS environmental variables
EnvironmentFile=/etc/environment

# Directives with "=-" are allowed to fail without consequence
ExecStartPre=-/usr/bin/docker kill someapp-lb%i
ExecStartPre=-/usr/bin/docker rm someapp-lb%i
ExecStartPre=/usr/bin/docker pull denisizmaylov/nginx-lb
ExecStart=/usr/bin/sh -c "/usr/bin/docker run --name someapp-lb%i --rm -p 80:80 -e SERVICE_NAME=someapp -e ETCD=\"$(ifconfig docker0 | awk '/\\<inet\\>/ { print $2 }'):2379\" denisizmaylov/nginx-lb"
ExecStop=/usr/bin/docker stop someapp-lb%i

[X-Fleet]
Conflicts=someapp-lb@*
MachineMetadata=loadbalancer=true

## someapp@.service
[Unit]
Description=someapp%i
Requires=docker.service
After=docker.service

[Service]
# Let the process take awhile to start up (for first run Docker containers)
TimeoutStartSec=0

# Directives with "=-" are allowed to fail without consequence
ExecStartPre=-/usr/bin/docker kill someapp%i
ExecStartPre=-/usr/bin/docker rm someapp%i
ExecStartPre=/usr/bin/docker pull denisizmaylov/node-sample
ExecStart=/usr/bin/docker run -e APPNAME=someapp%i --name someapp%i -P denisizmaylov/node-sample
ExecStop=/usr/bin/docker stop someapp%i