Set up a basic nodepool host using a single VM and localhost
# prep host
apt update
apt purge nano
apt install git vim tmux fail2ban
# prep ssh key
key_path="${HOME}/.ssh"
key_file="${key_path}/id_rsa"
mkdir -p "${key_path}"
chmod 700 "${key_path}"
ssh-keygen -t rsa -f "${key_file}" -N ''
key_content=$(cat "${key_file}.pub")
echo "${key_content}" | tee -a "${key_path}/authorized_keys"
# prep ansible 2.3.x
apt install python-minimal python-pip python-virtualenv python-apt gcc libffi-dev libssl-dev
virtualenv ansible-2.3
source ansible-2.3/bin/activate
pip install 'ansible>2.3.0,<2.4.0' shade
# prep the ansible config
WORKSPACE=${WORKSPACE:-$(pwd)}
mkdir -p ${WORKSPACE}/roles
mkdir -p ${WORKSPACE}/inventory
echo '[defaults]' > ${WORKSPACE}/ansible.cfg
echo "roles_path = ${WORKSPACE}/roles" >> ${WORKSPACE}/ansible.cfg
echo "inventory = ${WORKSPACE}/inventory" >> ${WORKSPACE}/ansible.cfg
curl https://raw.githubusercontent.com/ansible/ansible/v2.3.2.0-1/contrib/inventory/openstack.py > ${WORKSPACE}/inventory/openstack.py
chmod 755 ${WORKSPACE}/inventory/openstack.py
# make sure an appropriately configured clouds.yaml file is in place
# at ~/.config/openstack/clouds.yaml
# example: https://raw.githubusercontent.com/ansible/ansible/v2.3.2.0-1/contrib/inventory/openstack.yml
# note the extra ansible options at the bottom of that example
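# if you do not already have a clouds.yaml, a minimal sketch along the lines
# of the example above looks like this - the auth values are placeholders
# and must be replaced with your own before running this; only the 'rax'
# cloud name and the regions are taken from the playbook further down
mkdir -p ~/.config/openstack
cat > ~/.config/openstack/clouds.yaml <<'EOF'
clouds:
  rax:
    regions:
      - IAD
      - DFW
      - ORD
    auth:
      auth_url: 'https://example.identity.api:5000/v3'
      username: 'example-user'
      password: 'example-password'
      project_name: 'example-project'
ansible:
  use_hostnames: True
  expand_hostvars: True
  fail_on_errors: True
EOF
# quick check that the dynamic inventory script can talk to the cloud
${WORKSPACE}/inventory/openstack.py --list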
# prep roles
# While there is a zookeeper role from the openstack git org,
# it does not support clustering, whereas this role does.
git clone https://github.com/AnsibleShipyard/ansible-zookeeper ${WORKSPACE}/roles/ansible-zookeeper
pushd ${WORKSPACE}/roles/ansible-zookeeper
# HEAD of 'master' as of 15 Nov 2017
git checkout 769c5e974a50291e8556b789e57f809d87063262
popd
git clone https://github.com/openstack/ansible-role-nodepool ${WORKSPACE}/roles/ansible-role-nodepool
pushd ${WORKSPACE}/roles/ansible-role-nodepool
# HEAD of 'master' as of 7 Nov 2017
git checkout 6757bae61fd541595ff4e1eb63aef557ea586707
popd
git clone https://github.com/openstack/ansible-role-diskimage-builder ${WORKSPACE}/roles/ansible-role-diskimage-builder
pushd ${WORKSPACE}/roles/ansible-role-diskimage-builder
# HEAD of 'master' as of 7 Nov 2017
git checkout 63c794fddb026b405d7f0730011ae9887b48c698
popd
# execute playbooks
ansible-playbook setup-nodepool.yml
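# once the playbook completes, a quick sanity check that the new instances
# landed in the 'nodepool-server' group (the group comes from the instance
# metadata set in the playbook)
ansible nodepool-server -u root -m ping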
#!/usr/bin/env python
# This is a hack.
# Instead of doing this, we should actually make use of the
# request/release process outlined in the spec:
# http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html#id1
# http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/model.py?h=feature/zuulv3#n526
# http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/zk.py?h=feature/zuulv3#n348
import argparse
import json
import logging

from kazoo.client import KazooClient

if __name__ == "__main__":
    logging.basicConfig()

    parser = argparse.ArgumentParser(
        description="Changes the state of a nodepool node"
    )
    parser.add_argument(
        "--node_id", help="Node ID",
        required=True
    )
    parser.add_argument(
        "--state", help="State to set (eg: testing)",
        required=True
    )
    args = parser.parse_args()

    # The hosts should be configurable - probably best through
    # a configuration file.
    nodepool = KazooClient(hosts='127.0.0.1:2181')
    nodepool.start()

    node_path = "/nodepool/nodes/" + args.node_id

    if nodepool.exists(node_path):
        # get the data
        data, stat = nodepool.get(node_path)
        node_data = json.loads(data.decode("utf-8"))
        # modify the state
        node_data['state'] = args.state
        # upload the modified data
        nodepool.set(node_path, json.dumps(node_data).encode("utf-8"))
    else:
        print("Node ID %s does not exist." % args.node_id)
# Check the following location for updated settings:
# https://docs.openstack.org/infra/nodepool/feature/zuulv3/configuration.html
# And here for older settings which may be useful to learn from:
# http://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool/nodepool.yaml
# The project-config settings are not usable as-is for nodepool from zuul v3.
images-dir: /opt/nodepool/images

zookeeper-servers:
  - host: localhost
    port: 2181

labels:
  - name: ubuntu-xenial
    min-ready: 1
    # TODO(odyssey4me):
    # I do not think we'll want to use this for production.
    # This is purely here for test purposes.
    max-ready-age: 300

providers:
  - name: rax-iad
    region-name: 'IAD'
    cloud: rax
    api-timeout: 60
    boot-timeout: 120
    hostname-format: nodepool-{label.name}-{provider.name}-{node.id}
    image-name-format: nodepool-{image_name}-{timestamp}
    diskimages:
      - name: ubuntu-xenial
        config-drive: true
    pools:
      - name: main
        max-servers: 2
        availability-zones: []
        labels:
          - name: ubuntu-xenial
            diskimage: ubuntu-xenial
            flavor-name: general1-8
            key-name: jesse_pretorius
            console-log: true
  - name: rax-dfw
    region-name: 'DFW'
    cloud: rax
    api-timeout: 60
    boot-timeout: 120
    hostname-format: nodepool-{label.name}-{provider.name}-{node.id}
    image-name-format: nodepool-{image_name}-{timestamp}
    diskimages:
      - name: ubuntu-xenial
        config-drive: true
    pools:
      - name: main
        max-servers: 2
        availability-zones: []
        labels:
          - name: ubuntu-xenial
            diskimage: ubuntu-xenial
            flavor-name: general1-8
            key-name: jesse_pretorius
            console-log: true
  - name: rax-ord
    region-name: 'ORD'
    cloud: rax
    api-timeout: 60
    boot-timeout: 120
    hostname-format: nodepool-{label.name}-{provider.name}-{node.id}
    image-name-format: nodepool-{image_name}-{timestamp}
    diskimages:
      - name: ubuntu-xenial
        config-drive: true
    pools:
      - name: main
        max-servers: 2
        availability-zones: []
        labels:
          - name: ubuntu-xenial
            diskimage: ubuntu-xenial
            flavor-name: general1-8
            key-name: jesse_pretorius
            console-log: true

diskimages:
  - name: ubuntu-xenial
    # We'll want to use something like 86400 (24 hrs)
    # for production. The idea here is to keep the
    # image fresh. This ensures that any merged config
    # changes to the diskimage-build scripts take
    # effect within a reasonable time.
    # TODO(odyssey4me):
    # Adjust this for production. This is 1hr.
    rebuild-age: 3600
    formats:
      - vhd
    elements:
      - ubuntu-minimal
      - growroot
      - openssh-server
      - simple-init
      - vm
    release: xenial
    env-vars:
      TMPDIR: /opt/nodepool/dib_tmp
      DIB_CHECKSUM: '1'
      DIB_IMAGE_CACHE: /opt/nodepool/dib_cache
      DIB_GRUB_TIMEOUT: '0'
      DIB_DISTRIBUTION_MIRROR: 'http://mirror.rackspace.com/ubuntu'
      DIB_DEBIAN_COMPONENTS: 'main,universe'
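# Once nodepool-builder and nodepool-launcher are running against this
# config, progress can be checked on the nodepool host with the standard
# nodepool CLI, e.g.:
#   nodepool dib-image-list
#   nodepool image-list
#   nodepool list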
#!/usr/bin/env python
# This is a hack.
# Instead of doing this, we should actually make use of the
# request/release process outlined in the spec:
# http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html#id1
# http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/model.py?h=feature/zuulv3#n526
# http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/zk.py?h=feature/zuulv3#n348
import argparse
import daemon
import errno
import extras
import json
import logging
import logging.config
import os
import signal
import sys
import time
import yaml

from kazoo.client import KazooClient

# as of python-daemon 1.6 it doesn't bundle pidlockfile anymore
# instead it depends on lockfile-0.9.1 which uses pidfile.
pid_file_module = extras.try_imports(['daemon.pidlockfile', 'daemon.pidfile'])

log = logging.getLogger('nodepool-watcher')


def is_pidfile_stale(pidfile):
    """Determine whether a PID file is stale.

    Return 'True' ("stale") if the contents of the PID file are
    valid but do not match the PID of a currently-running process;
    otherwise return 'False'.
    """
    result = False
    pidfile_pid = pidfile.read_pid()
    if pidfile_pid is not None:
        try:
            os.kill(pidfile_pid, 0)
        except OSError as exc:
            if exc.errno == errno.ESRCH:
                # The specified PID does not exist
                result = True
    return result


class WatcherDaemon(object):

    app_name = 'nodepool-watcher'
    app_description = 'Watch for new nodepool nodes.'

    def __init__(self):
        self.args = None

    def parse_arguments(self):
        parser = argparse.ArgumentParser(description=self.app_description)
        parser.add_argument('-logfile',
                            dest='logfile',
                            help='path to log file',
                            default='/var/log/%s.log' % self.app_name)
        parser.add_argument('-loglevel',
                            dest='loglevel',
                            help='level of logging',
                            default='WARNING')
        parser.add_argument('-pidfile',
                            dest='pidfile',
                            help='path to pid file',
                            default='/var/run/%s.pid' % self.app_name)
        parser.add_argument('-nodaemon',
                            dest='nodaemon',
                            action='store_true',
                            help='do not run as a daemon')
        self.args = parser.parse_args()

    def setup_logging(self):
        logging.basicConfig(filename=self.args.logfile,
                            level=self.args.loglevel,
                            format='%(asctime)s %(levelname)s %(name)s: '
                                   '%(message)s')

    def exit_handler(self, signum, frame):
        log.warning("Stopping service.")
        self.zk.stop()
        os._exit(0)

    def watch_nodes(self, children):
        self._children = sorted(children)
        log.info("Children are now: %s" % self._children)
        for child in children:
            child_path = "/nodepool/nodes/" + child
            self.zk.DataWatch(child_path, self.watch_node_data)

    def watch_node_data(self, data, stat, event=None):
        if event is None or event.type != "DELETED":
            node_data = json.loads(data.decode("utf-8"))
            if event is not None:
                log.info("Node %s is now %s." % (event.path,
                                                 node_data['state']))
            log.debug(json.dumps(node_data, indent=2, sort_keys=True))
            if node_data['state'] == "ready":
                self.register_node(node_data['hostname'],
                                   node_data['region'],
                                   node_data['interface_ip'])
        else:
            log.info("Node %s has been %s." % (event.path, event.type))

    def register_node(self, hostname, region, interface_ip):
        # This needs some sort of tracking to prevent race conditions
        # because once the node is in a ready state the node data
        # changes one last time, causing a repeat registration event.
        # It might be best to actually use our own table (or the nodepool
        # node comment field) in zookeeper to track nodes which have been
        # registered already. This may also be useful to allow retries in
        # case a registration fails.
        log.info("Registering %s in %s on %s" % (hostname,
                                                 region,
                                                 interface_ip))
        # Here needs to be some sort of registration process for jenkins.
        # It might make sense to have a number of workers doing this so
        # that many nodes can be fired up at once and they can all be
        # registered against jenkins at the same time.

    def main(self):
        self.setup_logging()
        signal.signal(signal.SIGINT, self.exit_handler)
        signal.signal(signal.SIGTERM, self.exit_handler)
        log.warning("Starting service.")
        # The hosts should be configurable - probably best through
        # a configuration file.
        self.zk = KazooClient(hosts='127.0.0.1:2181', read_only=True)
        self.zk.start()
        self.zk.ChildrenWatch("/nodepool/nodes", func=self.watch_nodes)
        while True:
            signal.pause()


def main():
    app = WatcherDaemon()
    app.parse_arguments()

    pid = pid_file_module.TimeoutPIDLockFile(app.args.pidfile, 10)
    if is_pidfile_stale(pid):
        pid.break_lock()

    if app.args.nodaemon:
        app.main()
    else:
        with daemon.DaemonContext(pidfile=pid):
            app.main()


if __name__ == "__main__":
    sys.exit(main())
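# Example usage (the script file name here is arbitrary) - run in the
# foreground with more verbose logging while testing:
#   python nodepool-watcher.py -nodaemon -loglevel INFO
# or let it daemonise using the default log and pid file locations:
#   python nodepool-watcher.py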
---
- hosts: localhost
  connection: local
  gather_facts: False
  vars:
    keyname: jesse_pretorius
    cloud_name: rax
    regions:
      - DFW
      - IAD
      - ORD
    flavor: "general1-8"
    image: "Ubuntu 16.04 LTS (Xenial Xerus) (PVHVM)"
    instance_name: "nodepool-server"
    instance_metadata:
      build_config: core
      group: "nodepool-server"
    user_data_path: "user_data_pubcloud.sh"
  tasks:
    - name: Provision cloud instances
      os_server:
        # TODO(odyssey4me):
        # switch to using lower case for regions
        # so that the server name matches the DNS
        # name when the server comes up
        name: "{{ instance_name }}-{{ item }}"
        flavor: "{{ flavor }}"
        state: present
        cloud: "{{ cloud_name }}"
        region_name: "{{ item }}"
        image: "{{ image }}"
        key_name: "{{ keyname }}"
        userdata: "{{ lookup('file', user_data_path) }}"
        config_drive: yes
        meta: "{{ instance_metadata }}"
        wait: yes
        timeout: 900
      with_items: "{{ regions }}"
      register: _create_instance
      async: 900
      poll: 0
    - name: Create data volumes
      os_volume:
        display_name: "{{ instance_name }}-{{ item }}-images"
        volume_type: "SATA"
        size: 512
        state: present
        cloud: "{{ cloud_name }}"
        region_name: "{{ item }}"
        wait: yes
        timeout: 900
      with_items: "{{ regions }}"
      register: _create_volume
      async: 900
      poll: 0
    - name: Wait for cloud instances to be created
      async_status:
        jid: "{{ item['ansible_job_id'] }}"
      register: _create_instance_jobs
      until: _create_instance_jobs['finished'] | bool
      delay: 5
      retries: 180
      with_items: "{{ _create_instance['results'] }}"
      when:
        - item['ansible_job_id'] is defined
    - name: Wait for data volumes to be created
      async_status:
        jid: "{{ item['ansible_job_id'] }}"
      register: _create_volume_jobs
      until: _create_volume_jobs['finished'] | bool
      delay: 5
      retries: 180
      with_items: "{{ _create_volume['results'] }}"
      when:
        - item['ansible_job_id'] is defined
    - name: Attach the data volumes
      os_server_volume:
        server: "{{ instance_name }}-{{ item }}"
        volume: "{{ instance_name }}-{{ item }}-images"
        state: present
        cloud: "{{ cloud_name }}"
        region_name: "{{ item }}"
        wait: yes
        timeout: 900
      with_items: "{{ regions }}"
      register: _attach_volume
    - name: Refresh dynamic inventory for any changes made
      meta: refresh_inventory

- hosts: "nodepool-server"
  gather_facts: no
  user: root
  vars:
    nodepool_git_dest: "{{ ansible_user_dir }}/src/git.openstack.org/openstack-infra/nodepool"
    nodepool_git_version: d20a13da9dba90e357cd91a9aa58fd8c6b5f2e2d # HEAD of 'feature/zuulv3' as of 7 Nov 2017
    diskimage_builder_git_dest: "{{ ansible_user_dir }}/src/git.openstack.org/openstack/diskimage-builder"
    diskimage_builder_git_version: bc6c928bb960729e8df60562adafe2d50a1f55ec # HEAD of 'master' as of 7 Nov 2017
    volume_device: "xvdb"
    zookeeper_debian_apt_install: yes
    zookeeper_hosts: "{{ groups['nodepool-server'] }}"
  tasks:
    - name: Wait for a successful connection
      wait_for_connection:
        connect_timeout: 2
        delay: 5
        sleep: 5
        timeout: 60
    - name: Gather facts
      setup:
        gather_subset: "!facter,!ohai"
    - name: Install prerequisite packages
      apt:
        name:
          - gcc
          - git
          - fail2ban
          - libffi-dev
          - libssl-dev
          - lvm2
          - openjdk-8-jre-headless
          - python-apt
          - python-minimal
          - python-pip
          - python-virtualenv
          - ufw
        update_cache: yes
    - name: Prepare the volume partition
      parted:
        device: "/dev/{{ volume_device }}"
        number: 1
        state: present
        flags: [ lvm ]
        label: gpt
      when: ansible_devices[volume_device]['partitions'] == {}
    - name: Prepare LVM volume group
      lvg:
        vg: images-vg
        pvs: "/dev/{{ volume_device }}1"
        state: present
    - name: Prepare LVM logical volume
      lvol:
        vg: images-vg
        lv: images-lv
        state: present
        size: 100%PVS
    - name: Prepare filesystem on LVM logical volume
      filesystem:
        fstype: ext4
        dev: "/dev/images-vg/images-lv"
    - name: Mount filesystem
      mount:
        path: "/opt"
        src: "/dev/images-vg/images-lv"
        fstype: ext4
        state: mounted
    # install extra packages for diskimage-builder
    # The native xenial package which provides vhd-utils (blktap-utils) does not support
    # the 'convert' command which is required in order to properly prepare VHD images for
    # the Xen hosts used by Rackspace Public Cloud. We therefore make use of the same PPA
    # used by openstack-infra which has the modified version available.
    # https://launchpad.net/~openstack-ci-core/+archive/ubuntu/vhd-util
    # built from: https://github.com/emonty/vhd-util
    # deployed by: https://github.com/openstack-infra/puppet-diskimage_builder/blob/339340409823927bb987f0195c6cedfdace05f4a/manifests/init.pp#L26
    - name: Add vhd-util PPA
      apt_repository:
        filename: "vhd-util"
        repo: "ppa:openstack-ci-core/vhd-util"
        update_cache: yes
    - name: Install vhd-util
      apt:
        name: "vhd-util"
    # the default /etc/hosts file results in the name of the instance
    # resolving to its own private address first, causing zookeeper
    # to listen on the wrong address, and thus clustering to fail
    - name: Prepare /etc/hosts for the zookeeper group
      copy:
        content: |
          127.0.0.1 localhost
          # The following lines are desirable for IPv6 capable hosts
          ::1 ip6-localhost ip6-loopback
          fe00::0 ip6-localnet
          ff00::0 ip6-mcastprefix
          ff02::1 ip6-allnodes
          ff02::2 ip6-allrouters
          ff02::3 ip6-allhosts
          # zookeeper hosts
          {% for host in zookeeper_hosts %}
          {{ hostvars[host].ansible_default_ipv4.address }} {{ host }} {{ host | lower }}
          {{ hostvars[host].ansible_default_ipv6.address }} {{ host }} {{ host | lower }}
          {% endfor %}
        dest: "/etc/hosts"
    - name: Configure firewall to allow ssh
      ufw:
        rule: allow
        name: OpenSSH
    - name: Configure firewall to allow cluster traffic (ipv4)
      ufw:
        rule: allow
        from_ip: "{{ hostvars[item].ansible_default_ipv4.address }}"
      with_items: "{{ zookeeper_hosts }}"
    - name: Configure firewall to allow cluster traffic (ipv6)
      ufw:
        rule: allow
        from_ip: "{{ hostvars[item].ansible_default_ipv6.address }}"
      with_items: "{{ zookeeper_hosts }}"
    - name: Enable firewall
      ufw:
        state: enabled
        policy: deny
        direction: incoming
    - include_role:
        name: "ansible-zookeeper"
    - include_role:
        name: "ansible-role-diskimage-builder"
    - include_role:
        name: "ansible-role-nodepool"
    - name: Create openstack config directory
      file:
        path: "/var/lib/nodepool/.config/openstack"
        owner: "nodepool"
        group: "nodepool"
        mode: "0700"
        state: directory
    - name: Copy clouds.yaml
      copy:
        src: "~/.config/openstack/clouds.yaml"
        dest: "/var/lib/nodepool/.config/openstack/clouds.yaml"
        owner: "nodepool"
        group: "nodepool"
        mode: "0600"
    - name: Create ssh config directory
      file:
        path: "/home/nodepool/.ssh"
        owner: "nodepool"
        group: "nodepool"
        mode: "0700"
        state: directory
    - name: Copy private key
      copy:
        src: "~/.ssh/id_rsa"
        dest: "/home/nodepool/.ssh/id_rsa"
        owner: "nodepool"
        group: "nodepool"
        mode: "0600"
    - name: Allow passwordless sudo for nodepool
      lineinfile:
        dest: /etc/sudoers.d/nodepool
        create: yes
        state: present
        regexp: '^%nodepool'
        line: '%nodepool ALL=NOPASSWD: ALL'
        validate: 'visudo -cf %s'
    - name: Create diskimage-builder tmp directory
      file:
        path: "/opt/nodepool/dib_tmp"
        owner: "nodepool"
        group: "nodepool"
        state: directory
    - name: Copy configuration for nodepool
      copy:
        src: "nodepool-config.yml"
        dest: "/etc/nodepool/nodepool.yaml"
@mattt416

Any downside to not creating partitions on the volumes before adding them to LVM? We don't appear to do this with our current repo servers.

@odyssey4me (Author)

@mattt416 no idea, as far as I know that's just what's supposed to be done - if it works in another, simpler way then do it that way.
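
For reference, skipping the partition would roughly mean dropping the parted task and pointing lvg at the whole device - an untested sketch:

- name: Prepare LVM volume group
  lvg:
    vg: images-vg
    pvs: "/dev/{{ volume_device }}"
    state: present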
