@michaeltchapman
Last active March 27, 2017 04:21

Apex deployment process and quickstart migration 04-2017

ci/deploy.sh

(Entry point)

  parse_cmdline             # Goes into a new python script to be our entry point
  parse_deploy_settings     # apex-parsing
  parse_network_settings    # apex-parsing
  configure_deps            # apex-deps
  ntpdate $ntp_server       # apex-deps
  setup_undercloud_vm       # handled in quickstart
  if virtual:               # -e 'apex_virtual=True'
    setup_virtual_baremetal # handled in quickstart
  parse_inventory_file      # apex-parsing
  configure_undercloud      
  overcloud_deploy
  if post_config:
    configure_post_install
  if sdn_controller == 'onos':
    onos_update_gw_mac

parse_cmdline

Parses cli options:

-h|--help

Settings files

  • -d|--deploy-settings
  • -i|--inventory
  • -n|--net-settings
  • -e|--environment-file (What is this?)

All of these can be passed into ansible-playbook using -e, but they cannot contain any overlapping variable definitions. Note that the quickstart parameter is also -e.
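
A minimal sketch of how the new entry point could merge the settings files and bail out on overlapping variable definitions before invoking ansible-playbook (the merge helper and file names are illustrative, not existing apex code):

# sketch: merge settings files and fail on overlapping keys
import sys
import yaml

def merge_settings(paths):
    merged = {}
    for path in paths:
        with open(path) as f:
            data = yaml.safe_load(f) or {}
        overlap = set(merged) & set(data)
        if overlap:
            sys.exit('overlapping variable definitions: %s' % ', '.join(sorted(overlap)))
        merged.update(data)
    return merged

# e.g. merge_settings(['deploy_settings.yaml', 'network_settings.yaml', 'inventory.yaml'])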

Network checking

  • -p|--ping-site
  • --dnslookup-site

Create a tmp yaml file for these

Deployment configuration

  • -v|--virtual
  • --no-post-config
  • --debug
  • --interactive
  • --virtual-cpus
  • --virtual-ram
  • --virtual-computes

Use the temp yaml file for these too. debug and interactive may not translate directly. For debug we can add ansible debug print statements, but that's a little different from conditionally executing other functions when debug is enabled. I think the majority of our use of debug is print statements anyway.

For interactive I have absolutely no idea on the ansible side.
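
A rough sketch of what the new Python entry point could look like: parse the flags above with argparse, dump them to a temporary YAML file, and hand that file to ansible-playbook via -e @file (the playbook name and defaults shown are assumptions, not settled decisions):

# sketch of the new python entry point; playbook name and defaults are placeholders
import argparse
import subprocess
import tempfile
import yaml

def main():
    p = argparse.ArgumentParser()
    p.add_argument('-v', '--virtual', action='store_true')
    p.add_argument('-p', '--ping-site', default='8.8.8.8')
    p.add_argument('--dnslookup-site', default='www.google.com')
    p.add_argument('--virtual-computes', type=int, default=1)
    p.add_argument('--no-post-config', action='store_true')
    p.add_argument('--debug', action='store_true')
    args = p.parse_args()

    extra = {'apex_virtual': args.virtual,
             'ping_site': args.ping_site,
             'dnslookup_site': args.dnslookup_site,
             'virtual_computes': args.virtual_computes,
             'post_config': not args.no_post_config,
             'apex_debug': args.debug}

    # write the cli options to a tmp yaml file and pass it with -e, same as the settings files
    with tempfile.NamedTemporaryFile('w', suffix='.yml', delete=False) as f:
        yaml.safe_dump(extra, f)
    subprocess.check_call(['ansible-playbook', 'deploy.yml', '-e', '@' + f.name])

if __name__ == '__main__':
    main()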

parse_deploy_settings

The current implementation is a Python class inheriting from dict. The dict is loaded from YAML, some validation checks that a dataplane and a ceph option are set, and then the dict is written out as bash variables.

The deploy file could probably just be passed straight in with -e, and we then run the validation against the loaded variables.
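
If the deploy file goes in with -e, the remaining validation could be a small standalone check along these lines (a sketch only; option names follow the deploy_options used elsewhere in this document):

# sketch: validate deploy settings loaded from yaml (not the existing apex class)
import yaml

VALID_DATAPLANES = ('ovs', 'ovs_dpdk', 'fdio')

def validate_deploy_settings(path):
    with open(path) as f:
        settings = yaml.safe_load(f)
    opts = settings.get('deploy_options', {})
    dataplane = opts.get('dataplane', 'ovs')
    if dataplane not in VALID_DATAPLANES:
        raise ValueError('invalid dataplane: %s' % dataplane)
    if 'ceph' not in opts:
        opts['ceph'] = True  # assumption: default ceph to enabled if unset
    return settings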

parse_network_settings

If fdio or dpdk are in use, nodes that use them require preconfig (--compute-pre-config and --controller-pre-config).

Creates a NetworkSettings from the network settings file, then creates a NetworkEnvironment from that plus the environment file. This will run validate_settings, which will do some data massaging:

for each enabled network:
  config_required_settings
  configure_ip_range
  add network to enabled_networks
  validate_overcloud_nic_order
set dns and ntp servers

then checks networks because dpdk requires a tenant network

config_required_settings

sets all networks without vlan set to be vlan: native

for each network:
  if cidr is set:
    network['cidr'] = ip_network(cidr)
  else if 'installer_vm' has 'members' and 'ip'
    // it's a bridged interface

if the provisioner_ip isn't set on the admin network, generate it using the interface specified by network['installer_vm']['members'][0]. Also on admin, set the dhcp_range (2-11) and introspection_range (11-20).

On the external network, set the floating_ip_range (2-20) and the gateway
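
The range defaults above (dhcp 2-11, introspection 11-20, floating 2-20) are easy to reproduce with the standard ipaddress module; a sketch, not the actual ip_utils code:

# sketch: derive the default ranges from a network cidr
from ipaddress import ip_network

def default_ranges(cidr):
    # list() is fine for small v4 networks; a large v6 network would need islice
    hosts = list(ip_network(cidr).hosts())
    return {
        'dhcp_range': (str(hosts[1]), str(hosts[10])),            # .2 - .11
        'introspection_range': (str(hosts[10]), str(hosts[19])),  # .11 - .20
        'floating_ip_range': (str(hosts[1]), str(hosts[19])),     # .2 - .20
    }

# e.g. default_ranges('192.0.2.0/24')['dhcp_range'] == ('192.0.2.2', '192.0.2.11')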

configure_ip_range

The ip range is calculated by ip_utils.py. Sharing Python code across multiple Ansible modules doesn't seem to be supported!?

Perhaps make a python-apex rpm that includes ip_utils, install it on the virthost right at the start before we run parsing, and then run the parsing tasks on the virthost instead of on localhost. Yuck :( Or perhaps we can use pip within the quickstart virtualenv on localhost...
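
Another option would be to keep the range logic in a small Ansible filter plugin shipped next to the playbooks, so it runs on localhost with no extra packaging; a sketch (the plugin and filter names are made up, not existing apex code):

# filter_plugins/apex_ip.py -- hypothetical filter wrapping the range calculation
from ipaddress import ip_network

def ip_range(cidr, start_offset, end_offset):
    # return 'first,last' for host offsets within cidr
    net = ip_network(cidr)
    return '%s,%s' % (net.network_address + start_offset,
                      net.network_address + end_offset)

class FilterModule(object):
    def filters(self):
        return {'apex_ip_range': ip_range}

A playbook could then use it as {{ admin_cidr | apex_ip_range(2, 11) }}.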

validate_overcloud_nic_order

sets self.nics[role][network] and self.nics_specified[role]. Checks for duplicate definitions (one interface mapped to multiple networks on a single host).
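
The duplicate check amounts to making sure no interface appears twice for a given role; roughly (a sketch, not the actual method):

# sketch: reject one interface mapped to multiple networks on a single role
from collections import Counter

def check_nic_order(nics):
    # nics: {role: {network: interface}}, e.g. {'compute': {'admin': 'eth0', 'tenant': 'eth1'}}
    for role, mapping in nics.items():
        dupes = [nic for nic, count in Counter(mapping.values()).items() if count > 1]
        if dupes:
            raise ValueError('%s: interface(s) %s mapped to multiple networks'
                             % (role, ', '.join(dupes)))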

configure_deps

  • check internet connection
  • check dhcpd not running on virthost
  • ensure ip forwarding enabled
  • check libvirt status
  • check ovs status

set virsh_enabled_networks to attach to undercloud.

  • baremetal: only admin and external
  • virtual: everything

ensure libvirt default network is configured

if baremetal:
  for each network in virsh_enabled_networks:
    add ovs bridge
    define libvirt network
    start network
    set network autostart
    on admin and external:
      attach network['installer_vm']['members'] to ovs bridge
else:
  for each network in OPNFV_NETWORK_TYPES:
    add ovs bridge
    define libvirt network
    start network
    set network autostart
  • ensure default storage pool is set
  • check for kvm modules
  • try to enable nested kvm, if unavailable, add --libvirt-type qemu to DEPLOY_OPTIONS
  • create root ssh key
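
For reference, the "add ovs bridge" / "define libvirt network" steps in the list above boil down to feeding virsh net-define an OVS-backed network definition; a sketch of generating that (bridge and network names are just examples):

# sketch: define a libvirt network on top of an existing OVS bridge
import subprocess
import tempfile

NET_XML = """<network>
  <name>{name}</name>
  <forward mode='bridge'/>
  <bridge name='{bridge}'/>
  <virtualport type='openvswitch'/>
</network>
"""

def define_ovs_network(name, bridge):
    with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False) as f:
        f.write(NET_XML.format(name=name, bridge=bridge))
    subprocess.check_call(['virsh', 'net-define', f.name])
    subprocess.check_call(['virsh', 'net-start', name])
    subprocess.check_call(['virsh', 'net-autostart', name])

# e.g. define_ovs_network('admin', 'br-admin')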

ntpdate

if ntpdate $ntp_server
  hwclock --systohc

setup_undercloud_vm

  • Start undercloud VM with default admin and external networks attached. define_vm undercloud hd 30 "$undercloud_nets" 4 12288
  • Resize undercloud if needed
  • Inject authorized keys to /root/.ssh and /home/stack/.ssh and set perms
  • Start undercloud
  • Set undercloud to autostart
  • get undercloud_mac
  • set undercloud ip to UNDERCLOUD
  • ssh -T ${SSH_OPTIONS[@]} "root@$UNDERCLOUD" "restorecon -r /home/stack" (fix ssh user?)

setup_virtual_baremetal

  • starts baremetal$number VMs with cli arg cpu/memory if given
  • attaches networks to baremetal VMs
  • populates $APEX_TMP_DIR/inventory-virt.yml
  • overwrites /usr/share/tripleo/templates/domain.xml with custom copy

parse_inventory_file

Generates instackenv.json and places it on the undercloud. Node definitions are moved down to the root of the dict (from being under nodes) and some data is remapped:

node['pm_addr'] = node['ipmi_ip']
node['pm_password'] = node['ipmi_pass']
node['pm_user'] = node['ipmi_user']
node['mac'] = [node['mac_address']]

The bash var root_disk_list is set to whatever is specified as disk_device in the node. I don't think this can have multiple settings at the moment, so when this is consumed by openstack baremetal configure boot later on, it will just use the last value for all nodes. If nothing is set, sda is the default.
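
A condensed sketch of the remapping into instackenv.json (field names as listed above; capabilities and root disk handling omitted):

# sketch: build instackenv.json content from the yaml inventory (simplified)
import json
import yaml

def build_instackenv(inventory_path):
    with open(inventory_path) as f:
        inv = yaml.safe_load(f)
    nodes = []
    for node in inv['nodes'].values():
        node['pm_addr'] = node.pop('ipmi_ip')
        node['pm_password'] = node.pop('ipmi_pass')
        node['pm_user'] = node.pop('ipmi_user')
        node['mac'] = [node.pop('mac_address')]
        nodes.append(node)
    return json.dumps({'nodes': nodes}, indent=2)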

configure_undercloud

  • copy network_environment.yaml to the undercloud
  • set ext_net_type=br_ex for sdn_l3 (ODL/ONOS L3)
  • set ovs_dpdk_bridge=br-phy if dpdk dataplane in use
  • generate nics/controller.yaml and nics/compute.yaml. Compute needs to know ext_net_type and ovs_dpdk_bridge.
  • copy nic templates to /home/stack/nics
  • generate ssh key for stack user on undercloud
  • if virtual, give stack user on undercloud ssh trust to virthost root (for power control of virtual baremetal)
  • if virtual, inject stack user ssh key into instackenv.json
  • give stack user on undercloud ssh trust to current user on virthost
  • disable requiretty for sudo for stack user

Set some undercloud settings:

openstack-config --set undercloud.conf DEFAULT local_ip ${admin_installer_vm_ip}/${admin_cidr##*/}
openstack-config --set undercloud.conf DEFAULT network_gateway ${admin_installer_vm_ip}
openstack-config --set undercloud.conf DEFAULT network_cidr ${admin_cidr}
openstack-config --set undercloud.conf DEFAULT dhcp_start ${admin_dhcp_range%%,*}
openstack-config --set undercloud.conf DEFAULT dhcp_end ${admin_dhcp_range##*,}
openstack-config --set undercloud.conf DEFAULT inspection_iprange ${admin_introspection_range}
openstack-config --set undercloud.conf DEFAULT undercloud_debug false
openstack-config --set undercloud.conf DEFAULT undercloud_hostname "undercloud.${domain_name}"
sudo openstack-config --set /etc/ironic/ironic.conf disk_utils iscsi_verify_attempts 30
sudo openstack-config --set /etc/ironic/ironic.conf disk_partitioner check_device_max_retries 40

Some of these settings may require uncommenting in undercloud.conf first.

Configure Ceph:

if [[ -n "${deploy_options_array['ceph_device']}" ]]; then
    sed -i '/ExtraConfig/a\\    ceph::profile::params::osds: {\\x27${deploy_options_array['ceph_device']}\\x27: {}}' ${ENV_FILE}
fi

sudo sed -i '/CephClusterFSID:/c\\  CephClusterFSID: \\x27$(cat /proc/sys/kernel/random/uuid)\\x27' /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml
sudo sed -i '/CephMonKey:/c\\  CephMonKey: \\x27'"\$(ceph-authtool --gen-print-key)"'\\x27' /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml
sudo sed -i '/CephAdminKey:/c\\  CephAdminKey: \\x27'"\$(ceph-authtool --gen-print-key)"'\\x27' /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml
  • restart glance-api
  • sudo openstack-config --set /etc/nova/nova.conf DEFAULT dns_domain ${domain_name}
  • sudo openstack-config --set /etc/nova/nova.conf DEFAULT dhcp_domain ${domain_name}
  • restart nova-conductor, nova-api, nova-compute, nova-scheduler
  • sudo openstack-config --set /etc/neutron/neutron.conf DEFAULT dns_domain ${domain_name}
  • restart neutron-server, neutron-dhcp-agent
  • sudo sed -i '/num_engine_workers/c\num_engine_workers = 2' /etc/heat/heat.conf
  • sudo sed -i '/#workers\s=/c\workers = 2' /etc/heat/heat.conf
  • restart heat-engine, heat-api
if there is an external network:
  if external_installer_vm_vlan != native:
    cat <<EOF > /etc/sysconfig/network-scripts/ifcfg-vlan${external_installer_vm_vlan}
DEVICE=vlan${external_installer_vm_vlan}
ONBOOT=yes
DEVICETYPE=ovs
TYPE=OVSIntPort
BOOTPROTO=static
IPADDR=${external_installer_vm_ip}
PREFIX=${external_cidr##*/}
OVS_BRIDGE=br-ctlplane
OVS_OPTIONS="tag=${external_installer_vm_vlan}"
EOF
    ifup vlan${external_installer_vm_vlan}
  else if it's not already up:
    ip a a ${external_installer_vm_ip}/${external_cidr##*/} dev eth2
    ip link set up dev eth2

This may be a slightly odd discrepancy: a VLAN-tagged external interface will persist across undercloud restarts (it gets an ifcfg file), but a native external will not.

overcloud_deploy

DEPLOY_OPTIONS is built up piecemeal, depending on which heat templates should be included. Templates are under /usr/share/openstack-tripleo-heat-templates/environments/ unless otherwise specified. The options are all in deploy_options_array.

if sdn_controller == 'opendaylight':
  if sfc: include opendaylight_sfc.yaml 
  elif vpn: include opendaylight_sdnvpn.yaml 
  elif vpp: include opendaylight_fdio.yaml 
  else include neutron-opendaylight-l3.yaml
  SDN_IMAGE = 'opendaylight'
elif sdn_controller == 'opendaylight-external':
  include opendaylight-external
  SDN_IMAGE = 'opendaylight'
elif sdn_controller == 'onos':
  if sfc: include onos_sfc.yaml
  else: include onos.yaml
  SDN_IMAGE = 'onos'
elif sdn_controller == 'opencontrail':
  fail since we don't support it
elif !sdn_controller:
  if vpp: include neutron-ml2-networking-vpp.yaml
  SDN_IMAGE = 'opendaylight'
if dataplane == 'ovs_dpdk' or dataplane == 'fdio':
  install dpdk kernel modules inside overcloud images
  • if debug is true, set root pw of opnfv-apex inside image(s)
if sfc and dataplane == 'ovs':
  upgrade ovs package inside image
if sdn_controller == 'opendaylight' and odl_version:
  install the correct version of odl inside image
if performance:
  set kernel parameters as txt file per role
  replace options in numa heat templates
  build ipa kernel option ramdisks using kernel param txt files
  replace options in numa heat templates based on [public|private]_network_[compute|controller]_interface 
  if debug: cat numa.yaml
  include numa.yaml
if ceph:
  include storage-environment.yaml
include network-environment.yaml
if ha_enabled:
  check # of control nodes >= 3
  add ' --control-scale [# of control nodes]' to DEPLOY_OPTIONS
else:
  check # of control nodes >= 1    

check # of compute nodes > 0
add ' --compute-scale [# of compute nodes]' to DEPLOY_OPTIONS
add ' --ntp-server [ntp_server]' to DEPLOY_OPTIONS
add ' --control-flavor control --compute-flavor compute'
if virtual:
  add '-e virtual-environment.yaml'
add '-e ${ENV_FILE}' to DEPLOY_OPTIONS

The following chunk of commands is run as the stack user on the undercloud:

if !tacker:
  disable tacker in ENV_FILE
create ssh key and insert into ENV_FILE at replace_private_key and replace_public_key
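
The key insertion is just a placeholder substitution in the environment file; a sketch of the idea (paths are examples; note a multi-line private key would also need correct YAML indentation, which this sketch glosses over):

# sketch: generate a keypair and splice it into the ENV_FILE placeholders
import subprocess

def inject_ssh_keys(env_file, key_path='/tmp/overcloud_key'):
    subprocess.check_call(['ssh-keygen', '-q', '-t', 'rsa', '-N', '', '-f', key_path])
    with open(key_path) as f:
        private = f.read()
    with open(key_path + '.pub') as f:
        public = f.read().strip()
    with open(env_file) as f:
        text = f.read()
    text = text.replace('replace_private_key', private)
    text = text.replace('replace_public_key', public)
    with open(env_file, 'w') as f:
        f.write(text)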

restart swift-proxy

openstack overcloud image upload
openstack baremetal import --json instackenv.json

if performance options are set:
  for each role in performance roles:
    create ramdisk from $ROLE-ironic-python-agent.initramfs
    create image from $ROLE-overcloud-full.qcow2, using old kernel and IPA ramdisk
    if role == controller:
      set image name in opnfv-environment.yaml
      set flavor name in opnfv-environment.yaml
    if role == compute:
      set image name in numa.yaml
      set flavor name in numa.yaml
    if role == blockstorage:
      set image name in numa.yaml
    for node in $nodes:
      if ironic node-show $node | grep profile:${role}:
        ironic node-update $node replace driver_info/deploy_ramdisk=${RAMDISK_ID}

if !virtual:
  openstack baremetal introspection bulk start
  if root_disk_list:
    openstack baremetal configure boot --root-device=$root_disk_list
  else:
    openstack baremetal configure boot

for flavor in baremetal control compute:
  if $flavor in openstack flavor list:
    openstack flavor delete $flavor
  openstack flavor create --id auto --ram 4096 --disk 39 --vcpus 1 $flavor

openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" baremetal
openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="control" control
openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="compute" compute
  • Add all dns_servers to the neutron tenant, external and storage subnets using neutron subnet-update
  • Set CloudDomain to domain_name in $ENV_FILE

And finally... put openstack overcloud deploy --templates $DEPLOY_OPTIONS --timeout 90 into deploy_command

  • If interactive is set, prompt user
  • ssh to undercloud as stack user and run deploy_command
  • while deploy is running, run openstack stack failures list overcloud --long
  • if dpdk: once deploy is complete, ssh to each compute node and run dpdk tests
  • if debug: print endpoints, services and cinder quota show

configure_post_install

  • copy the virthost root ssh pub key to all overcloud nodes

Give the virthost an address on admin and external networks if it doesn't already have one:

public_network_ipv6 = false
check admin and external (if enabled) ovs bridges to see if they have IP addresses.
for network in admin, external:
  if no ip:
    ovs_ip = last ip in $network_overcloud_ip_range
    detect ipv4 or ipv6 from ip 
    if ipv6:
      echo 0 > /proc/sys/net/ipv6/conf/$network_iface/disable_ipv6
      if  network == external:
        public_network_ipv6 = true
    ip addr add ovs_ip/network_cidr dev network_iface
    ip link set up network_iface
if dpdk:
  for node in compute_nodes:
    ifup br-phy; systemctl restart neutron-ovs-agent

On undercloud...

# configure external network
if external_nic_mapping_compute_vlan is set but not native:
  neutron net-create external  --router:external=True --tenant-id [$service_tenant_id] --provider:network_type vlan --provider:segmentation_id ${external_nic_mapping_compute_vlan} --provider:physical_network datacentre
else:
  neutron net-create external --router:external=True --tenant-id [$service_tenant_id] --provider:network_type flat --provider:physical_network datacentre
fi
if external_network_ipv6
  neutron subnet-create --name external-net --tenant-id [$service_tenant_id] external --ip_version 6 --ipv6_ra_mode slaac --ipv6_address_mode slaac --gateway ${external_gateway} --allocation-pool start=${external_floating_ip_range%%,*},end=${external_floating_ip_range##*,} ${external_cidr}
elif external in enabled_network_list
  neutron subnet-create --name external-net --tenant-id [$service_tenant_id] --disable-dhcp external --gateway ${external_gateway} --allocation-pool start=${external_floating_ip_range%%,*},end=${external_floating_ip_range##*,} ${external_cidr}
else
  # we re-use the introspection range for floating ips with single admin network
  neutron subnet-create --name external-net --tenant-id [$service_tenant_id] --disable-dhcp external --gateway ${admin_gateway} --allocation-pool start=${admin_introspection_range%%,*},end=${admin_introspection_range##*,} ${admin_cidr}
fi

# Remove nonfunctional endpoints
openstack endpoint delete $sahara_endpoint_id
openstack service delete $sahara_service_id
openstack endpoint delete $swift_endpoint_id
openstack service delete $swift_service_id

# Set hugepage flavor key for dpdk and fdio
if dataplane == 'fdio' or dataplane == 'ovs_dpdk':
  for all flavors:
    nova flavor-key $flavor set hw:mem_page_size=large

# Configure congress datasources
for service in nova, neutronv2, ceilometer, cinder, glancev2, keystone:
  if nova:
    account for micro-version
  openstack congress datasource create $service
openstack congress datasource create doctor

On Virthost:

if virtual or !test_overcloud_connectivity and public_network_ipv6 != true:
  if external in enabled_network_list:
    nat_cidr = external_cidr
  else:
    nat_cidr = admin_cidr
  configure_undercloud_nat $nat_cidr

if sfc:
  on all overcloud nodes:
    ifconfig br-int up
    ip route add 123.123.123.0/24 dev br-int

if vsperf:
  on compute0:
    cd /var/opt/vsperf/systems && ./build_base_machine.sh

collect deployment logs
print dash url, undercloud ip, and opnfv-util usage

if ha:
  print any pcs status failures on controller0
  pcs ban congress on controller-1 and 2, then restart congress on 0

onos_update_gw_mac

On virthost:
  GW_MAC=$(arping ${GW_IP} -c 1 -I br-external | grep -Eo '([0-9a-fA-F]{2})(([/\s:-][0-9a-fA-F]{2}){5})')

On Controller0:
  /opt/onos/bin/onos "externalgateway-update -m ${GW_MAC}"