
@halcyondude
Created May 22, 2018 06:44
Sprint 13 RDO2 status

RDO Phase 2 Status (Sprint 13)

General Notes

  1. At the beginning of Sprint 13, RDO Phase 2 was essentially broken.
  2. Repeated infra issues (RDO docker registry, tripleo-ci infra tenant, images.rdoproject.org timeouts, etc.) have been a general time sink.
  3. We have a "best effort" methodology whereby ruck/rover are responsible for job monitoring as well as upkeep for all phases. In practice this has led to starvation for some phases.
  4. We entered Sprint 13 with known technical debt, particularly around bare metal jobs and how closely they align with the way configuration and job definitions are modelled upstream. We've iterated substantially on our upstream definitions while accruing technical debt in RDO2.
  5. We don't have an explicit mechanism for identifying upstream changes that require CI job changes. One example is the move of cinder configuration from non-containerized to containerized, which is causing the master RDO2 BM jobs to fail.
  6. We are making progress, however; see below.

Misc issues fixed this sprint (in RDO2)

  1. (fixed) https://bugs.launchpad.net/tripleo/+bug/1772460 rdo2: BM jobs failing because concurrent pip installs share a pip cache
  2. (WIP) https://bugs.launchpad.net/tripleo/+bug/1772533 rdo2 BM: configs need updating for master branch to include containerized services
  3. (fixed) envD had been failing for over a month; the UC machine was wedged and needed a hard reboot and an update from CentOS 7.0 --> 7.5
  4. (merged) https://code.engineering.redhat.com/gerrit/#/c/139243 BMU jobs now use openstack/quickstart-ha-utils
  5. (merged) https://code.engineering.redhat.com/gerrit/#/c/139404 verbose logging for BM jobs was making logs massive
  6. (merged) https://code.engineering.redhat.com/gerrit/#/c/139133 use QE passed phase 1 puddles for OSP (vs. latest/all puddles)
  7. (merged) https://code.engineering.redhat.com/gerrit/#/c/139138 stop building RDO on RHEL images (master)
  8. (merged) https://code.engineering.redhat.com/gerrit/#/c/138964 rhel 7.4 --> rhel 7.5 for ha baremetal envs
  9. (merged) https://code.engineering.redhat.com/gerrit/#/c/138269 add rhos-13 gate jobs for TQ/TQE
  10. (WIP) additional logging config for BMU https://review.rdoproject.org/etherpad/p/rdo2-collect-logs-customization

Master

TLDR: fs20 is passing; all BM jobs are failing (missing docker config, i.e. technical debt). BMU is also failing.

"2018-05-21 18:11:54,926 ERROR: 23311 -- Failed running docker-puppet.py for neutron"
2018-05-21 14:16:17 | "2018-05-21 18:11:54,926 ERROR: 23311 -- Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend"

Queens

PLEASE MERGE TO PROMOTE: https://review.rdoproject.org/r/13844 promote: queens rdo2

  • last rdo2 promote: 2018-05-08, 24fd4bd776d47ab956490ff555c7471cb01c0b99_3b49aa87
  • TLDR: last run: fs20 + BMU passed. envB, envD, envD all failed

Current jobs:

Pike

PLEASE MERGE TO PROMOTE: https://review.rdoproject.org/r/13843 promote: pike rdo2

Action items for pike:

  1. Remove the RDO-on-RHEL jobs. We're not supporting them, and they carry a maintenance cost.
  2. (does not block promotion, low priority): https://thirdparty.logs.rdoproject.org/jenkins-tripleo-quickstart-pike-rdo_trunk-baremetal-dell_fc430_envB-single_nic_vlans-212/undercloud/home/stack/overcloud_deploy_post.log.txt.gz#_2018-05-22_05_47_17

Ocata

  • last full green pass: March 9, #141
  • last rdo2 promote: 2018-05-04, 7ef0d5f8c31c87f377d4a2d07d7123f3f2bbf83f_1558157c
  • TLDR: BMU is passing; the BM jobs (2) fail on pre-introspection validation checks. Note that BMU has been stable for a month.

Current jobs:

TQ/TQE Gates (rhos-12/13)

TLDR: WIP; this has been a lower priority than promotions and infra.

Stable branch jobs

  • As far as I can tell, these have not been maintained or touched by any ruck/rover in any sprint.
  • We should perhaps revisit their utility, as invariably we just rebase the stable branches anyway.
  • These jobs could use some team review.
  • IMHO they should be using a dlrnapi promoter as well. (tech debt, cards exist)