- The state of RDO Phase 2 at the beginning of Sprint 13 was essentially broken.
- Repeated infra issues (RDO docker registry, tripleo-ci infra tenant, images.rdoproject.org timeouts, etc.) have been a general time sink.
- We have a "best effort" methodology whereby ruck/rover are responsible for job monitoring as well as upkeep for all phases. In practice, this has left some phases starved of attention.
- We entered Sprint 13 with known technical debt, particularly around the bare metal jobs and how well they align with the way configuration and job definitions are modelled upstream. We've iterated substantially on our upstream definitions while accruing technical debt in RDO2.
- We have no explicit mechanism for identifying upstream changes that require CI job changes. For example, cinder configuration changed from non-containerized to containerized, which is causing the master RDO2 BM jobs to fail.
- We are making progress, however (see below):
- (fixed) https://bugs.launchpad.net/tripleo/+bug/1772460 rdo2: BM jobs failing because concurrent pip installs fail when sharing a pip cache
- (WIP) https://bugs.launchpad.net/tripleo/+bug/1772533 rdo2 BM: configs need updating for master branch to include containerized services
- (fixed) envD had been failing for more than a month; the UC machine was wedged and needed a hard reboot and an update from CentOS 7.0 --> 7.5
- (merged) https://code.engineering.redhat.com/gerrit/#/c/139243 BMU jobs now use openstack/quickstart-ha-utils
- (merged) https://code.engineering.redhat.com/gerrit/#/c/139404 verbose logging for BM jobs was making logs massive
- (merged) https://code.engineering.redhat.com/gerrit/#/c/139133 use QE-passed phase 1 puddles for OSP (vs. latest/all puddles)
- (merged) https://code.engineering.redhat.com/gerrit/#/c/139138 stop building RDO on RHEL images (master)
- (merged) https://code.engineering.redhat.com/gerrit/#/c/138964 RHEL 7.4 --> RHEL 7.5 for HA bare metal envs
- (merged) https://code.engineering.redhat.com/gerrit/#/c/138269 add rhos-13 gate jobs for TQ/TQE
- (WIP) additional logging config for BMU https://review.rdoproject.org/etherpad/p/rdo2-collect-logs-customization
TLDR: fs20 is passing; all BM jobs are failing (missing docker config, i.e. technical debt). BMU is also failing.
- BMU overcloud deploy failure: https://thirdparty.logs.rdoproject.org/jenkins-oooq-master-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans-161/undercloud/home/stack/overcloud_deploy.log.txt.gz
- BM configs are not using containerized services (envB, envD, envE): https://bugs.launchpad.net/tripleo/+bug/1772533
"2018-05-21 18:11:54,926 ERROR: 23311 -- Failed running docker-puppet.py for neutron", 2018-05-21 14:16:17 | "2018-05-21 18:11:54,926 ERROR: 23311 -- Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend",
PLEASE MERGE TO PROMOTE: https://review.rdoproject.org/r/13844 promote: queens rdo2
- last rdo2 promote: 2018-05-08, 24fd4bd776d47ab956490ff555c7471cb01c0b99_3b49aa87
- TLDR: last run: fs20 + BMU passed; envB, envD and envE all failed.
Current jobs:
- 2018-05-17 17:17:17, https://trunk.rdoproject.org/centos7-queens/61/15/61152f1f452f02d2f0bccc8e3b3b1695103c4114_ba256d89, current-tripleo-rdo
- https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rdo-promote-queens-rdo_trunk/83
- (pass) fs20
- (pass) BMU
- (fail) envB
- (fail) envD
- (fail) envE
PLEASE MERGE TO PROMOTE: https://review.rdoproject.org/r/13843 promote: pike rdo2
- last rdo2 promote: 2018-05-10, d52ad67500aacdb4c2a1321363bfe87de4e6b518_88c9954e
- TLDR: this should be promoted; fs020 and 2 BM jobs are green.
- https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rdo-promote-pike-rdo_trunk/172
- Remove the RDO on RHEL jobs: we're not supporting them, and they cost effort to maintain.
- (does not block promotion, low priority): https://thirdparty.logs.rdoproject.org/jenkins-tripleo-quickstart-pike-rdo_trunk-baremetal-dell_fc430_envB-single_nic_vlans-212/undercloud/home/stack/overcloud_deploy_post.log.txt.gz#_2018-05-22_05_47_17
- last full green pass: March 9, #141
- last rdo2 promote: 2018-05-04, 7ef0d5f8c31c87f377d4a2d07d7123f3f2bbf83f_1558157c
- TLDR: BMU is passing; 2 BM jobs fail on pre-introspection validation checks. Note that BMU has been stable for a month.
Current jobs:
- https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rdo-promote-ocata-rdo_trunk/169
- (pass) https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/oooq-ocata-rdo_trunk-bmu-haa01-lab-float_nic_with_vlans/153
- (fail) https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/promote-rhel-ocata-rdo_trunk-featureset020-1ctlr_1comp_64gb/40
- failing during modify-images; looks like https://bugs.launchpad.net/tripleo/+bug/1762419/comments/7 (or a similar issue). Note this is an RDO on RHEL job.
- (fail) https://rhos-dev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/periodic-ocata-rdo_trunk-featureset020-1ctlr_1comp_64gb/40 UC install failing with a timeout accessing keystone (?)
- (fail) https://thirdparty.logs.rdoproject.org/jenkins-tripleo-quickstart-ocata-rdo_trunk-baremetal-hp_dl360_envE-single_nic_vlans-85/undercloud/home/stack/validations_pre-introspection.log.txt.gz
- (fail) https://thirdparty.logs.rdoproject.org/jenkins-tripleo-quickstart-ocata-rdo_trunk-baremetal-dell_fc430_envB-single_nic_vlans-202/undercloud/home/stack/validations_pre-introspection.log.txt.gz
TLDR: WIP; this has been lower priority than promotions and infra.
- (merged) https://code.engineering.redhat.com/gerrit/#/c/138269 (add rhos-13 gates)
- most recent gate jobs: attempted (failed)
- most recent OSP 12/13 jobs (passed_phase1 puddles) and stable branch updates
- have not been maintained or touched at all (as far as I can tell) by any ruck/rover in any sprint.
- We should perhaps revisit their utility, since we invariably just rebase the stable branches anyway.
- These jobs could use some team review.
- IMHO they should be using a dlrnapi promoter as well. (tech debt, cards exist)