@timflannagan
Created May 26, 2020 19:53

TODO

  • Investigate upgrading Metering when using the Manual approval strategy.
  • Investigate upgrading Metering when the newer, 4.5 channel is available and Automatic approval strategy has been configured.
  • Verify the Automatic approval strategy behavior when a 4.4 cluster is upgraded to 4.5.
  • Map out any scenarios where a Metering upgrade could fail.
  • What's the rollback process? What happens when upgrading Metering to 4.5 fails and we need to roll back to 4.4? Does OLM track the previous CSV version?
  • Flesh out which checks are sufficient to ensure a Metering installation is "healthy". These translate to post-upgrade checks as well.
  • Investigate the openshift-docs k8s custom resource syntax and conform to that standard.
      • Ping Kevin L. about the syntax, e.g. ReportDataSource vs. ReportDataSource vs. report data source.
      • Investigate Metering vs. metering vs. Metering Operator.
  • Investigate monitoring last seen (or last timestamp modified) for post-upgrade debuggability/verification checks
  • Investigate getting the MeteringConfig status field output without using an external tool, like jq.
  • Investigate which columns in the oc get reportdatasource output make good verification checks for post-upgrade success.
  • Investigate whether there are any good, simple post-upgrade Report verification checks.
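
Several of the items above (approval strategy, channel, previous CSV version) can be inspected from the OLM Subscription and ClusterServiceVersion resources. The following is a minimal sketch, not a verified procedure. The Subscription name `metering-ocp`, the `openshift-metering` namespace, and all function names are assumptions; adjust them to the actual installation. The fields `installPlanApproval`, `channel`, `installedCSV`, `currentCSV`, and `replaces` are part of the OLM API, but whether `replaces` is enough to support a rollback is still the open question listed above.

```shell
#!/usr/bin/env bash
# Sketch: inspect the OLM state relevant to a Metering upgrade.
# Assumptions (not from the notes): Subscription name, namespace, function names.

SUB_NAME="${SUB_NAME:-metering-ocp}"
SUB_NAMESPACE="${SUB_NAMESPACE:-openshift-metering}"

# Print a single field of the Subscription, e.g. ".spec.installPlanApproval".
subscription_field() {
  oc -n "$SUB_NAMESPACE" get subscriptions.operators.coreos.com "$SUB_NAME" \
    -o jsonpath="{$1}"
}

# Print the name of the CSV that the given CSV replaces, if any.
csv_replaces() {
  oc -n "$SUB_NAMESPACE" get clusterserviceversion "$1" \
    -o jsonpath='{.spec.replaces}'
}

# Succeed when the Subscription points at a CSV that is not installed yet,
# which is what a pending (for example, unapproved Manual) upgrade looks like.
upgrade_pending() {
  local installed current
  installed="$(subscription_field .status.installedCSV)"
  current="$(subscription_field .status.currentCSV)"
  [ -n "$current" ] && [ "$installed" != "$current" ]
}
```

For example, `subscription_field .spec.installPlanApproval` and `subscription_field .spec.channel` show which approval strategy and channel are configured before the upgrade is attempted.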
@timflannagan

We could instead do something like this:

$ oc get meteringconfig operator-metering -o=jsonpath='{.status.conditions[?(@.type=="Invalid")].message}'
"Invalid configuration for non-OKD distributions: You must set the reporting-operator.spec.config.prometheus.url."

Essentially, while we wait for Metering to roll out, watch for changes to the MeteringConfig custom resource to ensure that no error has been encountered in the Ansible role.
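
The watch described here could be sketched as a small polling loop. This is a minimal sketch under stated assumptions, not a tested procedure: the `openshift-metering` namespace, the function names, and the timeout and interval defaults are all assumptions. The jsonpath expression is the one from the command above. The loop treats any non-empty message on the Invalid condition as an error; whether that condition can be present with a non-error status is not covered in these notes.

```shell
#!/usr/bin/env bash
# Sketch: poll the MeteringConfig for an "Invalid" condition during a rollout.
# Assumptions (not from the notes): namespace, function names, timings.

METERING_NAMESPACE="${METERING_NAMESPACE:-openshift-metering}"

# Print the message of the "Invalid" condition, or nothing if it is absent.
metering_invalid_message() {
  oc -n "$METERING_NAMESPACE" get meteringconfig operator-metering \
    -o=jsonpath='{.status.conditions[?(@.type=="Invalid")].message}'
}

# Poll until the timeout (seconds) expires.
# Returns 1 as soon as an Invalid message appears, 0 otherwise.
watch_metering_config() {
  local timeout="${1:-600}" interval="${2:-10}" elapsed=0 message
  while [ "$elapsed" -lt "$timeout" ]; do
    message="$(metering_invalid_message)"
    if [ -n "$message" ]; then
      echo "MeteringConfig is invalid: $message" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "No Invalid condition reported within ${timeout}s"
}
```

Calling `watch_metering_config 900 15` would poll every 15 seconds for up to 15 minutes while the Metering stack rolls out.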

@lbarbeevargas

lbarbeevargas commented Jun 1, 2020

From the OpenShift documentation guidelines:

An Operator’s full name must be a proper noun, with each word initially capitalized. If it includes a product name, defer to the product’s capitalization style guidelines.

Do not provide examples which use jq. Examples should use a templating engine that is provided with oc, like jsonpath. See (https://bugzilla.redhat.com/show_bug.cgi?id=1764726#c6) for more information.

@lbarbeevargas

Updated Timeline:

  • Release Notes - Call for Last Review - 6/29
  • Localization Doc Freeze - 7/2 
  • Doc Freeze - 7/2
  • Release Notes - Final Deadline - 7/2
  • 4.5 GA - 7/9

@timflannagan

timflannagan commented Jun 4, 2020

A good debug check for metering installations

For debugging a failed Metering database (creation):

oc -n openshift-metering get storagelocations -o json | jq '.items[].status'

If that status field is empty, we can infer that the reporting-operator cannot properly communicate with Hive, or that there's an issue with Hive server or Hive metastore.

Note: not a great check for upgrades as the reporting-operator is not going to re-process this resource if the status field is non-empty.
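
In line with the documentation guideline quoted earlier in this thread (no jq in examples), the same check might be expressed with jsonpath. This is a minimal sketch: the function names are hypothetical, and the namespace is the one used in the command above. How `oc` renders a non-empty status object (JSON or Go map syntax) depends on the client version, so the sketch only distinguishes empty from non-empty.

```shell
#!/usr/bin/env bash
# Sketch: list each StorageLocation with its status, without jq.
# Function names are hypothetical; the namespace comes from the command above.

# Print one line per StorageLocation: "<name><TAB><status>".
storage_location_statuses() {
  oc -n openshift-metering get storagelocations \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status}{"\n"}{end}'
}

# Print the names of StorageLocations whose status is empty.
empty_storage_locations() {
  storage_location_statuses |
    awk -F'\t' '$2 == "" || $2 == "{}" || $2 == "map[]" { print $1 }'
}
```

If `empty_storage_locations` prints any names, those are the resources to look at when debugging the database creation.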

@timflannagan

Lindsey: it would be useful to actually go through the upgrade process in the OCP console and pull out more concrete data that we can include in the "Procedure" for the Metering upgrade.

@timflannagan

timflannagan commented Jun 4, 2020

Other notes:

  • Figure out a decent Report that ensures that Metering is still functioning as intended, after the upgrade process.
  • Are there any further measures we need to document beyond creating a Report, confirming that it contains new data, and viewing the report data?
  • It's a somewhat poor user experience to have to switch from the console view to the CLI and back to the console. Is there a way to alleviate this, e.g. by only upgrading Metering through the CLI, or only through the console, for consistency's sake?
  • Sync up again later next week and try to get closer to a final draft, so that Peter (pruan) can start reviewing.
  • Try to push for the week of the 15th to wrap this up. Check in with the group lead (bparees) on whether they also need to review the content.
  • I need to start working towards release notes.

@timflannagan

Since the last sync:

  • Added quick write-up for release-4.5 notes: https://gist.github.com/timflannagan1/9cd998945a2521b2bcbd1db86904322e
  • Lindsey added documentation and examples around tracking events. Nail down the language used and decide whether it's too general.
  • Investigate documenting the MeteringConfig status. This is useful for tracking down errors that may have occurred in the Ansible role while rolling out the Metering stack.
  • Another open question is what the most consumable way to track the progress of the Metering operator stack is. Traditionally, this would be via events, but we need to double-check which implementation is more reliable under the hood.
  • Figure out a decent Report that ensures that Metering is still functioning as intended, after the upgrade process.
