{
    health: <one or more callbacks>
}
def callback(service_name):
    return sub_report_data
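A minimal Python sketch of the callback shape above. It assumes the `health` entry is a list of callables, each taking a service name and returning a sub-report dict. The names `HEALTH_CALLBACKS`, `check_db_connection`, and `collect_sub_reports` are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical: a sub-report is a plain dict shaped like the subset report.
SubReport = Dict[str, object]
HealthCallback = Callable[[str], SubReport]


def check_db_connection(service_name: str) -> SubReport:
    # Placeholder check; a real callback would actually probe the database.
    return {
        "name": "db-cxn",
        "qos": 0.99,
        "health": "passed",
        "message": f"{service_name} can talk to the DB",
        "data": {"ping": "0.002s"},
    }


# Hypothetical registry standing in for {health: <one or more callbacks>}.
HEALTH_CALLBACKS: Dict[str, List[HealthCallback]] = {
    "health": [check_db_connection],
}


def collect_sub_reports(service_name: str) -> List[SubReport]:
    """Call every registered health callback and gather its sub-report."""
    return [cb(service_name) for cb in HEALTH_CALLBACKS["health"]]
```

Calling `collect_sub_reports("mysql")` returns one sub-report per registered callback, ready to be merged into a full report.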
Full report:
---
service: service name
qos: nil | <0.0-1.0>
health: [passed|warn|fail]
state:
  name: [building,installing,churning,blocked,error,up]
  duration: <time in state>
  blockers: <list of dependencies if state is blocked>
checks:
  - name: <health check name>
    health: [passed|warn|fail]
    message: <text description of check>
    data: <structured output data>
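One way to pin down the full-report template is to model it with Python dataclasses that reject values outside the listed enums. This is a sketch only: the class names, expressing `duration` in seconds, and the rule that `blockers` is only allowed in the `blocked` state are assumptions read off the template, not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Allowed values taken from the report template.
HEALTH_VALUES = {"passed", "warn", "fail"}
STATE_NAMES = {"building", "installing", "churning", "blocked", "error", "up"}


@dataclass
class Check:
    name: str
    health: str
    message: str = ""
    data: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.health not in HEALTH_VALUES:
            raise ValueError(f"invalid health {self.health!r} for check {self.name}")


@dataclass
class State:
    name: str
    duration: float  # seconds in state (assumed unit)
    blockers: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.name not in STATE_NAMES:
            raise ValueError(f"invalid state {self.name!r}")
        # Assumed rule: blockers are only meaningful while blocked.
        if self.blockers and self.name != "blocked":
            raise ValueError("blockers only make sense when state is 'blocked'")


@dataclass
class FullReport:
    service: str
    health: str
    state: State
    qos: Optional[float] = None  # nil, or a value in [0.0, 1.0]
    checks: List[Check] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.health not in HEALTH_VALUES:
            raise ValueError(f"invalid health {self.health!r}")
        if self.qos is not None and not 0.0 <= self.qos <= 1.0:
            raise ValueError(f"qos {self.qos} outside [0.0, 1.0]")
```

Validating at construction time means a malformed report fails where it is produced rather than where it is displayed.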
Subset report:
Information such as the transient state of the service makes the most sense to persist in a file or log. Each check could contribute a subset of the full report, to be merged:
---
name: db-cxn
qos: 0.99
health: passed
message: "we can talk to the DB"
data:
  ping: 0.002s
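A minimal sketch of merging subset reports into a full report. The merge rules are assumptions, not part of the proposal: overall health is the worst check health, overall qos is the minimum qos any check reports (nil if none does), and an empty check list counts as `passed`. The function name `merge_subsets` is hypothetical.

```python
from typing import Any, Dict, List, Optional

# Severity ordering for the health values in the report template.
SEVERITY = {"passed": 0, "warn": 1, "fail": 2}


def merge_subsets(service: str, state: Dict[str, Any],
                  subsets: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold per-check subset reports into one full report.

    Assumed merge rules: worst health wins, and qos is the minimum
    qos reported by any check (None if no check reports one).
    """
    # Per-check entries keep only the fields the full-report template lists.
    checks = [
        {k: s[k] for k in ("name", "health", "message", "data") if k in s}
        for s in subsets
    ]
    health = max((s["health"] for s in subsets),
                 key=SEVERITY.__getitem__, default="passed")
    qos_values = [s["qos"] for s in subsets if s.get("qos") is not None]
    qos: Optional[float] = min(qos_values) if qos_values else None
    return {
        "service": service,
        "qos": qos,
        "health": health,
        "state": state,
        "checks": checks,
    }
```

Keeping the rules in a single function makes it easy to swap in a different strategy (for example, a product of qos values) without touching the checks.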
Ultimately this will live in the state server. For the time being, we have several options:
- Collate on the client, i.e. juju cfs
  This would be simple for the purposes of creating a quick, custom utility, but would either divide service logic or dictate a fixed merging strategy.
- Collate on the orchestrator
  This would more closely emulate what will ultimately live in core. This method could also facilitate custom merging rules, defined in the service generation blocks, to display service-specific views of unit health info.
More precise reporting makes sense for unit-by-unit inspection, but some sort of service rollup is necessary for general scanning.
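A sketch of a service rollup, assuming each unit's full report has already been reduced to its `health` value. Passing a rule function per service stands in for the custom merging rules that could be defined in the service generation blocks; the function names and the `any_unit_passing` rule are hypothetical examples.

```python
from collections import Counter
from typing import Callable, Dict, Optional

SEVERITY = {"passed": 0, "warn": 1, "fail": 2}

# A rollup rule maps {unit name: unit health} to one service-level health.
RollupRule = Callable[[Dict[str, str]], str]


def worst_unit(unit_health: Dict[str, str]) -> str:
    """Default rule: the service is as healthy as its worst unit."""
    return max(unit_health.values(), key=SEVERITY.__getitem__, default="passed")


def any_unit_passing(unit_health: Dict[str, str]) -> str:
    """Example rule for redundant services: fail only if no unit passes,
    warn if some but not all units pass."""
    counts = Counter(unit_health.values())
    if counts["passed"] == 0:
        return "fail"
    return "passed" if counts["passed"] == len(unit_health) else "warn"


def rollup(service: str, unit_health: Dict[str, str],
           rule: Optional[RollupRule] = None) -> Dict[str, object]:
    """Summarize per-unit health into a single service-level entry."""
    rule = rule or worst_unit
    return {
        "service": service,
        "health": rule(unit_health),
        "units": dict(Counter(unit_health.values())),
    }
```

With units `passed`, `fail`, `passed`, the default rule reports the service as `fail`, while `any_unit_passing` reports `warn`; the per-health unit counts are kept either way for scanning.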
We have a few different hammers here:
- juju ssh from the orchestrator (reconciler) to execute a script
- reap a log from a unit-local daemon (or cron job) that periodically executes a script
- network access to a unit-local daemon, allowing remote triggering of health check execution (or reaping of a log created by periodic execution)
- a daemon on the unit periodically executes checks and pushes data back to the orchestrator
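For the log-reaping options, the orchestrator side only needs to read results back and keep the newest one per check. A minimal sketch, assuming (hypothetically) that each periodic run appends one JSON object per check per line with a `name` and a numeric `ts` timestamp. Malformed lines, such as a partially written last line, are skipped.

```python
import json
from typing import Dict, Iterable


def reap_latest(lines: Iterable[str]) -> Dict[str, dict]:
    """Return the most recent record per check from a JSON-lines health log.

    Assumed log format (hypothetical): one JSON object per line with at
    least "name" (check name) and "ts" (a numeric timestamp).
    """
    latest: Dict[str, dict] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # A truncated final line (log still being written) is skipped.
            continue
        if not isinstance(record, dict) or "name" not in record:
            continue
        name = record["name"]
        if name not in latest or record.get("ts", 0) >= latest[name].get("ts", 0):
            latest[name] = record
    return latest
```

Because the function accepts any iterable of lines, the same code works on an open file or on text fetched from the unit over ssh or the network.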
juju run can be ruled out due to its susceptibility to being blocked by failed hooks, etc.
The most basic version (a summary of what's happening on a node) could consist of:
- a service "health" hook
- a series of reconciler ssh runs to collect JSON from the hook
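A sketch of the reconciler side of the basic version: one `juju ssh <unit> <command>` run per unit, parsing the JSON it prints. The remote command path `HEALTH_CMD` is hypothetical, and the command runner is injectable so the collection logic can be exercised without a live environment. A unit that fails to respond or returns invalid JSON gets a synthetic `fail` report instead of aborting the whole sweep.

```python
import json
import subprocess
from typing import Callable, Dict, List, Sequence

# Hypothetical: the command on each unit that prints the health hook's JSON.
HEALTH_CMD = "/usr/local/bin/health-report"

Runner = Callable[[Sequence[str]], str]


def run_command(argv: Sequence[str]) -> str:
    """Run a local command and return its stdout, raising on failure."""
    result = subprocess.run(list(argv), capture_output=True, text=True,
                            timeout=60, check=True)
    return result.stdout


def collect_unit_reports(units: List[str],
                         run: Runner = run_command) -> Dict[str, dict]:
    """Collect one JSON report per unit via `juju ssh <unit> <command>`."""
    reports: Dict[str, dict] = {}
    for unit in units:
        try:
            out = run(["juju", "ssh", unit, HEALTH_CMD])
            reports[unit] = json.loads(out)
        except (subprocess.SubprocessError, OSError, ValueError) as exc:
            # Timeouts, non-zero exits, missing juju binary, or bad JSON.
            reports[unit] = {"health": "fail",
                             "message": f"collection failed: {exc}"}
    return reports
```

Running the units one at a time keeps the sketch simple; a real reconciler would likely fan out in parallel, since each ssh round trip is slow.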
A more sophisticated version (closer to the idealized future):
- a Go daemon on each unit, which provides:
  - a command through which hooks can report "status" (similar to the proposal)
  - periodic execution of health checks and structured logging of results
  - result shipping to the reconciler
  - ad hoc execution of checks by query
The daemon would be set up and run as part of the OrchestratorRelationHook, which would pass along any information needed to connect to the reconciler.
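The proposal calls for this daemon to be written in Go; to keep to one language here, the following is a Python sketch of the same responsibilities, not the daemon itself. The class name, the check signature (a callable returning `(health, message, data)`), and the `ship` callback standing in for result shipping to the reconciler are all hypothetical, and the hook-facing status command is reduced to a `set_status` method.

```python
import json
import threading
import time
from typing import IO, Callable, Dict, List, Optional, Tuple

# Hypothetical check signature: returns (health, message, data).
CheckFn = Callable[[], Tuple[str, str, dict]]


class HealthDaemon:
    """Unit-side sketch: periodic checks, JSON-lines logging,
    ad hoc runs, hook-reported status, and shipping results upstream."""

    def __init__(self, checks: Dict[str, CheckFn], log: IO[str],
                 ship: Optional[Callable[[List[dict]], None]] = None) -> None:
        self.checks = checks
        self.log = log
        self.ship = ship  # hypothetical: sends results to the reconciler
        self.status: Optional[str] = None
        self._lock = threading.Lock()

    def set_status(self, status: str) -> None:
        """What a hook-facing status command would ultimately call."""
        with self._lock:
            self.status = status

    def run_check(self, name: str) -> dict:
        """Run one check (also used for ad hoc queries) and log the result.

        An unknown check name raises KeyError; a check that raises is
        recorded as a failure rather than crashing the daemon.
        """
        check = self.checks[name]
        try:
            health, message, data = check()
        except Exception as exc:
            health, message, data = "fail", f"check raised: {exc}", {}
        record = {"name": name, "ts": time.time(), "health": health,
                  "message": message, "data": data}
        with self._lock:
            self.log.write(json.dumps(record) + "\n")
            self.log.flush()
        return record

    def run_once(self) -> List[dict]:
        """Run every check once and ship the batch, if shipping is configured."""
        results = [self.run_check(name) for name in self.checks]
        if self.ship is not None:
            self.ship(results)
        return results

    def run_forever(self, interval: float, stop: threading.Event) -> None:
        # Event.wait doubles as an interruptible sleep between rounds.
        while not stop.is_set():
            self.run_once()
            stop.wait(interval)
```

Writing one JSON object per line keeps the log easy to reap from the orchestrator even if shipping fails.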