We’re missing the cpu_percent calculation for the CPU.tsx feature.
We need to add errorCounts for each worker. It looks like we currently only give the error count for the whole node, not for each worker.
Same for logCounts: we need it for each worker, not just for the whole node.
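A minimal sketch of what per-worker counts could look like, assuming each error or log entry carries the pid of the worker that produced it. The `WorkerEvent` type and `countByWorker` helper are hypothetical names for illustration, not part of the existing API:

```typescript
// Hypothetical entry shape: assumes every error/log entry records the
// pid of the worker it came from. The real payload may differ.
type WorkerEvent = { pid: number };

// Count entries per worker pid instead of producing one total per node.
function countByWorker(events: WorkerEvent[]): Record<number, number> {
  const counts: Record<number, number> = {};
  for (const { pid } of events) {
    counts[pid] = (counts[pid] ?? 0) + 1;
  }
  return counts;
}
```

The same helper would serve both errorCounts and logCounts, and the node-level total can still be recovered by summing the per-worker values.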
In addition to pulling in the new GPU monitoring code, we will also need to supply the full resource slots rather than a mapping from type of resource to quantity as we do now.
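To illustrate the difference between the two shapes, here is a rough sketch. The `ResourceSlot` fields (`slot`, `allocation`) and the `toQuantities` helper are assumptions for illustration and may not match the actual payload:

```typescript
// What we supply now: resource type -> total quantity.
type ResourceQuantities = Record<string, number>;

// Assumed full-slot shape: each resource type lists the individual slots
// (e.g. which GPU index) and how much of each slot is allocated.
type ResourceSlot = { slot: number; allocation: number };
type ResourceSlots = Record<string, ResourceSlot[]>;

// The quantity mapping can be derived from the slots, but not the reverse.
function toQuantities(slots: ResourceSlots): ResourceQuantities {
  const quantities: ResourceQuantities = {};
  for (const resourceType in slots) {
    quantities[resourceType] = slots[resourceType].reduce(
      (sum, s) => sum + s.allocation,
      0,
    );
  }
  return quantities;
}
```

Since the slots carry strictly more information, supplying them would let the frontend keep computing totals while also showing per-GPU usage.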
We are missing the rayletInfo field that provided some debugging information on a per-node basis.
This needs more help. We need to capture all the actor information that the existing rayletStats endpoint currently provides so that we can run our logical view off the Ant API. I’m going to be changing things like the naming conventions to camel case, but at some point we will need to add a new /actors endpoint to the new dashboard. I think it makes sense for me to do that once the PR you posted is merged.
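For the naming-convention change, something along these lines might work as a starting point. `snakeToCamel` and `camelizeKeys` are hypothetical helpers, and this sketch only converts top-level keys:

```typescript
// Convert one snake_case identifier to camelCase, e.g. cpu_percent -> cpuPercent.
function snakeToCamel(name: string): string {
  return name.replace(/_([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

// Rename the top-level keys of a response object; nested objects are left as-is.
function camelizeKeys(obj: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key in obj) {
    out[snakeToCamel(key)] = obj[key];
  }
  return out;
}
```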