@jodh-intel
Created May 25, 2021 15:33
Kata Containers tracing status update (2021-05-25)

Background

Chelsea and I have been looking at adding end-to-end OpenTelemetry tracing support to Kata Containers.

Components

With Kata 2.x's simpler architecture, we only need tracing support in:

  • hypervisor (ideally)
  • runtime
  • agent

Note: Not considering hypervisor tracing at the present time.

Status

Runtime

Good progress, but we still need to define a set of consistent and standardised tracing span tags.

Agent

"Beset by problems" (mostly resolved).


Agent problems (a selection)


  • Rebase hell
    • Fast pace of change (good and bad).
    • Testing tracing changes is slow.
    • Lots and lots and lots and lots of rebasing...
    • Rewrites for API changes and async agent changes.

  • Conversion of VSOCK exporter and trace forwarder to async (done)

  • Agent shutdown (done)
    • This is only required for tracing.
    • Required a lot of refactoring and we hit a number of bugs on the way.
    • However, the shutdown PR has landed and the code is now much cleaner.
    • ... and the agent can be shut down!

  • Agent shutdown needs a test (99% done)
    • Debugging tracing problems is very hard.
    • Need to know the agent is ending gracefully.
    • Has to be fairly elaborate.
    • Took a long time to write and test.
    • PR raised.
    • Still not passing in CI env ;(

  • Issues with Rust tracing crates (done)
    • A world of pain :-)
    • The Rust tracing landscape is still in a state of flux.

  • Problems with span hierarchy (in progress)
    • Some code cannot be traced without major invasive surgery.
    • Therefore, we register a global tracer.
    • BUT, this means the span hierarchy is difficult to get right.
    • This is not fully resolved yet.
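
To illustrate the span hierarchy problem, here is a toy model using only the Rust standard library. It is not the agent's code and does not use the real tracing crates: `RECORDS`, `CURRENT`, `in_span`, `parent_of` and `run_demo` are hypothetical names, and the sketch assumes the "current span" is tracked per thread, which is only one way a global tracer can lose track of parents. With a global sink, any function can open a span without changing its signature, but the parent is simply whatever happens to be open at the call site.

```rust
use std::cell::RefCell;
use std::sync::Mutex;
use std::thread;

// Hypothetical record: (span name, parent span name).
type Record = (String, Option<String>);

// Global sink standing in for a globally registered tracer.
static RECORDS: Mutex<Vec<Record>> = Mutex::new(Vec::new());

thread_local! {
    // Per-thread stack of the spans that are currently open (an assumption).
    static CURRENT: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

// Run `body` inside a span whose parent is the innermost span
// currently open on *this* thread, if any.
fn in_span<T>(name: &str, body: impl FnOnce() -> T) -> T {
    let parent = CURRENT.with(|c| c.borrow().last().cloned());
    RECORDS.lock().unwrap().push((name.to_string(), parent));
    CURRENT.with(|c| c.borrow_mut().push(name.to_string()));
    let out = body();
    CURRENT.with(|c| {
        c.borrow_mut().pop();
    });
    out
}

// Look up the recorded parent of the first span called `name`.
fn parent_of(name: &str) -> Option<String> {
    RECORDS
        .lock()
        .unwrap()
        .iter()
        .find(|(n, _)| n == name)
        .and_then(|(_, p)| p.clone())
}

fn run_demo() {
    in_span("rpc", || {
        // Same thread: the parent is picked up implicitly.
        in_span("mount", || {});
        // New thread: its span stack is empty, so the link to "rpc" is lost.
        thread::spawn(|| in_span("worker", || {})).join().unwrap();
    });
}

fn main() {
    run_demo();
    for (name, parent) in RECORDS.lock().unwrap().iter() {
        println!("{name} <- {parent:?}");
    }
}
```

In this model "mount" is recorded as a child of "rpc", while "worker" is recorded with no parent even though it was started from inside "rpc". Passing the parent explicitly would fix that, at the cost of changing function signatures, which is the kind of invasive change the global tracer was meant to avoid.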

Plan

  • Raise a foundational agent tracing PR this week
    • Code is not that useful currently (due to the span hierarchy issue).
    • However, it is worth landing this now: by landing the basics, we can ratchet up the support iteratively (aka I can avoid wasting lots of time constantly rebasing and re-testing!).
  • Send tracing summary to the mailing list.

Request

  • "Call to arms"
  • Developing and testing tracing is very time consuming.
  • Need help from community to speed up progress.
  • Volunteer now! ;)
  • Ideally, we could use "follow the sun" to speed up landing this feature.

Further details

See the new GitHub project.


Questions?

@jodh-intel

Presented today at the Kata Containers Architecture Committee meeting.

To view as HTML, I ran this:

$ infile="kata-containers-tracing-status.md"
$ outfile="/tmp/tracing.html"
$ pandoc -s --metadata date="25 May 2021" --metadata author="James O. D. Hunt" --metadata title="Kata Containers tracing update" -f markdown -t revealjs -o "$outfile" "$infile" -V revealjs-url=https://unpkg.com/reveal.js@3.9.2/
$ "$BROWSER" "$outfile"

@jodh-intel

Doc referenced in Kata Mailing List message: http://lists.katacontainers.io/pipermail/kata-dev/2021-May/001934.html.
