Possible test annotations and results

Things that a test can be annotated with (a decorator sketch follows the list):

  • XFAIL(cond): for tests that are known to be buggy.
  • FLAKY(cond): for tests that have nondeterministic bugs that have not been hunted down.
  • SKIP(cond): for tests not applicable to the current platform, that cannot be fixed by installing or configuring dependencies.
  • MISSING(cond): for tests that can't run because of uninstalled or unconfigured dependencies.
  • WIP: for tests whose code you are still implementing.
  • TIME(cpumin, cpumax, realmax): the minimum and maximum CPU time expected for a test, plus a maximum real (wall-clock) time for tests that sleep or wait on external events.
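
A minimal sketch of how these annotations might look as Python decorators. The annotation names and the cond/time parameters mirror the list above; everything else (the `_annotations` attribute, the `Time` record, the example test) is a hypothetical illustration, not any existing framework's API.

```python
import sys
from dataclasses import dataclass

@dataclass
class Time:
    cpu_min: float   # minimum expected CPU seconds
    cpu_max: float   # maximum expected CPU seconds
    real_max: float  # maximum wall-clock seconds (covers sleeps and waits)

def _annotate(name, **details):
    """Attach an annotation record to the decorated test function."""
    def decorator(func):
        func.__dict__.setdefault('_annotations', []).append((name, details))
        return func
    return decorator

def xfail(cond=True):
    return _annotate('XFAIL', cond=cond)

def flaky(cond=True):
    return _annotate('FLAKY', cond=cond)

def skip(cond=True):
    return _annotate('SKIP', cond=cond)

def missing(cond=True):
    return _annotate('MISSING', cond=cond)

def wip():
    return _annotate('WIP')

def time(cpu_min, cpu_max, real_max):
    return _annotate('TIME', expected=Time(cpu_min, cpu_max, real_max))

# Example use (hypothetical test):
@xfail(cond=sys.platform == 'win32')   # known bug on Windows
@time(0.0, 0.5, 2.0)                    # quick test that may sleep briefly
def test_symlink_roundtrip():
    ...
```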

This is my list of possible outcomes from running a single test.

As far as I know, no testing framework supplies all of them. (A sketch of how they bucket into pass, fail, and neutral follows the list.)

  • PASS: ordinary test, worked correctly.
  • FAIL: ordinary test, failed for unknown reasons.
  • XFAIL: test known to not pass, still run. Treated as a pass.
  • UPASS: test expected not to pass, but which actually passed when run. Treated as a failure.
  • TOLERABLE: a test that isn't what we would prefer, but still good enough to be treated as a pass.
  • FLAKY-{FAIL,PASS}: test known to sometimes pass or not pass. Both treated as a pass to keep down noise.
  • SKIP: test cannot be meaningfully run on this platform. Should not be confused with MISSING, though in practice the two are often conflated. Treated as a pass.
  • MISSING: test cannot currently be run on this platform because of missing dependencies (not installed or possibly not configured to be enabled). Treated as a pass.
  • SETUP-FAIL: some setup stage failed, before the test could actually be run. Travis-CI has a result called "errored". It would sometimes be nice to have this on a more fine-grained level. Treated as a failure, but blamed on the harness code rather than the unit itself.
  • COMPILE-FAIL: a test failed to compile. This is treated as a failure even for tests marked XFAIL or FLAKY. SKIP and MISSING may or may not even try to compile, so they might not produce this. This should not be used if you are writing a compiler and want "should-compile" and "should-produce-errors" tests; those are ordinary PASS/FAIL.
  • WIP-{PASS,FAIL}: test for code that you have marked as currently incomplete but you are working on it. Identical to PASS/FAIL except counted separately for convenience.
  • SLOW: test took longer than was expected. Treated as a pass, but might be an indicator of a speed regression.
  • FAST: test took much less time than expected. Treated as a pass, but might be an indicator of something wrong with the test.
  • TIMEOUT: test took much longer than was expected and was killed by the driver. Treated as a fail.
  • CANCELLED: test was explicitly cancelled by the user. Not treated as a pass or a fail.
  • UNAVAILABLE: test did not exist at the time (only used when looking at historical tests). Not treated as a pass or a fail.
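
Most of these outcomes collapse into three buckets when summarizing a run: counts as a pass, counts as a failure, or is neutral. A sketch of that bucketing as a Python enum, assuming the "treated as" notes above; the names and sets are illustrative, not taken from any existing framework:

```python
import enum

class Outcome(enum.Enum):
    PASS = enum.auto()
    FAIL = enum.auto()
    XFAIL = enum.auto()
    UPASS = enum.auto()
    TOLERABLE = enum.auto()
    FLAKY_FAIL = enum.auto()
    FLAKY_PASS = enum.auto()
    SKIP = enum.auto()
    MISSING = enum.auto()
    SETUP_FAIL = enum.auto()
    COMPILE_FAIL = enum.auto()
    WIP_PASS = enum.auto()
    WIP_FAIL = enum.auto()
    SLOW = enum.auto()
    FAST = enum.auto()
    TIMEOUT = enum.auto()
    CANCELLED = enum.auto()
    UNAVAILABLE = enum.auto()

# Outcomes that count against the run as a whole.
FAILURES = {Outcome.FAIL, Outcome.UPASS, Outcome.SETUP_FAIL,
            Outcome.COMPILE_FAIL, Outcome.WIP_FAIL, Outcome.TIMEOUT}

# Outcomes that are neither a pass nor a failure.
NEUTRAL = {Outcome.CANCELLED, Outcome.UNAVAILABLE}

def run_succeeded(results):
    """A run succeeds when no outcome falls in the failure bucket."""
    return not any(r in FAILURES for r in results)
```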

This is a work in progress! It is not as complete as the other files!

Every test framework has a different way of enumerating and naming tests. Some collect names first, then run by name. Others only emit names as the tests run (which is not ideal).

Some test names refer to code. Others are mere aggregations of other tests. Some refer to both, where the aggregate can fail independently of the subtests.

By necessity, a driver must assume that any name may have children that cannot be enumerated without running the test.

All test drivers should provide the following primitives (an interface sketch follows these lists):

  • list-root-test-names
  • list-known-sub-test-names parent.test.name
  • get-test-annotations dotted.test.name
  • run-test-with-children dotted.test.name

and the following non-primitives (which may have more efficient implementations):

  • list-known-recursive-test-names
  • run-all-tests
  • *-matching for all of the above
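
A sketch of what that driver interface could look like in Python: the four primitives are abstract, and the non-primitives get default implementations in terms of them that a concrete driver may override with something more efficient. The class and method names simply transliterate the lists above; nothing here is an existing harness's API.

```python
import abc
import fnmatch

class TestDriver(abc.ABC):
    # -- primitives: every driver must implement these --
    @abc.abstractmethod
    def list_root_test_names(self): ...

    @abc.abstractmethod
    def list_known_sub_test_names(self, parent): ...

    @abc.abstractmethod
    def get_test_annotations(self, name): ...

    @abc.abstractmethod
    def run_test_with_children(self, name): ...

    # -- non-primitives: default implementations in terms of the primitives --
    def list_known_recursive_test_names(self):
        stack = list(self.list_root_test_names())
        while stack:
            name = stack.pop()
            yield name
            stack.extend(self.list_known_sub_test_names(name))

    def run_all_tests(self):
        for name in self.list_root_test_names():
            yield from self.run_test_with_children(name)

    # representative *-matching variant; the others follow the same shape
    def run_tests_matching(self, pattern):
        for name in self.list_known_recursive_test_names():
            if fnmatch.fnmatch(name, pattern):
                yield from self.run_test_with_children(name)
```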

Test drivers should be implemented recursively, so that list-known-sub-test-names is just a different driver's list-root-test-names.

The test driver abstraction should be implemented for various existing harnesses: TAP, Google Test, python unittest, ...
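
For example, an adapter over Python's unittest could use TestLoader.discover for enumeration: the discovered test ids are already dotted names, so the recursive listing falls out naturally. A sketch of just that enumeration step, under those assumptions:

```python
import unittest

def discovered_test_names(start_dir='.'):
    """Flatten a discovered unittest suite into dotted test names."""
    suite = unittest.TestLoader().discover(start_dir)
    names = []

    def walk(item):
        if isinstance(item, unittest.TestSuite):
            for child in item:
                walk(child)
        else:
            names.append(item.id())  # e.g. "pkg.test_mod.TestCase.test_method"

    walk(suite)
    return names
```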
