C* CircleCI configs tests

Observations

Glossary (see the config sketch below for where each knob lives):

  • "p": "parallelism" setting in CircleCI
  • "i": "instance type" (resource class) setting in CircleCI
  • "r": number of JUnit runners, configured via -Dtest.runners

JVM dtests

  • p=100 doesn't work for JVM dtests, and is not necessary (a sketch of how p splits the test classes across containers follows this list). The failures we get at p=100 seem to be caused by a problem in the CircleCI setup / infrastructure. There are still some flaky tests at lower p, however.
  • JVM dtests don't work when setting r > 1
  • current config: (p=1, r=1, i=medium): 20mn
  • best: (p=25, r=1, i=xlarge): 4mn
  • good compromise time config: (p=10, r=1, i=large): 6mn
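
For context on what raising p buys: each of the p containers runs only its own slice of the test classes. One common way to produce that slice is CircleCI's built-in test splitting, sketched below as steps that would slot into a job like the one above; the glob pattern is illustrative, and this is not necessarily how the C* config shards its tests:

```yaml
    steps:
      - checkout
      # Each of the p containers receives its own slice of the test classes;
      # --split-by=timings balances the slices using historical run times.
      - run: |
          circleci tests glob "test/distributed/**/*Test.java" \
            | circleci tests split --split-by=timings > /tmp/tests_for_this_container.txt
      # The build would then be pointed at /tmp/tests_for_this_container.txt
```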

Unit tests

  • The r value needs to be adjusted manually: the optimal number of runners does not automatically adapt to the instance type. Using r = number of cores is optimal (see the sketch after this list).
  • current config: (p=4, r=1, i=medium): 20mn
  • best time config: (p=25, r=4, i=xlarge): 3mn45s
  • good compromise config: (p=10, r=4, i=large): 5mn30s
    • is equivalent in time to: (p=25, r=2, i=medium)
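
Applying the r = number-of-cores rule to the compromise config above gives something like the fragment below (job name and ant target are placeholders; a large CircleCI container has 4 vCPUs, hence r=4):

```yaml
  unit_tests:                      # placeholder job name
    parallelism: 10                # "good compromise" parallelism from above
    resource_class: large          # 4 vCPUs, so use 4 JUnit runners
    docker:
      - image: some-cassandra-test-image   # placeholder image
    steps:
      - checkout
      - run: ant test -Dtest.runners=4     # r matched to the core count
```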

Python dtests

  • i=medium causes a lot of failures (the instances lack the resources necessary to run the tests)
  • running at i=large or i=xlarge causes far fewer failures. The only remaining failures are flaky tests (they sometimes pass, sometimes don't)
  • Heap sizes need to be increased when increasing the instance type
    • but not too much, otherwise failures increase. At i=xlarge, MAX_HEAP and HEAP_NEW should not be set higher than 2048 and 512 respectively, otherwise timeout errors occur (a config sketch follows this list).
  • a single account cannot run many Python dtest jobs at p=100 simultaneously; only 2 to 3 p=100 runs ever execute at the same time.
  • current config: (p=4, i=medium): hours? - lots of failures
  • best time config: (p=100, i=xlarge): 19mn
  • good compromise config: (p=50, i=large): 22min
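
A sketch of how the heap limits above could be pinned for the best-time config; the environment variable names follow these notes (MAX_HEAP, HEAP_NEW), and the exact names and units expected by the dtest scripts are an assumption:

```yaml
  python_dtests:                   # placeholder job name
    parallelism: 100               # "best time" config from above
    resource_class: xlarge
    environment:
      MAX_HEAP: "2048"             # assumed variable name; values above 2048 caused timeouts
      HEAP_NEW: "512"              # assumed variable name; values above 512 caused timeouts
```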

Utest_compression

This job re-runs the unit test suite with commit log compression enabled. The same recommendation applies as for the unit test suite.

Stress/Fql unit tests

These tests are not parallelized at the moment, but they run fast enough on i=medium.

Long unit tests

  • Long tests are not parallelized at the moment and run slowly.
  • (i=xlarge): 30mn
  • (i=large): 30mn
  • (i=medium): 28mn
  • i=medium will be enough (larger instances do not improve the runtime)

Python upgrade tests

  • These need a config update (JAVA8_HOME must be set)
  • They run very long: each container downloads all versions of C*, which can be optimized
  • current config: (p=4, i=medium): 3hours+
  • best time config: (p=100, i=xlarge): 52mn (lots of failures)
  • we would suggest looking into parallelizing these tests better (not having each container download all C* versions separately) before giving an adequate recommendation for these; a caching sketch follows this list.
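
One possible mitigation for the repeated downloads is CircleCI's cache steps, sketched below; the download step, cache key, and path are assumptions (ccm's default repository location is used as an example). Note this only avoids re-downloading across workflow runs; sharing the versions between containers within a single run would still need a different approach.

```yaml
    steps:
      - checkout
      # Reuse previously downloaded C* versions if a matching cache exists
      - restore_cache:
          keys:
            - cassandra-upgrade-versions-v1
      - run: ./download_upgrade_versions.sh       # hypothetical download step
      # Persist the downloaded versions for subsequent workflow runs
      - save_cache:
          key: cassandra-upgrade-versions-v1
          paths:
            - ~/.ccm/repository                   # assumed download location
```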

Remaining results

Remaining data points gathered during the experiments, used to draw conclusions on the best compromise configs.

JVM dtests

  • (p=10, r=1, i=xlarge): 5mn35s (the xlarge instance is not worth it compared to large)

Unit tests

  • (p=25, r=8, i=xlarge): 5mn (not good: for some reason r=8 seemed to cause more failures)

Python dtests

  • (p=25, i=large): 32mn (could be acceptable; not as good as p=50 but uses half as many containers)
  • (p=50, i=xlarge): 25mn (about as good as with the large instance; xlarge is not necessary)
  • (p=25, i=xlarge): 35mn (not as good; p=50 with i=large is better)

Python upgrade tests

  • (p=25, i=large): 2h46mn
  • (p=50, i=large): 1h28mn

Summary

The current configuration (in trunk) is mostly set to the minimal configuration that can make the test suites run, except for the Python dtests, where the instances seem to lack the resources needed to run the tests.

To improve build times, in most cases we don't actually seem to need xlarge instances: we have seen very similar improvements by upgrading to large instances instead. With some reasonable parallelism, for example, the full Python dtest suite can run in under 30 minutes with p=50 and i=large. At p=100 we see only a minor further improvement, which is not worth using twice the resources.

For the unit tests and JVM dtests, current runtimes can be vastly improved by using (p=10, i=large) if we think it is necessary.

In the details above, we list for each test suite a "best compromise" configuration that brings a significant runtime improvement without resorting to the unnecessary "bulldozer" config.

For the remainder of the tests (excluding the upgrade tests), using the minimal configuration gives reasonable run times.
