C* CircleCI configs tests

Observations

Glossary (see the config sketch below for where each knob lives):

  • "p": "parallelism" setting in CircleCI
  • "i": "instance type" (resource class) setting in CircleCI
  • "r": number of JUnit runners, configured via -Dtest.runners

JVM dtests

  • p=100 doesn't work for JVM dtests, and is not necessary (a sketch of how p splits the test classes across containers follows this list). The failures we get at p=100 seem to be caused by a problem in the CircleCI setup / infrastructure. There are still some flaky tests at lower p, however.
  • JVM dtests don't work when setting r > 1
  • current config: (p=1, r=1, i=medium): 20mn
  • best: (p=25, r=1, i=xlarge): 4mn
  • good compromise time config: (p=10, r=1, i=large): 6mn
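
For context on what raising p buys: each of the p containers runs only its own slice of the test classes. One common way to produce that slice is CircleCI's built-in test splitting, sketched below as steps that would slot into a job like the one above; the glob pattern is illustrative, and this is not necessarily how the C* config shards its tests:

```yaml
    steps:
      - checkout
      # Each of the p containers receives its own slice of the test classes;
      # --split-by=timings balances the slices using historical run times.
      - run: |
          circleci tests glob "test/distributed/**/*Test.java" \
            | circleci tests split --split-by=timings > /tmp/tests_for_this_container.txt
      # The build would then be pointed at /tmp/tests_for_this_container.txt
```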

Unit tests

  • The r value needs to be adjusted manually: the optimal number of runners does not automatically adapt to the instance type. Using r = number of cores is optimal (see the sketch after this list).
  • current config: (p=4, r=1, i=medium): 20mn
  • best time config: (p=25, r=4, i=xlarge): 3mn45s
  • good compromise config: (p=10, r=4, i=large): 5mn30s
    • is equivalent in time to: (p=25, r=2, i=medium)
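
Applying the r = number-of-cores rule to the compromise config above gives something like the fragment below (job name and ant target are placeholders; a large CircleCI container has 4 vCPUs, hence r=4):

```yaml
  unit_tests:                      # placeholder job name
    parallelism: 10                # "good compromise" parallelism from above
    resource_class: large          # 4 vCPUs, so use 4 JUnit runners
    docker:
      - image: some-cassandra-test-image   # placeholder image
    steps:
      - checkout
      - run: ant test -Dtest.runners=4     # r matched to the core count
```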

Python dtests

  • i=medium causes a lot of failures (the instances lack the resources necessary to run the tests)
  • running at i=large or i=xlarge causes far fewer failures. The only remaining failures are flaky tests (they sometimes pass, sometimes don't)
  • Heap sizes need to be increased when increasing the instance type
    • but not too much, otherwise failures increase. At i=xlarge, MAX_HEAP and HEAP_NEW should not be set higher than 2048 and 512 respectively, otherwise timeout errors occur (a config sketch follows this list).
  • a single account cannot run many Python dtest jobs at p=100 simultaneously; only 2 to 3 p=100 runs ever execute at the same time.
  • current config: (p=4, i=medium): hours? - lots of failures
  • best time config: (p=100, i=xlarge): 19mn
  • good compromise config: (p=50, i=large): 22min
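
A sketch of how the heap limits above could be pinned for the best-time config; the environment variable names follow these notes (MAX_HEAP, HEAP_NEW), and the exact names and units expected by the dtest scripts are an assumption:

```yaml
  python_dtests:                   # placeholder job name
    parallelism: 100               # "best time" config from above
    resource_class: xlarge
    environment:
      MAX_HEAP: "2048"             # assumed variable name; values above 2048 caused timeouts
      HEAP_NEW: "512"              # assumed variable name; values above 512 caused timeouts
```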

Utest_compression

This job re-runs the unit test suite with commit log compression enabled. The same recommendation applies as for the unit test suite.

Stress/Fql unit tests

These tests are not parallelized at the moment, but they run fast enough on i=medium.

Long unit tests

  • Long tests are not parallelized at the moment and run slowly.
  • (i=xlarge): 30mn
  • (i=large): 30mn
  • (i=medium): 28mn
  • i=medium will be enough (larger instances do not improve the runtime)

Python upgrade tests

  • These need a config update (JAVA8_HOME must be set)
  • They run very long: each container downloads all versions of C*, which can be optimized
  • current config: (p=4, i=medium): 3hours+
  • best time config: (p=100, i=xlarge): 52mn (lots of failures)
  • we would suggest looking into parallelizing these tests better (not having each container download all C* versions separately) before giving an adequate recommendation for these; a caching sketch follows this list.
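
One possible mitigation for the repeated downloads is CircleCI's cache steps, sketched below; the download step, cache key, and path are assumptions (ccm's default repository location is used as an example). Note this only avoids re-downloading across workflow runs; sharing the versions between containers within a single run would still need a different approach.

```yaml
    steps:
      - checkout
      # Reuse previously downloaded C* versions if a matching cache exists
      - restore_cache:
          keys:
            - cassandra-upgrade-versions-v1
      - run: ./download_upgrade_versions.sh       # hypothetical download step
      # Persist the downloaded versions for subsequent workflow runs
      - save_cache:
          key: cassandra-upgrade-versions-v1
          paths:
            - ~/.ccm/repository                   # assumed download location
```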

Remaining results

Remaining data points gathered during the experiments, used to draw conclusions on the best compromise configs.

JVM dtests

  • (p=10, r=1, i=xlarge): 5mn35s (the xlarge instance is not worth it compared to large)

Unit tests

  • (p=25, r=8, i=xlarge): 5mn (not good: for some reason r=8 seemed to cause more failures)

Python dtests

  • (p=25, i=large): 32mn (could be acceptable; not as good as p=50 but uses half as many containers)
  • (p=50, i=xlarge): 25mn (about as good as with the large instance; xlarge is not necessary)
  • (p=25, i=xlarge): 35mn (not as good; p=50 with i=large is better)

Python upgrade tests

  • (p=25, i=large): 2h46mn
  • (p=50, i=large): 1h28mn

Summary

The current configuration (in trunk) is mostly set to the minimal configuration that can make the test suites run, except for the Python dtests, where the instances seem to lack the resources needed to run the tests.

To improve build times, in most cases we don't actually seem to need xlarge instances: we have seen very similar improvements by upgrading to large instances instead. With some reasonable parallelism, for example, the full Python dtest suite can run in under 30 minutes with p=50 and i=large. At p=100 we see only a minor further improvement, which is not worth using twice the resources.

For the unit tests and JVM dtests, current runtimes can be vastly improved by using (p=10, i=large) if we think it is necessary.

In the details above, we list for each test suite a "best compromise" configuration that brings a significant runtime improvement without resorting to the unnecessary "bulldozer" config.

For the remainder of the tests (excluding the upgrade tests), using the minimal configuration gives reasonable run times.
