This proposal covers end-to-end integration testing for Gearpump. For more information, please track issue 1243.
Gearpump has some integration tests, but they frequently fail on Travis-CI for reasons unrelated to the code under test. So currently, the integration tests are run manually and only ad hoc. This takes a lot of effort and is not sustainable. As the project grows more complex, even a slight code change might break the build if we do not test the build as a whole. The major challenge in creating automated integration tests is to set up a "Gearpump on Hadoop cluster" (the AUT, application under test) in an easy way.
TL;DR: Create a scalable Gearpump cluster using Docker.
The long version: We will create a Docker image for Gearpump, so that we can start a set of Docker containers to bring up a Gearpump cluster of any size instantly. We can perform destructive operations on the test cluster (e.g. kill a worker, disconnect the network) without harming our real machine. Another good reason is that Travis-CI supports the Docker service.
Here are the major items required:
- Build a Docker image for Gearpump. The key piece is the `init_script`: since Gearpump has master and worker roles, the `init_script` specifies whether the container should start Gearpump as a master or as a worker.
- Create a test driver for integration tests. The test driver is a black box: developers treat it as a real Gearpump cluster, while internally it manages the Docker containers.
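To make the role switch concrete, here is a minimal sketch of the decision the `init_script` has to make, written in Scala purely for illustration (the real `init_script` may take any form). It assumes the role arrives as a plain string, for example from a container argument or an environment variable, and that the Gearpump distribution's `bin/master` and `bin/worker` launchers start the two roles; both are assumptions, and launcher arguments are omitted.

```scala
// Hypothetical sketch of the init_script's role dispatch (illustration only).
// Assumptions: the role is passed in as a string (container argument or
// environment variable), and bin/master / bin/worker start the two roles.
object InitScript {
  sealed trait Role
  case object Master extends Role
  case object Worker extends Role

  /** Parses the requested role, rejecting anything but "master" or "worker". */
  def parseRole(arg: String): Either[String, Role] = arg.trim.toLowerCase match {
    case "master" => Right(Master)
    case "worker" => Right(Worker)
    case other    => Left(s"unknown role '$other' (expected master or worker)")
  }

  /** Maps a role to the launcher the container would exec (arguments omitted). */
  def launchCommand(role: Role): Seq[String] = role match {
    case Master => Seq("bin/master")
    case Worker => Seq("bin/worker")
  }
}
```

Rejecting unknown roles up front means a misconfigured container fails immediately instead of silently starting the wrong process.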
Prerequisites:
- CentOS 7 64-bit (with Linux Kernel 3.10.x or higher)
- Install Docker (TBD: add docs on Docker basics, proxy settings, dockerui)
We will create a Docker image with Gearpump.
Not in this scope, but as a next step, we will create more Docker images to simulate other test environments along these dimensions:
- Non-HA; HA
- Basic Authz; Kerberos Authz
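The two dimensions above combine into four environments. A minimal sketch of that matrix follows; the type names and the image tag scheme (`gearpump/gearpump:<ha>-<auth>`) are hypothetical and only show how each combination could map to its own Docker image.

```scala
// Hypothetical model of the future test-environment matrix (HA x authz).
// The image tag naming scheme is an assumption for illustration only.
object TestEnvironments {
  sealed trait Availability
  case object NonHA extends Availability
  case object HA extends Availability

  sealed trait Authz
  case object BasicAuthz extends Authz
  case object KerberosAuthz extends Authz

  final case class Environment(availability: Availability, authz: Authz) {
    def imageTag: String = {
      val a = availability match { case NonHA => "nonha"; case HA => "ha" }
      val b = authz match { case BasicAuthz => "basic"; case KerberosAuthz => "kerberos" }
      s"gearpump/gearpump:$a-$b"
    }
  }

  // Every combination of the two dimensions: 2 x 2 = 4 environments.
  val all: Seq[Environment] =
    for {
      availability <- Seq[Availability](NonHA, HA)
      authz        <- Seq[Authz](BasicAuthz, KerberosAuthz)
    } yield Environment(availability, authz)
}
```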
The test driver will execute Docker commands to manage a real Gearpump cluster. It should expose the following operations to test cases:
- Valid operations:
  - Start/Stop a cluster
  - Add/Remove a worker
  - Query component runtime information
  - Submit/Kill an application
- Destructive operations:
  - Kill a process
  - Block network communication
// Placeholder for the client API that test cases use to talk to the cluster.
trait GearpumpClient

trait GearpumpTestCluster {
  def start(masterNum: Int, workerNum: Int): Unit
  def stop(): Unit
  def getGearpumpClient: GearpumpClient
  def getMasters: Array[String]
  def killMaster(masterAddress: String): Unit
  def killWorker(workerAddress: String): Unit
  // ...
}
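One way to back the operations above with real containers is to have the driver translate each call into Docker CLI invocations. The sketch below is hypothetical: the class name, the injected `run` function, and identifying containers by name rather than by address are assumptions, and using `docker network disconnect` to block network communication is a suggested approach, not part of this proposal. The image name, port mapping, and `master0`/`workerN` naming follow the commands in this proposal.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical Docker-backed test driver. `run` executes one docker CLI
// invocation; injecting it lets unit tests record commands instead of
// starting containers.
class DockerClusterDriver(run: Seq[String] => Unit, image: String = "gearpump/gearpump") {
  private val running = ListBuffer.empty[String] // names of containers we started
  private var nextWorkerId = 0

  private def runContainer(name: String, extraArgs: Seq[String]): String = {
    run(Seq("docker", "run", "-d") ++ extraArgs ++ Seq("--name", name, "-i", image))
    running += name
    name
  }

  /** Starts masters and workers; master0 publishes the dashboard/REST port 8090. */
  def start(masterNum: Int, workerNum: Int): Unit = {
    require(masterNum >= 1, "a cluster needs at least one master")
    for (i <- 0 until masterNum) {
      val ports = if (i == 0) Seq("-p", "8090:8090") else Seq.empty[String]
      runContainer(s"master$i", ports)
    }
    for (_ <- 0 until workerNum) addWorker()
  }

  /** Adds one worker container and returns its name (worker0, worker1, ...). */
  def addWorker(): String = {
    val name = s"worker$nextWorkerId"
    nextWorkerId += 1
    runContainer(name, Seq.empty)
  }

  /** Graceful removal with `docker stop`, as in the commands in this proposal. */
  def removeWorker(name: String): Unit = {
    run(Seq("docker", "stop", name))
    running -= name
  }

  /** Destructive: `docker kill` sends SIGKILL to the container's main process. */
  def kill(name: String): Unit = {
    run(Seq("docker", "kill", name))
    running -= name
  }

  /** Destructive: detach the container from a Docker network (default: bridge). */
  def blockNetwork(name: String, network: String = "bridge"): Unit =
    run(Seq("docker", "network", "disconnect", network, name))

  def runningContainers: Seq[String] = running.toList
}
```

In a real run, `run` could be `cmd => require(scala.sys.process.Process(cmd).! == 0)`, which executes the command and fails on a non-zero exit code; in unit tests it can simply append each command to a buffer.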
Command to start a single-node Gearpump cluster. The dashboard and REST API will be exposed at http://127.0.0.1:8090.
docker run -d -p 8090:8090 --name master0 -i gearpump/gearpump
Command to stop a running Gearpump cluster:
docker stop master0
Command to start two worker instances (not implemented yet). Worker instances will communicate with the master automatically; there is no need to specify the hostname or port of the master.
docker run -d --name worker0 -i gearpump/gearpump
docker run -d --name worker1 -i gearpump/gearpump
Command to stop the second worker, worker1 (not implemented yet):
docker stop worker1
Command to retrieve the number of workers
curl http://127.0.0.1:8090/api/v1/workers
Command to retrieve the master status
curl http://127.0.0.1:8090/api/v1/master
The test driver must stop all Docker containers when `tearDown` is called.
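A robust `tearDown` should clean up every container even if one of them fails to stop, otherwise leftover containers can break the next run. The sketch below is hypothetical: it uses `docker rm -f` (kill if still running, then remove, so the same `--name` can be reused) instead of `docker stop`, which is a design choice rather than something the proposal specifies.

```scala
import scala.util.Try

// Hypothetical tearDown helper. `run` executes one docker CLI invocation.
// Containers are removed in reverse start order (workers before masters), and a
// failure on one container does not stop the cleanup of the others.
object TearDown {
  /** Returns the names whose removal failed; an empty result means a clean teardown. */
  def removeAll(containers: Seq[String], run: Seq[String] => Unit): Seq[String] =
    containers.reverse.filter { name =>
      // `docker rm -f` kills the container if it is still running, then removes it.
      Try(run(Seq("docker", "rm", "-f", name))).isFailure
    }
}
```

Returning the failed names, rather than throwing on the first error, lets the test report exactly which containers may have leaked.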
Update #1: The proposal is to define a mini Gearpump cluster for testing. How the test cases are organized is not part of this design.
We need to define two test suites.
Checks whether the designed features behave as expected. Test cases are put into different test categories. Some test cases might only be enabled for particular test environments. For instance, YARN-related tests will only be performed on a "Gearpump on Hadoop-YARN cluster".
Here is a draft of the test categories. The test cases listed are placeholders.
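Enabling test cases only for particular environments amounts to tagging each case with the environments it needs and filtering on the current one. A minimal sketch, where the `Env` values, the `TestCase` type, and the rule that an untagged case runs everywhere are all hypothetical:

```scala
// Hypothetical sketch: tag each test case with the environments it requires
// and select only the cases the current test environment can run.
object TestSelection {
  sealed trait Env
  case object Standalone extends Env // Gearpump-only cluster
  case object HadoopYarn extends Env // "Gearpump on Hadoop-YARN cluster"

  final case class TestCase(name: String, category: String, requires: Set[Env])

  /** A case with no requirements runs everywhere; otherwise the environment must match. */
  def runnable(cases: Seq[TestCase], current: Env): Seq[TestCase] =
    cases.filter(c => c.requires.isEmpty || c.requires.contains(current))
}
```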
- Core Spec
  - Test case #1: All ports serve as expected
  - Test case #2: Query service component status, etc.
  - Test case #N: ...
- Stability Spec
  - Test case #1: Kill an executor and wait for recovery
  - Test case #2: Kill an application and wait for recovery
  - Test case #N: ...
- Example Spec
  - Test case #1: Word count related
  - Test case #2: Storm related
- Scalability Spec
  - Test case #1: ...
- HA Spec
  - Test case #1: ...
- Dashboard Spec
  - Test case #1: ...
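Stability cases such as "kill an executor and wait for recovery" need a bounded wait rather than a fixed sleep. Below is a minimal polling sketch; the `Polling.waitUntil` name and signature are hypothetical. The condition would typically query the REST API (for example the worker list at `/api/v1/workers` used earlier), and the timeout keeps a cluster that never recovers from hanging the build.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._

// Hypothetical polling helper: re-check a condition until it holds or the
// timeout expires. Returns true if the condition held in time.
object Polling {
  def waitUntil(timeout: FiniteDuration, interval: FiniteDuration)(condition: => Boolean): Boolean = {
    val deadline = timeout.fromNow
    @tailrec def loop(): Boolean =
      if (condition) true                  // recovered
      else if (deadline.isOverdue()) false // gave up
      else { Thread.sleep(interval.toMillis); loop() }
    loop()
  }
}
```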
### Regression Test
Ensures that no regression happens. Every test case has an issue ID and is put into one or more test categories.
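The two rules for regression cases (an issue ID, and at least one category) can be enforced when a case is registered. A minimal sketch; the `RegressionCase` type, the integer issue IDs, and grouping by category are hypothetical:

```scala
// Hypothetical registry for regression tests: every case must carry an issue ID
// and belong to at least one test category.
object RegressionRegistry {
  final case class RegressionCase(issueId: Int, name: String, categories: Set[String]) {
    require(issueId > 0, s"regression case '$name' needs a valid issue ID")
    require(categories.nonEmpty, s"regression case '$name' needs at least one category")
  }

  /** Groups issue IDs by category; a case in several categories appears in each group. */
  def byCategory(cases: Seq[RegressionCase]): Map[String, Seq[Int]] =
    cases
      .flatMap(c => c.categories.toSeq.map(cat => cat -> c.issueId))
      .groupBy(_._1)
      .map { case (cat, pairs) => cat -> pairs.map(_._2) }
}
```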
Please provide feedback.