stanleyxu2005/GearpumpCIProposal.md

## GearpumpCIProposal.md

      
Gearpump Continuous Integration Proposal

This proposal covers end-to-end integration testing for Gearpump. For more information, please track issue 1243.
Background

Gearpump has some integration tests, but they frequently fail on Travis-CI for unrelated reasons. As a result, the integration tests are currently run manually and ad hoc. The test effort is very high and not sustainable. As the project grows more complex, any slight code change might break the build if we do not test the build as a whole. The major challenge in creating automated integration tests is setting up a "Gearpump on Hadoop cluster" (AUT, application under test) in an easy way.
Approach

TL;DR: Create a scalable Gearpump cluster using Docker.
The long version: We will create a Docker image for Gearpump, so that we can start a set of Docker containers to build up a Gearpump cluster of any size instantly. We can perform destructive operations on the test cluster (e.g. kill a worker, disconnect the network) without affecting the real machine. Another good reason is that Travis-CI supports Docker as a service.
Here are the major items required:

Build a Docker image for Gearpump like this. The key thing is the init_script. Because Gearpump has master and worker roles, the init_script will specify whether the container should start Gearpump as a master or as a worker.
Create a test driver for integration tests. The test driver is a black box: developers will treat it as a real Gearpump cluster, while the test driver itself manages the Docker containers.
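The role selection done by the init_script could look roughly like the sketch below. This is only an illustration: the `GEARPUMP_HOME` location, the idea of passing the role as an argument, and the launcher paths are assumptions, not details of the actual image.

```shell
#!/bin/sh
# Hypothetical init_script sketch: choose the Gearpump role for this container.
# GEARPUMP_HOME and the launcher paths below are assumptions.
GEARPUMP_HOME="${GEARPUMP_HOME:-/opt/gearpump}"

# Print the launcher for the requested role; fail on an unknown role.
role_command() {
    case "$1" in
        master) echo "$GEARPUMP_HOME/bin/master" ;;
        worker) echo "$GEARPUMP_HOME/bin/worker" ;;
        *)      echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# A real entrypoint would then hand over to the launcher, for example:
#   cmd="$(role_command "$1")" && exec "$cmd"
```

Keeping the decision in one small function makes it easy to add further roles later without touching the rest of the script.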

Technical Details

Build Docker Image

Prerequisites:

CentOS 7 64-bit (with Linux Kernel 3.10.x or higher)
Docker installed (TBD: add documentation on Docker basics, proxy settings, dockerui)

We will create a Docker image with Gearpump like this.
Out of scope for now, but as a next step we will create more Docker images to simulate other test environments, considering these aspects:

Non-HA; HA
Basic Authz; Kerberos Authz

Test Driver

The test driver will actually execute Docker commands to manage a real Gearpump cluster. The test driver should expose a set of operations to test cases.

Valid operations:

Start/Stop a cluster
Add/Remove a worker
Query runtime information of components
Submit/Kill application


Destructive operations:

Kill process
Block network communication
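As a rough sketch, the destructive operations above could map onto standard Docker CLI commands (`docker kill`, `docker network disconnect`, `docker network connect`). The function names, the default network name `bridge`, and the `DOCKER` override used for dry runs are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of destructive operations against a test container.
# Setting DOCKER="echo docker" prints the commands instead of running them;
# that dry-run switch is an assumption of this sketch.
DOCKER="${DOCKER:-docker}"

# Kill the container's main process (docker kill sends SIGKILL by default).
kill_node() {
    $DOCKER kill "$1"
}

# Detach the container from a Docker network so peers cannot reach it.
# The default network name "bridge" is an assumption.
block_network() {
    $DOCKER network disconnect "${2:-bridge}" "$1"
}

# Re-attach the container to restore communication.
unblock_network() {
    $DOCKER network connect "${2:-bridge}" "$1"
}
```

For example, `block_network worker0` followed later by `unblock_network worker0` would simulate a temporary network partition for one worker.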


trait GearpumpTestCluster {
    // Start a cluster with the given number of masters and workers.
    def start(masterNum: Int, workerNum: Int): Unit
    // Stop the whole cluster.
    def stop(): Unit
    def getGearpumpClient: GearpumpClient
    def getMasters: Array[String]
    def killMaster(masterAddress: String): Unit
    def killWorker(workerAddress: String): Unit
    // ...
}
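Under the hood, the start operation could wrap the `docker run` commands used in this proposal. The sketch below is a simplification under stated assumptions: it starts a single master only, names containers `master0` and `workerN`, and uses a `DOCKER` override for dry runs. How the image distinguishes a master from a worker is left open here, as in the proposal.

```shell
#!/bin/sh
# Sketch of the driver's start operation built on plain docker commands.
# DOCKER and IMAGE can be overridden; DOCKER="echo docker" gives a dry run.
DOCKER="${DOCKER:-docker}"
IMAGE="${IMAGE:-gearpump/gearpump}"

# start_cluster WORKER_NUM: start one master, then WORKER_NUM workers.
start_cluster() {
    $DOCKER run -d -p 8090:8090 --name master0 -i "$IMAGE" || return 1
    i=0
    while [ "$i" -lt "$1" ]; do
        $DOCKER run -d --name "worker$i" -i "$IMAGE" || return 1
        i=$((i + 1))
    done
}
```

Calling `start_cluster 2` would start `master0`, `worker0`, and `worker1`, and stops at the first container that fails to start.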

Commands

Command to start a single-node Gearpump cluster (the dashboard and REST API will be exposed at http://127.0.0.1:8090):
docker run -d -p 8090:8090 --name master0 -i gearpump/gearpump

Command to stop a running Gearpump cluster:
docker stop master0

Command to start 2 worker instances (not implemented yet). Worker instances will connect to the master automatically; there is no need to specify the hostname or port of the master:
docker run -d --name worker0 -i gearpump/gearpump
docker run -d --name worker1 -i gearpump/gearpump

Command to stop the second worker (not implemented yet):
docker stop worker1

Command to retrieve the number of workers
curl http://127.0.0.1:8090/api/v1/workers

Command to retrieve the master status
curl http://127.0.0.1:8090/api/v1/master
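Because a container may need a moment before the REST API answers, a test driver would probably poll the master endpoint before running tests. A minimal sketch of such a retry helper follows; `wait_until` and `master_ready` are hypothetical names, and the retry counts in the usage line are arbitrary.

```shell
#!/bin/sh
# Retry a probe command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    attempts="$1"; delay="$2"; shift 2
    while [ "$attempts" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        attempts=$((attempts - 1))
        [ "$attempts" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Example probe: succeeds once the master endpoint answers without an
# HTTP error status (curl -f fails on HTTP errors, -s hides progress).
master_ready() {
    curl -sf http://127.0.0.1:8090/api/v1/master
}

# Usage: wait_until 30 1 master_ready
```

Passing the probe as a command keeps the helper reusable, for instance to wait for the workers endpoint as well.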

The test driver must stop all Docker containers when tearDown is called.
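One way to make this cleanup reliable is for the driver to record every container it starts and remove them all in one place. The sketch below assumes a hypothetical `CLUSTER_CONTAINERS` list and uses `docker rm -f`, which stops a running container and removes it; using `docker stop` instead would keep the containers around for inspection.

```shell
#!/bin/sh
# Sketch of tearDown: force-remove every container the driver started.
# CLUSTER_CONTAINERS is a hypothetical space-separated list kept by the driver.
# Setting DOCKER="echo docker" prints the commands instead of running them.
DOCKER="${DOCKER:-docker}"
CLUSTER_CONTAINERS="${CLUSTER_CONTAINERS:-}"

# Record a container name so teardown_cluster can find it later.
register_container() {
    CLUSTER_CONTAINERS="$CLUSTER_CONTAINERS $1"
}

# Remove all registered containers; keep going if one removal fails.
teardown_cluster() {
    for name in $CLUSTER_CONTAINERS; do
        $DOCKER rm -f "$name" || echo "could not remove $name" >&2
    done
    CLUSTER_CONTAINERS=""
}

# A test script could guarantee cleanup even on failure with:
#   trap teardown_cluster EXIT
```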
Test Plan (Moved out of the scope)

Update #1: This proposal defines a mini Gearpump cluster for testing. How the test cases are organized will not be part of the design.
We need to define two test suites.
Check-list Test

Checks whether the designed features behave as expected. Test cases are grouped into test categories. Some test cases might only be enabled for a particular test environment. For instance, YARN-related tests will only be performed on "Gearpump on Hadoop-YARN cluster".
Here is a draft of the test categories. The test cases are placeholders.

Core Spec

Test case #1: All ports serve as expected
Test case #2: Query service component status, etc.
Test case #N: ...


Stability Spec

Test case #1: Kill an executor and wait for recovery
Test case #2: Kill an application and wait for recovery
Test case #N: ...


Example Spec

Test case #1: Word count related
Test case #2: Storm related


Scalability Spec

Test case #1: ...


HA Spec

Test case #1: ...


Dashboard Spec

Test case #1: ...


### Regression Test
Ensures that no regression happens. Every test case has an issue id. Every test case will be put into one or more test categories.
Conclusion

Please provide feedback.