deroneriksson/systemml-release-checklist.md

## systemml-release-checklist.md

      
Release Candidate Version: 0.10.0-incubating-rc1

### Release Candidate Checklist

| Item | Environment | Status | Notes |
| --- | --- | --- | --- |
| All Artifacts and Checksums Present | | Fix | Not all artifacts have md5 checksums. See here. Also, the previous release has sha1 files in addition to asc and md5 files. |
| Release Candidate Build | Windows | | |
| | OS X | Pass | |
| | Linux | | |
| Test Suite Passes | Windows | | |
| | OS X | Pass | (DE will re-verify) |
| | Linux | Pass | All daily tests from May 18 to May 24 have passed. See here. |
| All Binaries Execute | | Pass | Verified on OS X. (DE will re-verify) |
| Check LICENSE and NOTICE Files | | Pass | (DE will re-verify) |
| Src Artifact Builds and Tests Pass | | | |
| Single-Node Standalone | Windows | | |
| | OS X | Pass | |
| | Linux | | |
| Single-Node Spark | | Pass | |
| Single-Node Hadoop | | Pass | |
| Notebooks | Jupyter | | |
| | Zeppelin | | |
| Performance Suite | Spark | Pass | Performance test suite run on Spark 1.6.1 for data sizes {80MB, 800MB, 8GB, 80GB}, sparse/dense, intercept 0/1/2, and the algorithm classes binomial (Mlogreg, L2SVM, MSVM), multinomial (Mlogreg, MSVM, Naive Bayes), regression (LinregCG, LinregDS, GLM poisson-log, GLM gamma-log, GLM binomial-probit), clustering (Kmeans), and statistics (Univariate, Bivariate). The good news is that there are no compiler/runtime issues and performance is as expected. |
| | Hadoop | | |

### Status Options

- Pass
- Fix
- Fail

### All Artifacts and Checksums Present

Verify that each expected artifact is present at https://dist.apache.org/repos/dist/dev/incubator/systemml/ and that each artifact is accompanied by
its signature and checksum files (such as .asc and .md5).
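
The presence check can be scripted against a local copy of the distribution directory (for example, a Subversion checkout of the URL above). The following is a minimal sketch, not an official release tool: `check_companions` is a hypothetical helper name, and the suffix list simply follows the .asc and .md5 files named in this checklist. It only checks that the companion files exist; it does not validate signatures or checksum values.

```shell
# For every artifact in a directory, report any missing companion file.
# check_companions is a hypothetical helper written for this sketch.
check_companions() {
  dir=$1
  rc=0
  for f in "$dir"/*; do
    case "$f" in
      *.asc|*.md5) continue ;;   # skip the companion files themselves
    esac
    [ -f "$f" ] || continue      # skip directories and unmatched globs
    for ext in asc md5; do
      if [ ! -f "$f.$ext" ]; then
        echo "missing: ${f##*/}.$ext"
        rc=1
      fi
    done
  done
  return $rc
}

# Usage (the directory name is an assumption):
#   check_companions ./systemml-dist && echo "all companion files present"
```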

### Release Candidate Build

The release candidate should build on Windows, OS X, and Linux. To do this cleanly,
clone the Apache SystemML GitHub repository to an empty location, check out the
release tag, and build the distributions using Maven with an empty local Maven
repository.
Here is an example:

    $ git clone https://github.com/apache/incubator-systemml.git
    $ cd incubator-systemml
    $ git tag -l
    $ git checkout tags/0.10.0-incubating-rc1 -b 0.10.0-incubating-rc1
    $ mvn -Dmaven.repo.local=$HOME/.m2/temp-repo clean package -P distribution
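
A fixed path such as `$HOME/.m2/temp-repo` is only empty the first time it is used. One way to guarantee an empty local Maven repository on every run is to create a fresh temporary directory, sketched below. `new_maven_repo` is a hypothetical helper name, and the directory prefix is an arbitrary choice.

```shell
# Create a brand-new, empty directory to serve as the local Maven repository
# and print its path. new_maven_repo is a hypothetical helper for this sketch.
new_maven_repo() {
  repo=$(mktemp -d "${TMPDIR:-/tmp}/m2-repo.XXXXXX") || return 1
  printf '%s\n' "$repo"
}

# Usage (run from the checked-out release tag):
#   REPO=$(new_maven_repo)
#   mvn -Dmaven.repo.local="$REPO" clean package -P distribution
```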

### Test Suite Passes

The entire test suite should pass with no errors on Windows, OS X, and Linux.
The test suite can be run using:

    $ mvn clean verify
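
Besides reading the Maven console output, the XML test reports can be scanned for failures. The sketch below assumes JUnit-style XML reports in which the opening `<testsuite>` tag sits on one line and carries `failures` and `errors` attributes; `failed_reports` is a hypothetical helper name, and the report directory is passed as an argument because its location depends on the build configuration.

```shell
# Print the XML reports whose <testsuite> element records at least one
# failure or error. Exit status: 0 = failing reports found, 1 = none found,
# 2 = no reports to inspect. failed_reports is a hypothetical helper.
failed_reports() {
  report_dir=$1
  set -- "$report_dir"/*.xml
  if [ ! -e "$1" ]; then
    echo "no XML reports in $report_dir" >&2
    return 2
  fi
  # Match failures="N" or errors="N" where N starts with a non-zero digit.
  grep -lE '<testsuite[^>]*[[:space:]](failures|errors)="[1-9]' "$@"
}

# Usage after the test run (the directory name is an assumption):
#   failed_reports target/surefire-reports
```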

### All Binaries Execute

Validate that all of the binary artifacts can execute, including those artifacts packaged
in other artifacts (in the tar.gz and zip artifacts). Here is an example of doing a basic
sanity check on OS X.

    # build distribution artifacts
    mvn clean package -P distribution

    cd target

    # verify main jar works
    # (the classpath is quoted so that the shell passes lib/* to Java unexpanded)
    java -cp "./lib/*:systemml-0.10.0-incubating.jar" org.apache.sysml.api.DMLScript -s "print('hello world');"

    # verify SystemML.jar works
    java -cp "./lib/*:SystemML.jar" org.apache.sysml.api.DMLScript -s "print('hello world');"

    # verify standalone jar works
    java -jar systemml-0.10.0-incubating-standalone.jar -s "print('hello world');"

    # verify src works
    tar -xvzf systemml-0.10.0-incubating-src.tar.gz
    cd systemml-0.10.0-incubating-src
    mvn clean package -P distribution
    cd target/
    java -cp "./lib/*:systemml-0.10.0-incubating.jar" org.apache.sysml.api.DMLScript -s "print('hello world');"
    java -cp "./lib/*:SystemML.jar" org.apache.sysml.api.DMLScript -s "print('hello world');"
    java -jar systemml-0.10.0-incubating-standalone.jar -s "print('hello world');"
    cd ..
    cd ..

    # verify in-memory jar works
    echo "import org.apache.sysml.api.jmlc.*;public class JMLCEx {public static void main(String[] args) throws Exception {Connection conn = new Connection();PreparedScript script = conn.prepareScript(\"print('hello world');\", new String[]{}, new String[]{}, false);script.executeScript();}}" > JMLCEx.java
    javac -cp systemml-0.10.0-incubating-inmemory.jar JMLCEx.java
    java -cp .:systemml-0.10.0-incubating-inmemory.jar JMLCEx

    # verify standalone tar.gz works
    tar -xvzf systemml-0.10.0-incubating-standalone.tar.gz
    cd systemml-0.10.0-incubating-standalone
    echo "print('hello world');" > hello.dml
    ./runStandaloneSystemML.sh hello.dml
    cd ..

    # verify distrib tar.gz works
    tar -xvzf systemml-0.10.0-incubating.tar.gz
    cd systemml-0.10.0-incubating
    java -cp "../lib/*:SystemML.jar" org.apache.sysml.api.DMLScript -s "print('hello world');"

    # verify spark batch mode (adjust SPARK_HOME to the local Spark installation)
    export SPARK_HOME=/Users/deroneriksson/spark-1.5.1-bin-hadoop2.6
    $SPARK_HOME/bin/spark-submit SystemML.jar -s "print('hello world');" -exec hybrid_spark

    # verify hadoop batch mode
    hadoop jar SystemML.jar -s "print('hello world');"
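
When many of these commands are run in sequence, a small wrapper makes it easier to see which ones succeeded. This is a minimal sketch; `smoke` is a hypothetical helper name. It reports on the exit status only, so it does not confirm that "hello world" was actually printed.

```shell
# Run one smoke-test command and report Pass/Fail based on its exit status.
# smoke is a hypothetical helper written for this sketch.
smoke() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "Pass: $label"
  else
    echo "Fail: $label"
    return 1
  fi
}

# Usage from the target directory, mirroring the commands above:
#   smoke "standalone jar" java -jar systemml-0.10.0-incubating-standalone.jar -s "print('hello world');"
```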

### Check LICENSE and NOTICE Files

Each artifact must contain LICENSE and NOTICE files. These files must reflect the
contents of the artifacts. If the project dependencies (i.e., libraries) have changed
since the last release, the LICENSE and NOTICE files must be updated to reflect these
changes.
Each artifact should contain a DISCLAIMER file.
For more information, see:

http://incubator.apache.org/guides/releasemanagement.html
http://www.apache.org/dev/licensing-howto.html
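
Whether the files are present can be checked without unpacking each artifact. The sketch below lists the archive contents and looks for the three file names at any depth; `check_legal_files` is a hypothetical helper name, and matching the exact names LICENSE, NOTICE, and DISCLAIMER (with no extension) is an assumption. It does not check that the file contents reflect the bundled dependencies, which still requires manual review.

```shell
# Check that a tar.gz, zip, or jar artifact lists LICENSE, NOTICE, and
# DISCLAIMER entries. check_legal_files is a hypothetical helper.
check_legal_files() {
  artifact=$1
  case "$artifact" in
    *.tar.gz|*.tgz) listing=$(tar -tzf "$artifact") || return 2 ;;
    *.zip|*.jar)    listing=$(unzip -Z1 "$artifact") || return 2 ;;
    *) echo "unsupported artifact type: $artifact" >&2; return 2 ;;
  esac
  rc=0
  for name in LICENSE NOTICE DISCLAIMER; do
    # Accept the file at the archive root or in any subdirectory (e.g. META-INF/).
    if ! printf '%s\n' "$listing" | grep -Eq "(^|/)$name\$"; then
      echo "$artifact: missing $name"
      rc=1
    fi
  done
  return $rc
}

# Usage:
#   check_legal_files systemml-0.10.0-incubating-standalone.tar.gz
```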

### Src Artifact Builds and Tests Pass

The project should be built using the src (tar.gz and zip) artifacts.
In addition, the test suite should be run using an src artifact and
all tests should pass.
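
As a rough sketch of this step for the tar.gz artifact, the function below unpacks the archive into a scratch directory and runs `mvn clean verify` there, which builds the project and runs the tests in one pass. `build_src_artifact` and the `MVN` override variable are hypothetical names introduced for illustration, and the sketch assumes the archive unpacks into a single top-level directory.

```shell
# Unpack a src tar.gz into a scratch directory and run the build and test
# suite there. build_src_artifact is a hypothetical helper; MVN may be set
# to point at a specific Maven executable.
build_src_artifact() {
  archive=$1
  workdir=$(mktemp -d) || return 1
  tar -xzf "$archive" -C "$workdir" || return 1
  # Assumes the archive unpacks into a single top-level directory.
  srcdir=$(find "$workdir" -mindepth 1 -maxdepth 1 -type d | head -n 1)
  if [ -z "$srcdir" ]; then
    echo "no top-level directory in $archive" >&2
    return 1
  fi
  (cd "$srcdir" && "${MVN:-mvn}" clean verify)
}

# Usage:
#   build_src_artifact systemml-0.10.0-incubating-src.tar.gz
```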

### Single-Node Standalone

The standalone tar.gz and zip artifacts contain runStandaloneSystemML.sh and runStandaloneSystemML.bat
files. Verify that one or more algorithms can be run on a single node using these
standalone distributions.
Here is an example based on the Quick Start Guide
demonstrating the execution of an algorithm (on OS X).

    $ tar -xvzf systemml-0.10.0-incubating-standalone.tar.gz
    $ cd systemml-0.10.0-incubating-standalone
    $ wget -P data/ http://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data
    $ echo '{"rows": 306, "cols": 4, "format": "csv"}' > data/haberman.data.mtd
    $ echo '1,1,1,2' > data/types.csv
    $ echo '{"rows": 1, "cols": 4, "format": "csv"}' > data/types.csv.mtd
    $ ./runStandaloneSystemML.sh scripts/algorithms/Univar-Stats.dml -nvargs X=data/haberman.data TYPES=data/types.csv STATS=data/univarOut.mtx CONSOLE_OUTPUT=TRUE
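
The .mtd files above hard-code the row and column counts, which must match the data. One way to avoid a mismatch is to derive the counts from the CSV file itself, as in this sketch. `write_csv_mtd` is a hypothetical helper name; it assumes a header-less CSV without quoted commas and writes the same three fields used in the example above.

```shell
# Write a .mtd metadata file next to a CSV file, deriving the row and column
# counts from the data. write_csv_mtd is a hypothetical helper.
write_csv_mtd() {
  csv=$1
  rows=$(awk 'END { print NR }' "$csv")            # number of lines
  cols=$(awk -F, 'NR == 1 { print NF }' "$csv")    # fields on the first line
  printf '{"rows": %s, "cols": %s, "format": "csv"}\n' "$rows" "$cols" > "$csv.mtd"
}

# Usage:
#   write_csv_mtd data/haberman.data
#   write_csv_mtd data/types.csv
```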

### Single-Node Spark

Verify that SystemML runs algorithms on Spark locally.
Here is an example of running the Univar-Stats.dml algorithm on randomly generated data.

    $ tar -xvzf systemml-0.10.0-incubating.tar.gz
    $ cd systemml-0.10.0-incubating
    $ export SPARK_HOME=/Users/deroneriksson/spark-1.5.1-bin-hadoop2.6
    $ $SPARK_HOME/bin/spark-submit SystemML.jar -f scripts/datagen/genRandData4Univariate.dml -exec hybrid_spark -args 1000000 100 10 1 2 3 4 uni.mtx
    $ echo '1' > uni-types.csv
    $ echo '{"rows": 1, "cols": 1, "format": "csv"}' > uni-types.csv.mtd
    $ $SPARK_HOME/bin/spark-submit SystemML.jar -f scripts/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=uni.mtx TYPES=uni-types.csv STATS=uni-stats.txt CONSOLE_OUTPUT=TRUE
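
The `SPARK_HOME` path in the example is specific to one machine. Before running the commands elsewhere, it may help to confirm that the variable points at a usable installation. The following is a small sketch; `check_spark_home` is a hypothetical helper name.

```shell
# Fail early if SPARK_HOME is unset or does not contain an executable
# bin/spark-submit. check_spark_home is a hypothetical helper.
check_spark_home() {
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$SPARK_HOME/bin/spark-submit" ]; then
    echo "no executable spark-submit under $SPARK_HOME/bin" >&2
    return 1
  fi
}

# Usage:
#   check_spark_home && "$SPARK_HOME/bin/spark-submit" SystemML.jar -s "print('hello world');" -exec hybrid_spark
```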

### Single-Node Hadoop

Verify that SystemML runs algorithms on Hadoop locally.
Based on the "Single-Node Spark" setup above, the Univar-Stats.dml algorithm can be run as follows:

    $ hadoop jar SystemML.jar -f scripts/algorithms/Univar-Stats.dml -nvargs X=uni.mtx TYPES=uni-types.csv STATS=uni-stats.txt CONSOLE_OUTPUT=TRUE

### Notebooks

Verify that SystemML can be executed from Jupyter and Zeppelin notebooks.
For examples, see the Spark MLContext Programming Guide.

### Performance Suite

Verify that the performance suite located at scripts/perftest/ executes on Spark and Hadoop. Testing should
include 80MB, 800MB, 8GB, and 80GB data sizes.