@deroneriksson
Last active May 24, 2016 23:35
Description of the SystemML release process and validation.

Release Candidate Version: 0.10.0-incubating-rc1

Release Candidate Checklist

| Item | Platform | Status | Notes |
|---|---|---|---|
| All Artifacts and Checksums Present | | Fix | Not all artifacts have md5 checksums. See here. Also, the previous release has sha1 files in addition to asc and md5 files. |
| Release Candidate Build | Windows | | |
| | OS X | Pass | |
| | Linux | | |
| Test Suite Passes | Windows | | |
| | OS X | Pass | (DE will re-verify) |
| | Linux | Pass | All daily tests from May 18 to May 24 have passed. See here. |
| All Binaries Execute | | Pass | Verified on OS X. (DE will re-verify) |
| Check LICENSE and NOTICE Files | | Pass | (DE will re-verify) |
| Src Artifact Builds and Tests Pass | | | |
| Single-Node Standalone | Windows | | |
| | OS X | Pass | |
| | Linux | | |
| Single-Node Spark | | Pass | |
| Single-Node Hadoop | | Pass | |
| Notebooks | Jupyter | | |
| | Zeppelin | | |
| Performance Suite | Spark | Pass | Performance test suite run on Spark 1.6.1 for data sizes {80MB, 800MB, 8GB, 80GB}, sparse/dense, intercept 0/1/2, and the algorithm classes binomial (Mlogreg, L2SVM, MSVM), multinomial (Mlogreg, MSVM, Naive Bayes), regression (LinregCG, LinregDS, GLM poisson-log, GLM gamma-log, GLM binomial-probit), clustering (Kmeans), and statistics (Univariate, Bivariate). No compiler or runtime issues were found, and performance is as expected. |
| | Hadoop | | |

Status options: Pass, Fix, Fail

All Artifacts and Checksums Present

Verify that each expected artifact is present at https://dist.apache.org/repos/dist/dev/incubator/systemml/ and that each artifact is accompanied by its signature and checksum files (such as .asc and .md5).
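
The presence check can be partly automated. The following is a minimal sketch, assuming the release candidate files have already been downloaded into a single local directory and that the artifacts are the .tar.gz, .zip, and .jar files in it. The function name check_companions is hypothetical and not part of the SystemML tooling.

```shell
# Report release artifacts that lack a signature (.asc) or checksum (.md5) file.
# Assumption: artifacts are the .tar.gz, .zip, and .jar files in one directory.
check_companions() {
  local dir="$1" missing=0 artifact ext
  for artifact in "$dir"/*.tar.gz "$dir"/*.zip "$dir"/*.jar; do
    [ -f "$artifact" ] || continue   # skip patterns that matched nothing
    for ext in asc md5; do
      if [ ! -f "$artifact.$ext" ]; then
        echo "MISSING: $(basename "$artifact").$ext"
        missing=1
      fi
    done
  done
  return "$missing"   # 0 when every artifact has both companion files
}

# Usage (path is a placeholder): check_companions /path/to/downloaded/rc1
```

This only confirms that the companion files exist. The signature itself still has to be checked separately, for example with gpg --verify followed by the .asc file and the artifact, and the checksum compared against the output of a local md5 tool.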

Release Candidate Build

The release candidate should build on Windows, OS X, and Linux. To do this cleanly, the following procedure can be performed.

Clone the Apache SystemML GitHub repository to an empty location. Next, check out the release tag. Following this, build the distributions using Maven. This should be performed with an empty local Maven repository.

Here is an example:

$ git clone https://github.com/apache/incubator-systemml.git
$ cd incubator-systemml
$ git tag -l
$ git checkout tags/0.10.0-incubating-rc1 -b 0.10.0-incubating-rc1
$ mvn -Dmaven.repo.local=$HOME/.m2/temp-repo clean package -P distribution

Test Suite Passes

The entire test suite should pass with no errors on Windows, OS X, and Linux. The test suite can be run using:

$ mvn clean verify

All Binaries Execute

Validate that all of the binary artifacts can execute, including those artifacts packaged in other artifacts (in the tar.gz and zip artifacts). Here is an example of doing a basic sanity check on OS X.

# build distribution artifacts
mvn clean package -P distribution

cd target

# verify main jar works
java -cp "./lib/*:systemml-0.10.0-incubating.jar" org.apache.sysml.api.DMLScript -s "print('hello world');"

# verify SystemML.jar works
java -cp "./lib/*:SystemML.jar" org.apache.sysml.api.DMLScript -s "print('hello world');"

# verify standalone jar works
java -jar systemml-0.10.0-incubating-standalone.jar -s "print('hello world');"

# verify src works
tar -xvzf systemml-0.10.0-incubating-src.tar.gz
cd systemml-0.10.0-incubating-src
mvn clean package -P distribution
cd target/
java -cp "./lib/*:systemml-0.10.0-incubating.jar" org.apache.sysml.api.DMLScript -s "print('hello world');"
java -cp "./lib/*:SystemML.jar" org.apache.sysml.api.DMLScript -s "print('hello world');"
java -jar systemml-0.10.0-incubating-standalone.jar -s "print('hello world');"
cd ..
cd ..

# verify in-memory jar works
echo "import org.apache.sysml.api.jmlc.*;public class JMLCEx {public static void main(String[] args) throws Exception {Connection conn = new Connection();PreparedScript script = conn.prepareScript(\"print('hello world');\", new String[]{}, new String[]{}, false);script.executeScript();}}" > JMLCEx.java
javac -cp systemml-0.10.0-incubating-inmemory.jar JMLCEx.java
java -cp .:systemml-0.10.0-incubating-inmemory.jar JMLCEx

# verify standalone tar.gz works
tar -xvzf systemml-0.10.0-incubating-standalone.tar.gz
cd systemml-0.10.0-incubating-standalone
echo "print('hello world');" > hello.dml
./runStandaloneSystemML.sh hello.dml
cd ..

# verify distrib tar.gz works
tar -xvzf systemml-0.10.0-incubating.tar.gz
cd systemml-0.10.0-incubating
java -cp "../lib/*:SystemML.jar" org.apache.sysml.api.DMLScript -s "print('hello world');"

# verify spark batch mode
export SPARK_HOME=/Users/deroneriksson/spark-1.5.1-bin-hadoop2.6
$SPARK_HOME/bin/spark-submit SystemML.jar -s "print('hello world');" -exec hybrid_spark

# verify hadoop batch mode
hadoop jar SystemML.jar -s "print('hello world');"

Check LICENSE and NOTICE Files

Each artifact must contain LICENSE and NOTICE files. These files must reflect the contents of the artifact. If the project dependencies (i.e., libraries) have changed since the last release, the LICENSE and NOTICE files must be updated to reflect these changes.

Each artifact should contain a DISCLAIMER file.

For more information, see:

  1. http://incubator.apache.org/guides/releasemanagement.html
  2. http://www.apache.org/dev/licensing-howto.html
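
Whether the required files are present in a tar.gz artifact can be checked from its listing. The sketch below is an illustration only: the function name check_legal_files is hypothetical, and it assumes the files are named LICENSE, NOTICE, and DISCLAIMER (optionally with a .txt suffix) at any depth in the archive.

```shell
# List which of LICENSE, NOTICE, and DISCLAIMER are missing from a tar.gz artifact.
# Assumption: the files may carry an optional .txt suffix and sit at any depth.
check_legal_files() {
  local artifact="$1" listing name missing=0
  listing=$(tar -tzf "$artifact") || return 2   # 2 = archive could not be read
  for name in LICENSE NOTICE DISCLAIMER; do
    # Match the file name as a whole path component, e.g. "systemml-x/LICENSE".
    if ! printf '%s\n' "$listing" | grep -Eq "(^|/)$name(\.txt)?\$"; then
      echo "MISSING in $(basename "$artifact"): $name"
      missing=1
    fi
  done
  return "$missing"   # 0 when all three files are present
}

# Usage: check_legal_files systemml-0.10.0-incubating-src.tar.gz
```

For zip and jar artifacts, unzip -l produces a comparable listing. A presence check does not replace reading the files: whether LICENSE and NOTICE accurately reflect the bundled dependencies still needs manual review.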

Src Artifact Builds and Tests Pass

The project should be built using the src (tar.gz and zip) artifacts. In addition, the test suite should be run using a src artifact, and all tests should pass.
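
One way to do this is to unpack the src tar.gz into a scratch directory and run mvn clean verify there, which builds the project and runs the test suite in one pass. The sketch below assumes the archive unpacks to a single top-level directory named after the archive (as systemml-0.10.0-incubating-src.tar.gz does). The function name build_src_artifact and the MVN override variable are hypothetical conveniences.

```shell
# Unpack a src tar.gz into a scratch directory and run the build and tests in it.
# MVN may be set to a specific Maven executable (hypothetical convenience);
# it defaults to "mvn" on the PATH.
build_src_artifact() {
  local artifact="$1" workdir srcdir
  workdir=$(mktemp -d) || return 1
  tar -xzf "$artifact" -C "$workdir" || return 1
  # Assumption: the archive contains one top-level directory named after itself.
  srcdir="$workdir/$(basename "$artifact" .tar.gz)"
  # Run in a subshell so the caller's working directory is left unchanged.
  ( cd "$srcdir" && ${MVN:-mvn} clean verify )
}

# Usage: build_src_artifact systemml-0.10.0-incubating-src.tar.gz
```

The function returns Maven's exit status, so a non-zero result indicates a build or test failure.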

Single-Node Standalone

The standalone tar.gz and zip artifacts contain runStandaloneSystemML.sh and runStandaloneSystemML.bat files. Verify that one or more algorithms can be run on a single node using these standalone distributions.

Here is an example based on the Quick Start Guide demonstrating the execution of an algorithm (on OS X).

$ tar -xvzf systemml-0.10.0-incubating-standalone.tar.gz
$ cd systemml-0.10.0-incubating-standalone
$ wget -P data/ http://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data
$ echo '{"rows": 306, "cols": 4, "format": "csv"}' > data/haberman.data.mtd
$ echo '1,1,1,2' > data/types.csv
$ echo '{"rows": 1, "cols": 4, "format": "csv"}' > data/types.csv.mtd
$ ./runStandaloneSystemML.sh scripts/algorithms/Univar-Stats.dml -nvargs X=data/haberman.data TYPES=data/types.csv STATS=data/univarOut.mtx CONSOLE_OUTPUT=TRUE

Single-Node Spark

Verify that SystemML runs algorithms on Spark locally.

Here is an example of running the Univar-Stats.dml algorithm on randomly generated data.

$ tar -xvzf systemml-0.10.0-incubating.tar.gz
$ cd systemml-0.10.0-incubating
$ export SPARK_HOME=/Users/deroneriksson/spark-1.5.1-bin-hadoop2.6
$ $SPARK_HOME/bin/spark-submit SystemML.jar -f scripts/datagen/genRandData4Univariate.dml -exec hybrid_spark -args 1000000 100 10 1 2 3 4 uni.mtx
$ echo '1' > uni-types.csv
$ echo '{"rows": 1, "cols": 1, "format": "csv"}' > uni-types.csv.mtd
$ $SPARK_HOME/bin/spark-submit SystemML.jar -f scripts/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=uni.mtx TYPES=uni-types.csv STATS=uni-stats.txt CONSOLE_OUTPUT=TRUE

Single-Node Hadoop

Verify that SystemML runs algorithms on Hadoop locally.

Based on the "Single-Node Spark" setup above, the Univar-Stats.dml algorithm could be run as follows:

$ hadoop jar SystemML.jar -f scripts/algorithms/Univar-Stats.dml -nvargs X=uni.mtx TYPES=uni-types.csv STATS=uni-stats.txt CONSOLE_OUTPUT=TRUE

Notebooks

Verify that SystemML can be executed from Jupyter and Zeppelin notebooks. For examples, see the Spark MLContext Programming Guide.

Performance Suite

Verify that the performance suite located at scripts/perftest/ executes on Spark and Hadoop. Testing should include 80MB, 800MB, 8GB, and 80GB data sizes.
