@mmalohlava
Last active March 31, 2017 09:23
Assignment: Improve H2O PCA

The goal of this assignment is to:

  1. Get familiar with the H2O stack
  2. Make an improvement in H2O

Details

H2O provides an implementation of the PCA algorithm that depends on the Jama library. Jama is used for several tasks, including Singular Value Decomposition (SVD); however, its performance is sub-optimal.

The idea is to replace the Jama SVD computation with the netlib-java library and measure the performance impact.
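One way to keep the Jama-to-netlib swap contained is to put the SVD behind a small interface. The sketch below assumes, based on our understanding of H2O's GramSVD path, that PCA only needs the singular values and right singular vectors of the symmetric positive semi-definite Gram matrix. The names `GramSvd` and `JacobiGramSvd` are hypothetical, and the backend shown is a plain-Java cyclic Jacobi eigen-solver (for a symmetric PSD matrix the SVD coincides with the eigendecomposition). It is used only so the example compiles without Jama or netlib-java on the classpath.

```java
import java.util.Arrays;

/** Hypothetical abstraction: SVD of a symmetric PSD Gram matrix, as a GramSVD-style PCA needs it. */
interface GramSvd {
  /** Singular values in descending order. */
  double[] singularValues();

  /** Row-major n x n matrix V; column j is the right singular vector for singularValues()[j]. */
  double[][] rightSingularVectors();
}

/** Stand-in backend: cyclic Jacobi eigen-solver (SVD == eigendecomposition for symmetric PSD input). */
final class JacobiGramSvd implements GramSvd {
  private final double[] values;
  private final double[][] vectors;

  JacobiGramSvd(double[][] gram) {
    int n = gram.length;
    double[][] a = new double[n][];
    for (int i = 0; i < n; i++) a[i] = gram[i].clone();   // work on a copy of the input
    double[][] v = new double[n][n];
    for (int i = 0; i < n; i++) v[i][i] = 1.0;             // accumulated rotations, starts as identity

    for (int sweep = 0; sweep < 100; sweep++) {
      // Stop once the off-diagonal mass is negligible relative to the diagonal.
      double off = 0, diag = 0;
      for (int p = 0; p < n; p++) {
        diag += a[p][p] * a[p][p];
        for (int q = p + 1; q < n; q++) off += 2 * a[p][q] * a[p][q];
      }
      if (off <= 1e-24 * diag) break;

      for (int p = 0; p < n - 1; p++) {
        for (int q = p + 1; q < n; q++) {
          if (a[p][q] == 0.0) continue;
          // Rotation angle that zeroes a[p][q]; take the smaller root of t^2 + 2*theta*t - 1 = 0.
          double theta = (a[q][q] - a[p][p]) / (2 * a[p][q]);
          double t = (theta >= 0 ? 1.0 : -1.0) / (Math.abs(theta) + Math.sqrt(theta * theta + 1));
          double c = 1 / Math.sqrt(t * t + 1), s = t * c;
          for (int k = 0; k < n; k++) {            // A <- A J (columns p and q)
            double akp = a[k][p], akq = a[k][q];
            a[k][p] = c * akp - s * akq;
            a[k][q] = s * akp + c * akq;
          }
          for (int k = 0; k < n; k++) {            // A <- J^T A (rows p and q)
            double apk = a[p][k], aqk = a[q][k];
            a[p][k] = c * apk - s * aqk;
            a[q][k] = s * apk + c * aqk;
          }
          for (int k = 0; k < n; k++) {            // V <- V J
            double vkp = v[k][p], vkq = v[k][q];
            v[k][p] = c * vkp - s * vkq;
            v[k][q] = s * vkp + c * vkq;
          }
        }
      }
    }

    // Sort by |eigenvalue| descending; for a symmetric matrix these are the singular values.
    Integer[] order = new Integer[n];
    for (int i = 0; i < n; i++) order[i] = i;
    Arrays.sort(order, (i, j) -> Double.compare(Math.abs(a[j][j]), Math.abs(a[i][i])));
    values = new double[n];
    vectors = new double[n][n];
    for (int j = 0; j < n; j++) {
      values[j] = Math.abs(a[order[j]][order[j]]);
      for (int i = 0; i < n; i++) vectors[i][j] = v[i][order[j]];
    }
  }

  @Override public double[] singularValues() { return values.clone(); }

  @Override public double[][] rightSingularVectors() {
    double[][] copy = new double[vectors.length][];
    for (int i = 0; i < vectors.length; i++) copy[i] = vectors[i].clone();
    return copy;
  }
}
```

A netlib-java backend would implement the same interface, for example by wrapping MTJ's `no.uib.cipr.matrix.SVD.factorize(...)`; the MTJ documentation should be checked for the exact accessors before relying on them.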

Your task is to:

  1. Clone the development version of H2O from GitHub: https://github.com/h2oai/h2o-3
  2. Build H2O
  3. Explore the PCA implementation and how it is used, for example, in JUnit tests.
  4. Replace the use of Jama SVD with netlib-java
  5. Measure the impact of the change with single-node microbenchmark(s).
  6. Create a pull request with your change
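JMH is the usual tool for step 5 (the thread below ends up running it via ./gradlew :h2o-core:jmh). For a quick first look, a plain-Java harness like the following can time both SVD backends on the same input. `MicroBench` and its methods are hypothetical names; the harness does warm-up runs so the JIT compiles the hot path and reports the median time, but it does not protect against everything JMH handles (forking, GC noise, subtler dead-code elimination).

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.Supplier;

/** Minimal single-node timing harness; a rough stand-in for JMH, not a replacement. */
final class MicroBench {
  // Results are written here so the JIT cannot discard the benchmarked work as dead code.
  static volatile Object sink;

  /** Runs warm-up iterations, then returns the median wall-clock time in nanoseconds. */
  static long medianNanos(Supplier<?> task, int warmup, int iterations) {
    if (iterations <= 0) throw new IllegalArgumentException("iterations must be positive");
    for (int i = 0; i < warmup; i++) sink = task.get();
    long[] times = new long[iterations];
    for (int i = 0; i < iterations; i++) {
      long start = System.nanoTime();
      sink = task.get();
      times[i] = System.nanoTime() - start;
    }
    Arrays.sort(times);
    return times[iterations / 2];
  }

  /** Builds a random Gram matrix G = X^T X (cols x cols) to feed both SVD backends. */
  static double[][] randomGram(int rows, int cols, long seed) {
    Random rnd = new Random(seed);
    double[][] x = new double[rows][cols];
    for (double[] r : x) for (int j = 0; j < cols; j++) r[j] = rnd.nextGaussian();
    double[][] g = new double[cols][cols];
    for (double[] r : x)
      for (int i = 0; i < cols; i++)
        for (int j = 0; j < cols; j++) g[i][j] += r[i] * r[j];
    return g;
  }
}
```

A comparison might time `() -> new Jama.Matrix(g).svd()` against the netlib-backed equivalent with `g = MicroBench.randomGram(10_000, 100, 42L)`. Varying the number of columns matters most, since the SVD runs on the cols x cols Gram matrix and its cost grows roughly cubically with the column count, independent of the row count.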

Hints

  • IntelliJ IDEA is a great tool for Java development
  • You can build only the Java part of H2O by invoking ./gradlew :h2o-assemblies:main:build
  • JUnit tests are a great source of information
  • If you are stuck, please do not hesitate to contact us
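When writing a JUnit test that checks the netlib-java SVD against the existing Jama one, keep in mind that singular vectors are unique only up to sign (and, for repeated singular values, only up to rotation within the shared subspace). Two correct backends may therefore return v and -v for the same principal component. The helper below is a hypothetical sketch that compares columns up to sign and singular values with a relative tolerance; it does not handle the repeated-value case.

```java
/** Hypothetical helpers for comparing two SVD backends in a regression test. */
final class SvdCompare {
  /** True if column j of a equals column j of b, or its negation, within tol. */
  static boolean columnsEqualUpToSign(double[][] a, double[][] b, int j, double tol) {
    boolean same = true, flipped = true;
    for (int i = 0; i < a.length; i++) {
      same &= Math.abs(a[i][j] - b[i][j]) <= tol;
      flipped &= Math.abs(a[i][j] + b[i][j]) <= tol;
    }
    return same || flipped;
  }

  /** True if the singular values agree within relTol, scaled by magnitude (at least 1). */
  static boolean singularValuesClose(double[] s1, double[] s2, double relTol) {
    if (s1.length != s2.length) return false;
    for (int i = 0; i < s1.length; i++) {
      double scale = Math.max(1.0, Math.max(Math.abs(s1[i]), Math.abs(s2[i])));
      if (Math.abs(s1[i] - s2[i]) > relTol * scale) return false;
    }
    return true;
  }
}
```

In a JUnit test these would typically be wrapped in `assertTrue(...)` for every column of V and for the singular-value arrays.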

Evaluation criteria

  • Does it work? Can I launch H2O and compute PCA?
  • Was performance measured?
  • How good is the implementation?

Enjoy!

@mathemage
mathemage commented Mar 29, 2017

@mmalohlava By netlib-java, do you mean MTJ specifically?

@mathemage
mathemage commented Mar 29, 2017

Okay, I have started migrating to MTJ (which is based on netlib-java). You can follow my progress in the branch mathemage-pca-using-netlib-java-svd. Benchmarks are still to be run...
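The MTJ migration involves a data-layout difference worth handling explicitly. Based on our reading of the MTJ API (to be verified against its Javadoc), MTJ's SVD returns V transposed (`getVt()`) rather than V as Jama's `getV()` does, and MTJ's `DenseMatrix` stores its entries column-major in a flat `double[]`. Since the existing PCA code works with row-major `double[][]`, the n x n Vt has to be transposed and unflattened. `MtjLayout` and its method are hypothetical names for this sketch.

```java
/** Hypothetical porting helper for MTJ output layout. */
final class MtjLayout {
  /** Converts the column-major data of an n x n matrix Vt into row-major V (V = Vt^T). */
  static double[][] rightVectorsFromVtData(double[] vtColumnMajor, int n) {
    if (vtColumnMajor.length != n * n) throw new IllegalArgumentException("expected n*n entries");
    double[][] v = new double[n][n];
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        // Vt[j][i] is stored at column-major index i*n + j, and V[i][j] = Vt[j][i].
        v[i][j] = vtColumnMajor[i * n + j];
      }
    }
    return v;
  }
}
```

Note that row i of V is simply the i-th block of n consecutive entries of Vt's column-major data, so no arithmetic beyond copying is needed.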

@mathemage
@mmalohlava For single-node microbenchmarks, is JMH used in h2o-3? Or what do you guys use?

@mathemage
mathemage commented Mar 30, 2017

@mmalohlava How are JMH benchmarks done in h2o? I checked the jmh-gradle-plugin tutorial, but I am still struggling. Do the Groovy snippets from the tutorial go into h2o-3/build.gradle, or into a different Gradle file?

@mathemage
@mmalohlava OIC, one needs to specify a particular subproject, as in ./gradlew :h2o-core:jmh. OK, my bad, it works now 😄