Last active September 12, 2018 09:30
GSoC 2017 Final Report: Fundamental ML

Name: Michele Mazzoni

Organization: Shogun Machine Learning Toolbox

Mentor: Viktor Gal


Overview

This page contains links to the work I did for the Shogun Machine Learning Toolbox as part of my GSoC project. The main focus of the project was improving the library's efficiency. Initially this consisted of porting a few algorithms to the new linalg backend. To do that, many linear algebra operations were implemented (wrapping Eigen3), but this overloaded the linalg backend because of its design, which therefore required some refactoring. Second, a script that generates serialization tests for models that require training was implemented for a few classes of models (linear machines, multiclass machines, kernel machines). The last period was spent on a broad features refactor, motivated by making multi-threaded cross-validation possible and by providing a new interface for writing algorithms through iterators; this included both outlining its design and starting to implement the changes. More details can be found in the blog posts.


Blog posts


LDA/KernelPCA

PR Description
3826 Remove duplicate code in LDA/FisherLDA and port the solvers to use linalg
3842 Port KernelPCA to use linalg and refactor unit test
3901 Refactor (Fisher)LDA solvers into separate class
3903 Integration testing data for LDA binary classification

Linalg

PR Description
3830 Add to linalg trace, zero, identity, add_col_vec
3837 Fix add_col_vec compatibility with eigen 3.3.x
3843 Add linalg methods needed by FisherLDA and KernelPCA (CPU-only)
3846 Split Eigen3's linalg backend into header and implementation.
3865 Split eigen backend implementation
3935 Set LAPACK's dsyevr as default symmetric eigensolver + fixes

Trained models serialization tests

PR Description
3751 Trained models serialization test
3856 Fix feats/labels memory management in testing environment
3863 Fix memleak in GaussianCheckerboard
3864 Create test environments in main()
3923 Disable trained model serialization test for gpl files when building with bsd license
3933 Add KernelMachines to generated serialization tests
3941 Disable generated serialization tests if HDF5 is not installed
3942 Add Multiclass machines to generated serialization tests

Features refactor

  • Ongoing on the features/features-refactor branch

  • Features design document: gist

PR Description
3968 Implement DotIterator, SGMatrix/Vector iterators
3970 Implement Features::view()/preprocess() + related changes
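
To give a rough idea of what "writing algorithms through iterators" can look like, here is a minimal, self-contained sketch. It is not Shogun's DotIterator or its Features API: the names `ColumnView`, `ColumnIterator`, `DenseColumns` and `linear_outputs` are hypothetical, and the sketch assumes dense, column-major data where each column is one feature vector.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical read-only view of one feature vector (one column of the data).
class ColumnView
{
public:
    ColumnView(const double* data, std::size_t dim) : m_data(data), m_dim(dim) {}

    // Dot product between this feature vector and a dense weight vector.
    double dot(const std::vector<double>& w) const
    {
        double result = 0.0;
        for (std::size_t i = 0; i < m_dim && i < w.size(); ++i)
            result += m_data[i] * w[i];
        return result;
    }

private:
    const double* m_data;
    std::size_t m_dim;
};

// Hypothetical forward iterator that steps from one column to the next.
class ColumnIterator
{
public:
    ColumnIterator(const double* data, std::size_t dim) : m_data(data), m_dim(dim) {}

    ColumnView operator*() const { return ColumnView(m_data, m_dim); }
    ColumnIterator& operator++() { m_data += m_dim; return *this; }
    bool operator!=(const ColumnIterator& other) const { return m_data != other.m_data; }

private:
    const double* m_data;
    std::size_t m_dim;
};

// Hypothetical container of dense feature vectors stored column by column.
// Assumes dim > 0 and that data.size() is a multiple of dim.
class DenseColumns
{
public:
    DenseColumns(std::vector<double> data, std::size_t dim)
        : m_data(std::move(data)), m_dim(dim) {}

    ColumnIterator begin() const { return ColumnIterator(m_data.data(), m_dim); }
    ColumnIterator end() const { return ColumnIterator(m_data.data() + m_data.size(), m_dim); }

private:
    std::vector<double> m_data;
    std::size_t m_dim;
};

// An "algorithm" written only against the iterator interface:
// the output of a linear model, w . x + bias, for every feature vector.
std::vector<double> linear_outputs(
    const DenseColumns& features, const std::vector<double>& w, double bias)
{
    std::vector<double> out;
    for (ColumnView x : features)
        out.push_back(x.dot(w) + bias);
    return out;
}
```

The point of the design is that `linear_outputs` never indexes into the underlying storage, so the same loop could in principle run over a different container (or a subset of the data) as long as it exposes the same iteration and `dot` operations.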

DotFeatures improvements

PR Description
3907 Rename static method DotFeatures::get_mean -> compute_mean
3908 Vectorize DotFeatures covariance/mean calculation
3920 Add option to compute DotFeatures covariance by matrix product

Miscellaneous

PR Description
3918 Fix uninitialized memory read in AveragedPerceptron
3924 Change SGMatrix::size() to signed integer to be consistent with index_t
3959 Fix SubsetStack copy ctor and dtor memory error
3854 Untemplated matrix prototype (on-going on features/linalg_untemplated branch)