Name: Michele Mazzoni
Organization: Shogun Machine Learning Toolbox
Mentor: Viktor Gal
This page contains links to the work I did for the Shogun Machine Learning Toolbox as part of my GSoC project. As a quick overview, the main focus of the project was to improve the library's efficiency. This initially consisted of porting a few algorithms to the new linalg backend. To do that, many linear algebra operations were implemented (wrapping Eigen3), but adding them overloaded the linalg backend, whose design therefore required some refactoring. Second, a script that generates serialization tests for models that require training was implemented for a few classes of models (linear machines, multiclass machines, kernel machines). The last period was spent on a broad features refactor, motivated by enabling multi-threaded cross-validation and providing a new interface for writing algorithms through iterators; this included both outlining its design and starting to implement the changes. More details can be found in the blog posts.
- Weeks 1-2: Linalg refactor I
- Weeks 3-4: Linalg refactor II
- Weeks 5-6: LDA/KernelPCA improvements
- Weeks 7-8: Trained models serialization tests
- Weeks 9-10: Features refactor
PR | Description |
---|---|
3826 | Remove duplicate code in LDA/FisherLDA and port the solvers to use linalg |
3842 | Port KernelPCA to use linalg and refactor unit test |
3901 | Refactor (Fisher)LDA solvers into separate class |
3903 | Integration testing data for LDA binary classification |
PR | Description |
---|---|
3830 | Add to linalg trace, zero, identity, add_col_vec |
3837 | Fix add_col_vec compatibility with eigen 3.3.x |
3843 | Add linalg methods needed by FisherLDA and KernelPCA (CPU-only) |
3846 | Split Eigen3's linalg backend into header and implementation. |
3865 | Split eigen backend implementation |
3935 | Set LAPACK's dsyevr as default symmetric eigensolver + fixes |
PR | Description |
---|---|
3751 | Trained models serialization test |
3856 | Fix feats/labels memory management in testing environment |
3863 | Fix memleak in GaussianCheckerboard |
3864 | Create test environments in main() |
3923 | Disable trained model serialization test for gpl files when building with bsd license |
3933 | Add KernelMachines to generated serialization tests |
3941 | Disable generated serialization tests if HDF5 is not installed |
3942 | Add Multiclass machines to generated serialization tests |
- On-going on features/features-refactor branch
- Features design document: gist
PR | Description |
---|---|
3968 | Implement DotIterator, SGMatrix/Vector iterators |
3970 | Implement Features::view()/preprocess() + related changes |
PR | Description |
---|---|
3907 | Rename static method DotFeatures::get_mean -> compute_mean |
3908 | Vectorize DotFeatures covariance/mean calculation |
3920 | Add option to compute DotFeatures covariance by matrix product |
PR | Description |
---|---|
3918 | Fix uninitialized memory read in AveragedPerceptron |
3924 | Change SGMatrix::size() to signed integer to be consistent with index_t |
3959 | Fix SubsetStack copy ctor and dtor memory error |
3854 | Untemplated matrix prototype (on-going on features/linalg_untemplated branch) |