
Last active Aug 29, 2017

GSoC 2017 Final Report

Shogun Detox II: Codebase improvements and finalization of the new Tags and Serialization frameworks

Student: Weijie Lin   

Organization: The Shogun Toolbox  

Mentors: Viktor Gal, Fernando Iglesias Garcia


Every line of code in SHOGUN has a long history and has gone through many brains and hands. This made SHOGUN what it is today: a powerful toolbox with a lot of features. But most of the code was written by researchers for their own studies. Usually, the focus was on "getting things done": proving awesome ideas and optimizing them "as fast as possible". As a drawback, people didn't care too much about software engineering aspects. In addition, many new technologies have appeared since parts of the code were written, which allow us to do even cooler things with less code now. I expected to improve the maintainability, stability, and beauty of Shogun by using C++1x features and adding more unit tests.

Replace internal data structures with STL


We want to use more of the STL in Shogun to replace the templates we defined ourselves, so that we no longer need to manage memory by hand and the codebase becomes more readable and robust. The first step is to replace DynArray with std::vector. One advantage of std::vector is that the standard library provides many in-place operations and algorithms for standard containers, which can save a lot of time.  
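
To illustrate the kind of in-place operations that come for free with std::vector, here is a minimal sketch. The helper names `remove_all`, `sort_unique` and `sum` are hypothetical and are not part of Shogun or of the DynArray interface; they only show standard algorithms that would otherwise need hand-written loops and manual memory handling.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper (not Shogun API): removes every occurrence of
// `value` in place using the erase-remove idiom.
template <typename T>
void remove_all(std::vector<T>& v, const T& value)
{
    v.erase(std::remove(v.begin(), v.end(), value), v.end());
}

// Hypothetical helper (not Shogun API): sorts the vector in place and
// drops adjacent duplicates, leaving each value exactly once.
template <typename T>
void sort_unique(std::vector<T>& v)
{
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
}

// Hypothetical helper (not Shogun API): sums all elements.
template <typename T>
T sum(const std::vector<T>& v)
{
    return std::accumulate(v.begin(), v.end(), T{});
}
```

The vector owns its storage, so none of these helpers needs an explicit allocation, resize, or free call.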

List of commits

Pull Request Description
 #3832    Replace DynArray with std::vector  
 #3833    Add unit test for DynamicArray  
 #3852    Add serializable test for DynamicArray  

Random Refactor


In Shogun, we use a global random number generator (sg_rand) to share the random seed between modules and to make sure unit tests produce the same output every time. But this can cause problems if you change the seed somewhere and forget about it. It also means you need to set sg_rand every time you want to be sure of a specific, known seed. Furthermore, to make the codebase more maintainable, we are going to replace CRandom in Shogun with the C++11 random framework.
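
The idea of replacing a shared global generator with a locally owned one can be sketched as follows. This is only an illustration under assumptions: the class `Sampler` and its methods are hypothetical and do not reflect the interface that was actually merged into Shogun. Each object owns a C++11 `std::mt19937` engine seeded explicitly, so reseeding one object cannot silently change the output of another.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical class (not Shogun API): owns its engine instead of
// relying on a global generator such as sg_rand.
class Sampler
{
public:
    explicit Sampler(uint32_t seed) : m_engine(seed) {}

    // Returns n raw 32-bit outputs of the engine.
    std::vector<uint32_t> draw(std::size_t n)
    {
        std::vector<uint32_t> out(n);
        for (auto& x : out)
            x = static_cast<uint32_t>(m_engine());
        return out;
    }

    // Returns an integer drawn uniformly from the closed range [lo, hi].
    int uniform_int(int lo, int hi)
    {
        std::uniform_int_distribution<int> dist(lo, hi);
        return dist(m_engine);
    }

private:
    std::mt19937 m_engine; // local state, seeded in the constructor
};
```

Two `Sampler` objects constructed with the same seed produce the same sequence, which keeps unit tests reproducible without touching any global state.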

List of commits

 Pull Request    Description  
 gist of standard random benchmark    a benchmark of the different random engines in C++11
 #3888    get rid of the global random  
 #3906    remove random functions in CMath

Big Input


Until now, Shogun has used the type index_t for loop counters and sizes, and mapped it to int32_t. But now we need a bigger index range, which means index_t will be mapped to int64_t. Basically, we need to change the index_t mapping to int64_t and make sure everything still works. In practice, however, index_t is not used consistently for indices and sizes: sometimes we use index_t, but sometimes we use int32_t directly. So we need to replace int32_t with index_t in those places.
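
A small sketch of why the wider mapping matters and why hard-coded int32_t is a problem. The alias below assumes index_t is mapped to int64_t as described above; the functions `num_elements` and `fits_in_int32` are hypothetical helpers for illustration, not Shogun code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: the new mapping of index_t described in the text.
using index_t = int64_t;

// Hypothetical helper: number of entries in a rows x cols matrix.
// With a 64-bit index_t the product is computed in 64 bits.
inline index_t num_elements(index_t rows, index_t cols)
{
    return rows * cols;
}

// Hypothetical helper: true if n could still be stored in an int32_t,
// i.e. if code that hard-codes int32_t would not truncate it.
inline bool fits_in_int32(index_t n)
{
    return n >= std::numeric_limits<int32_t>::min() &&
           n <= std::numeric_limits<int32_t>::max();
}
```

For example, a 100000 x 100000 matrix has 10^10 entries, which no longer fits into int32_t, so any loop or size variable still declared as int32_t would be wrong for such input.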

List of commits

 Pull Request    Description  
 #3960    for int32_t replacement  

Future Plans

The STL work has a long way to go, and I think we can come back to it once the parameter registration can handle std::vector. For the random refactor and big input, we still need to go through the unit tests and meta tests to verify that the refactoring has not broken anything. After GSoC I also plan to work on the tasks I could not complete during these months, because I can learn a lot from them and because that is my responsibility as a GSoC student.

Other Contributions

 Pull Request    Description  
 #3866    fix the broken GMM  
 #3857    make BinaryLabels able to save the values parameter  
 #3887    add an error message when features are empty  
 #3821    use SGVector instead of a plain pointer in GMM  
 #3876    fix the leaking DynamicArray unit test  