Shogun Detox II: Codebase improvements and finalization of the new Tags and Serialization frameworks
Student: Weijie Lin
Organization: The Shogun Toolbox
Mentors: Viktor Gal, Fernando Iglesias Garcia
Every line of code in SHOGUN has a long history and has gone through many brains and hands. This made SHOGUN what it is today: a powerful toolbox with a lot of features. But most of the code was written by researchers for their studies. Usually, the focus was on "getting things done", proving awesome ideas, and optimizing them "as fast as possible". As a drawback, people didn't care too much about software engineering aspects. In addition, lots of new technologies have shown up since some parts of the code were written, which allow us to do even cooler things with less code now. I expect to improve the maintainability, stability, and beauty of Shogun by using C++1x features and adding more unit tests.
Replace internal datastructures with STL
We want to use more of the STL in Shogun to replace the templates we defined ourselves, so that we no longer need to manage memory by hand and the codebase becomes more readable and robust. The first step is to use std::vector to replace DynArray. One advantage of std::vector is that the many in-place operations and algorithms available for standard containers can save a lot of time.
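As a rough illustration of that point, the sketch below shows how a std::vector takes care of its own storage and works with standard algorithms directly. The helper names (push_and_count, sort_unique, sum) are hypothetical and are not part of Shogun or of the DynArray interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not Shogun code): appends a value and returns the new size.
// std::vector grows its own storage, so no manual resize or free is needed.
inline std::size_t push_and_count(std::vector<int32_t>& values, int32_t value)
{
    values.push_back(value);
    return values.size();
}

// Sorts the vector in place and drops duplicates using standard algorithms.
inline void sort_unique(std::vector<int32_t>& values)
{
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
}

// Sums all elements in a 64-bit accumulator; the vector must not be empty.
inline int64_t sum(const std::vector<int32_t>& values)
{
    assert(!values.empty());
    return std::accumulate(values.begin(), values.end(), int64_t{0});
}
```

With a hand-written dynamic array, each of these operations would typically need its own loop and explicit capacity handling; here the container and the algorithm headers cover both.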
Pull Request | Description |
---|---|
#3832 | Replace DynArray with std::vector |
#3833 | Add unit test for DynamicArray |
#3852 | Add serializable test for DynamicArray |
In Shogun, we use a global random generator (sg_rand) to unify the random seed between different modules and make sure unit tests produce the same output every time. But this can cause problems if you change the seed somewhere and forget about it. Also, you need to set sg_rand every time to be sure that a specific random seed is in use and that you know what it is. Furthermore, to make the codebase more maintainable, we are going to use the C++11 random framework to replace CRandom in Shogun.
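A minimal sketch of the idea, assuming each component owns its own C++11 engine instead of sharing a global one. The Sampler class is hypothetical and only illustrates the pattern; it is not the interface that was added to Shogun.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical component (not a Shogun class): it owns its engine, so its
// random stream depends only on the seed passed to the constructor.
class Sampler
{
public:
    explicit Sampler(uint32_t seed) : m_engine(seed) {}

    // Draws n raw 32-bit values from the engine.
    std::vector<uint32_t> draw(std::size_t n)
    {
        std::vector<uint32_t> out(n);
        for (auto& x : out)
            x = static_cast<uint32_t>(m_engine());
        return out;
    }

    // Draws a uniformly distributed integer in the closed range [lo, hi].
    int32_t uniform_int(int32_t lo, int32_t hi)
    {
        assert(lo <= hi);
        std::uniform_int_distribution<int32_t> dist(lo, hi);
        return dist(m_engine);
    }

private:
    std::mt19937 m_engine;
};
```

Two Sampler objects built with the same seed produce the same raw sequence, and using one of them never changes the state of another, which is the property a shared global generator cannot offer.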
Pull Request | Description |
---|---|
#gist of standard random benchmark | a benchmark of the different random engines in C++11 |
#3888 | get rid of the global random |
#3906 | remove random functions in CMath |
Until now, Shogun used the type index_t for loop counters and sizes, and mapped it to int32_t. But now we need a bigger index size, which means index_t will be mapped to int64_t. So basically, we need to change the index_t mapping to int64_t and make sure everything still works. In practice, however, index_t is not used consistently for indices and sizes: sometimes we use index_t, but sometimes we use int32_t directly. So we need to replace int32_t with index_t in these places.
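To make the motivation concrete, here is a small sketch that assumes index_t is an alias for int64_t. The helper functions are hypothetical and only show why sizes computed in int32_t stop being sufficient for large inputs.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption for illustration: index_t already maps to a 64-bit signed integer.
typedef int64_t index_t;

// Hypothetical helper: number of entries in a rows x cols matrix, computed in index_t.
inline index_t num_elements(index_t rows, index_t cols)
{
    assert(rows >= 0 && cols >= 0);
    return rows * cols;
}

// Returns true if the count cannot be stored in an int32_t.
inline bool exceeds_int32(index_t count)
{
    return count > std::numeric_limits<int32_t>::max();
}
```

For example, a 50000 x 50000 matrix has 2500000000 entries, which is above the int32_t maximum of 2147483647, so any size or loop counter still declared as int32_t could not represent it.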
Pull Request | Description |
---|---|
#3960 | for int32_t replacement |
The STL work has a long way to go, and I think we can come back to it once the parameter registration can handle std::vector. For the random refactor and big-input support, we need to go through the unit tests and meta tests to verify that these refactors don't break anything. After GSoC I also plan to work on the tasks I could not complete during these months, because I think I can learn a lot from them and it's my responsibility as a GSoC student.
Pull Request | Description |
---|---|
#3866 | fix the broken GMM |
#3857 | make BinaryLabels able to save the values parameter |
#3887 | add error message when features are empty |
#3821 | use SGVector instead of plain pointer in GMM |
#3876 | fix the leaking DynamicArray unittest |