MikeLing/GSoC_2017_Final_Report.md Secret

GSoC 2017 Final Report

Shogun Detox II: Codebase improvements and finalization of the new Tags and Serialization frameworks

Student: Weijie Lin  
Organization: The Shogun Toolbox  
Mentors: Viktor Gal, Fernando Iglesias Garcia

Abstract

Every line of code in SHOGUN has a long history and has gone through many brains and hands. This made SHOGUN what it is today: a powerful toolbox with a lot of features. But most of the code was written by researchers for their studies. Usually, the focus was on "getting things done", proving awesome ideas, and optimizing them "as fast as possible". As a drawback, people didn't care too much about software engineering aspects. In addition, lots of new technologies have shown up since some parts of the code were written, which allow us to do even cooler things with less code now. I expect to improve the maintainability, stability, and beauty of Shogun by using C++1x features and adding more unit tests.
Table of Contents

- Replace internal data structures with STL
  - Description
  - List of commits
- Random Refactor
  - Description
  - List of commits
- Big Input
  - Description
  - List of commits
- Future Plans
- Other Contributions

Replace internal data structures with STL

Description

We want to use more of the STL in Shogun to replace the templates we defined ourselves, so that we no longer need to manage memory by hand and the codebase becomes more readable and robust. The first step is to replace DynArray with std::vector. One advantage of std::vector is that the standard library provides a bunch of in-place operations and algorithms for standard containers, which can save a lot of time.
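
As a rough illustration of that advantage, the sketch below shows what a std::vector-backed array gets from the standard library. The alias `IntArray` and the helper functions are hypothetical names made up for this example; they are not Shogun's DynArray API and not the code from the pull requests.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a dynamic array (not Shogun's actual class).
// std::vector owns its storage, so there is no manual allocation,
// resizing, or freeing to get wrong.
using IntArray = std::vector<int32_t>;

// Append an element; the vector grows its own buffer when needed.
inline void append_element(IntArray& array, int32_t value)
{
    array.push_back(value);
}

// Read an element; the assert catches out-of-range indices in debug builds.
inline int32_t get_element(const IntArray& array, std::size_t index)
{
    assert(index < array.size());
    return array[index];
}

// Remove every occurrence of `value` in place (erase-remove idiom).
inline void remove_all(IntArray& array, int32_t value)
{
    array.erase(std::remove(array.begin(), array.end(), value), array.end());
}

// Sort in place and return the sum, using only standard algorithms.
inline int64_t sort_and_sum(IntArray& array)
{
    std::sort(array.begin(), array.end());
    return std::accumulate(array.begin(), array.end(), int64_t{0});
}
```

Each helper is a thin wrapper, which is the point: the growth policy, element shifting, and sorting all come from the standard library instead of hand-written template code.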
List of commits


| Pull Request | Description |
| --- | --- |
| #3832 | Replace DynArray with std::vector |
| #3833 | Add unit test for DynamicArray |
| #3852 | Add serializable test for DynamicArray |


Random Refactor

Description

In Shogun, we use a global random generator (sg_rand) to share one random seed across modules and to make sure unit tests produce the same output every time. But this can cause problems if you change the seed somewhere and forget about it. It also means you have to set sg_rand every time you want to be sure which seed is in use. Furthermore, to make the codebase more maintainable, we are going to replace CRandom with the C++11 random framework.
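
To make the contrast with a global generator concrete, here is a minimal sketch built on the C++11 `<random>` header. The function names `draw_uniform` and `sample_with_seed` are hypothetical, and the choice of `std::mt19937_64` is an assumption for illustration; the sketch does not reproduce the design used in the pull requests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper (not Shogun code): draws `count` integers in
// [low, high] from an engine owned by the caller, not a shared global.
inline std::vector<int32_t> draw_uniform(
    std::mt19937_64& engine, std::size_t count, int32_t low, int32_t high)
{
    assert(low <= high);
    std::uniform_int_distribution<int32_t> dist(low, high);
    std::vector<int32_t> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        out.push_back(dist(engine));
    return out;
}

// The seed is an explicit argument, so no other module can change it
// behind the caller's back.
inline std::vector<int32_t> sample_with_seed(uint64_t seed, std::size_t count)
{
    std::mt19937_64 engine(seed);
    return draw_uniform(engine, count, 0, 99);
}
```

Because the engine is local, two calls with the same seed yield the same sequence within one build, which is the property unit tests need, and a seed change in one place cannot leak into another.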
List of commits


| Pull Request | Description |
| --- | --- |
| gist of standard random benchmark | a benchmark between different random engines in C++11 |
| #3888 | get rid of the global random |
| #3906 | remove random functions in CMath |


Big Input

Description

Previously, we used the type index_t for loop counters and sizes, and mapped it to int32_t. Now we need a bigger index range, which means index_t will be mapped to int64_t. So basically, we need to change the index_t mapping to int64_t and make sure everything still works. In practice, however, index_t is not used consistently for indices and sizes: sometimes we use index_t, but sometimes we use int32_t directly. So we need to replace int32_t with index_t in those places.
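
The sketch below illustrates why the consistent use of one index alias matters. It assumes index_t is a plain type alias, as the description suggests; the helper functions `num_elements`, `exceeds_int32`, and `count_positive` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Assumption for illustration: one alias selects the index width, so
// switching from int32_t to int64_t is a single-line change.
using index_t = int64_t;

// Element count of a rows x cols matrix, computed in index_t.
inline index_t num_elements(index_t rows, index_t cols)
{
    assert(rows >= 0 && cols >= 0);
    return rows * cols;
}

// True if `count` does not fit into the old 32-bit index type.
inline bool exceeds_int32(index_t count)
{
    return count > std::numeric_limits<int32_t>::max();
}

// The loop counter is index_t, so it has the same width as the size it
// is compared against; a hard-coded int32_t counter would not.
inline index_t count_positive(const std::vector<double>& values)
{
    index_t positives = 0;
    const index_t n = static_cast<index_t>(values.size());
    for (index_t i = 0; i < n; ++i)
        if (values[static_cast<std::size_t>(i)] > 0.0)
            ++positives;
    return positives;
}
```

A 100000 x 100000 matrix has 10^10 elements, which is beyond the int32_t range. Any loop or size that still uses int32_t directly would not follow the alias when it is widened, which is why those places need to be switched to index_t.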
List of commits


| Pull Request | Description |
| --- | --- |
| #3960 | for int32_t replacement |


Future Plans

The STL work has a long way to go, and I think we can come back to it once parameter registration can handle std::vector. For the random refactor and big input, we still need to go through the unit tests and meta tests to verify that these refactors do not break anything. After GSoC I also plan to work on the tasks I could not complete during these months, because I can learn a lot from them and because that is my responsibility as a GSoC student.
Other Contributions


| Pull Request | Description |
| --- | --- |
| #3866 | fix the broken GMM |
| #3857 | make BinaryLabels able to save the values parameter |
| #3887 | add error message when features are empty |
| #3821 | use SGVector instead of plain pointer in GMM |
| #3876 | fix the leaking DynamicArray unittest |