Google Summer of Code 2018: Alternative Smart Executors
Here is the submission for my project, Alternative Smart Executors. The objective was to study applications of machine learning to loop-level parallelism. In previous work [0], a classification method proved successful at predicting the best execution policies for HPX for_each loops. My Google Summer of Code project was based on this paper, and the work I did is divided into two parts:
- Improving the implementation of machine learning in hpxML
- Comparing regression algorithms to classification algorithms to see which is better at predicting the best chunk size for an HPX for loop
- Loop Convert
The LoopConvert ClangTool has two distinct purposes: extracting and printing the features of hpx for_each loops, and converting a loop's execution policy to one that uses machine learning to predict a given parameter (prefetching distance or chunk size). These two tasks are different and require different instructions. In the old version of LoopConvert, the user obtained the desired behavior by commenting and uncommenting sections of the code, which required recompiling the tool every time a different behavior was wanted. The improvement was to add the flags -P and -C, which select the desired behavior directly; a sketch of the conversion appears below. Link to commit: https://github.com/gablabc/hpxML/commit/899a143ee1d38694bd58e2923563216bb0425919
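To make the conversion concrete, here is a minimal sketch of the kind of rewrite the -C mode aims at, assuming the tool swaps a fixed execution policy for one whose chunk size is supplied by the trained model. The predict_chunk_size() function is a hypothetical stand-in for the generated predictor call, not the tool's exact output.

```cpp
// Illustrative before/after of the -C conversion; predict_chunk_size() is a
// hypothetical placeholder for the call the tool would generate.
#include <hpx/hpx_main.hpp>
#include <hpx/include/parallel_for_each.hpp>

#include <cstddef>
#include <vector>

// Hypothetical predictor: maps loop features (here only the iteration count)
// to a chunk size learned offline.
std::size_t predict_chunk_size(std::size_t num_iterations)
{
    return num_iterations / 100;    // placeholder for the model's prediction
}

int main()
{
    std::vector<double> v(100000, 1.0);
    auto f = [](double& x) { x *= 2.0; };

    // Before conversion: a statically chosen parallel policy.
    hpx::parallel::for_each(hpx::parallel::execution::par,
        v.begin(), v.end(), f);

    // After conversion: the chunk size is supplied by the learned predictor.
    hpx::parallel::for_each(
        hpx::parallel::execution::par.with(
            hpx::parallel::execution::static_chunk_size(
                predict_chunk_size(v.size()))),
        v.begin(), v.end(), f);

    return 0;
}
```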
I also worked on a wiki page that describes how to compile the ClangTool on the Rostam cluster and how to use it with the flags.
Link to wiki: https://github.com/gablabc/hpxML/wiki/How-to-compile-ClangTool-and-feature-extraction
- Data generation
Another improvement to hpxML was to build a repository that allows for automatic data generation. This repository was built from scratch with the goal of letting anyone generate training data using their own hpx for_each() loops. Currently, the implementation is not as user-friendly as it could be, but it works nonetheless; a sketch of the kind of benchmark it automates appears after the links below.
Link to repository: https://github.com/gablabc/hpxML/tree/submodules/Training_data/matrix_mult
Link to wiki: https://github.com/gablabc/hpxML/wiki/How-to-generate-Data
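For intuition, here is a minimal sketch of the kind of benchmark the repository automates, under my own assumptions about its structure: time an hpx for_each over a set of candidate chunk sizes and record one "features, chunk size, runtime" row per run. The file name training_data.csv and the feature set are illustrative, not the repository's actual layout.

```cpp
// Minimal data-generation sketch: benchmark one loop over candidate chunk
// sizes and write training samples to a CSV file (names are illustrative).
#include <hpx/hpx_main.hpp>
#include <hpx/include/parallel_for_each.hpp>

#include <chrono>
#include <cmath>
#include <cstddef>
#include <fstream>
#include <vector>

int main()
{
    std::size_t const n = 1000000;    // problem size, used as a loop feature
    std::vector<double> data(n, 1.0);

    std::ofstream out("training_data.csv");    // illustrative output file
    for (std::size_t chunk : {100, 1000, 10000, 100000})
    {
        auto policy = hpx::parallel::execution::par.with(
            hpx::parallel::execution::static_chunk_size(chunk));

        auto start = std::chrono::steady_clock::now();
        hpx::parallel::for_each(policy, data.begin(), data.end(),
            [](double& x) { x = std::sqrt(x) + 1.0; });
        double elapsed = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();

        // One training sample: loop feature(s), candidate chunk size, runtime.
        out << n << "," << chunk << "," << elapsed << "\n";
    }
    return 0;
}
```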
- Cross-validation and the Multinomial_logistic_regression_model class
The multinomial logistic regression class has been modified to make cross-validation easier. Cross-validation is a standard technique in machine learning for evaluating models. Before, the model was evaluated on the training data, which does not reflect how well it generalizes to new scenarios. The class was also modified in certain ways to be more similar to the interface of Python's scikit-learn library; a sketch of the cross-validation procedure appears after the links below.
Commits to the class:
- First commit https://github.com/gablabc/hpxML/commit/b167f522429cc2ec69eb389532c16c7102a77330
- Second commit https://github.com/gablabc/hpxML/commit/b1c7422a92fe55760652d0db6dd24e66f3531e24
- Third commit (add bias to model) https://github.com/gablabc/hpxML/commit/7b1e045af8b8e01a7d4ba4e33a2a16b1637ada95
Here is a link to the wiki page of the new regression class: https://github.com/gablabc/hpxML/wiki/Multinomial_logistic_regression_model-Class
New cross-validation .cpp file: https://github.com/gablabc/hpxML/blob/submodules/logisticRegressionModel/algorithms/cross-validation/cross-validation.cpp
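To show what the held-out evaluation looks like, here is a minimal k-fold cross-validation sketch. The Model struct is a hypothetical stand-in with scikit-learn style fit()/predict() members; the actual multinomial logistic regression class may differ.

```cpp
// Minimal k-fold cross-validation sketch; Model is a hypothetical stand-in.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <random>
#include <vector>

struct Sample
{
    std::vector<double> features;
    int label;    // index of the best execution policy / chunk-size class
};

// Hypothetical model with a scikit-learn style interface; this stub always
// predicts class 0, whereas the real class would learn weights in fit().
struct Model
{
    void fit(std::vector<Sample> const&) {}
    int predict(std::vector<double> const&) const { return 0; }
};

// k-fold cross-validation: each fold is held out once for testing while the
// model is trained on the remaining folds; report the mean test accuracy.
double cross_validate(std::vector<Sample> data, std::size_t k = 5)
{
    std::shuffle(data.begin(), data.end(), std::mt19937{42});
    std::size_t const fold = data.size() / k;
    double accuracy = 0.0;

    for (std::size_t i = 0; i < k; ++i)
    {
        std::vector<Sample> test(data.begin() + i * fold,
            data.begin() + (i + 1) * fold);
        std::vector<Sample> train(data.begin(), data.begin() + i * fold);
        train.insert(train.end(), data.begin() + (i + 1) * fold, data.end());

        Model m;
        m.fit(train);

        std::size_t correct = 0;
        for (auto const& s : test)
            if (m.predict(s.features) == s.label)
                ++correct;
        accuracy += static_cast<double>(correct) / test.size();
    }
    return accuracy / k;
}

int main()
{
    // Synthetic data: label is 1 when the single feature exceeds 0.5.
    std::vector<Sample> data;
    std::mt19937 gen{7};
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    for (int i = 0; i < 100; ++i)
    {
        double x = dist(gen);
        data.push_back({{x}, x > 0.5 ? 1 : 0});
    }
    std::cout << "mean CV accuracy: " << cross_validate(data) << "\n";
}
```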
The other part of the project was research on machine learning applied to loop-level parallelism in HPX. My main hypothesis was that regression algorithms would reduce the execution time of HPX for loops more than classification algorithms. This document contains all the major results of my research. Note that not all results are machine-learning related; some are simply analyses of how the best choice of chunk size varies with new algorithms and new input sizes. A toy contrast between the two approaches appears after the repository link below.
The following repository was created to store all code and figures related to the research.
https://github.com/gablabc/Machine-Learning-Research-HPX
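As a toy illustration of why regression might help (my own simplified framing, not code from the research repository): a classifier can only return one of the chunk-size candidates it was trained on, while a regressor predicts a continuous value and can land between them. The classify() and regress() functions below are hand-picked stand-ins for trained models.

```cpp
// Toy contrast between a discrete classifier and a continuous regressor for
// chunk-size prediction; both "models" below are hypothetical stand-ins.
#include <array>
#include <cmath>
#include <iostream>

// Candidate chunk sizes: a classifier must choose one of these classes.
constexpr std::array<double, 4> candidates = {100.0, 1000.0, 10000.0, 100000.0};

// Hypothetical trained classifier: returns a candidate index from a feature.
int classify(double log_n)
{
    return log_n < 5.0 ? 1 : 3;
}

// Hypothetical trained regressor: predicts log10(chunk size) directly, so it
// can output values between (or beyond) the training candidates.
double regress(double log_n)
{
    return 0.8 * log_n - 1.0;
}

int main()
{
    double const log_n = 5.5;    // loop feature, e.g. log10 of iteration count
    std::cout << "classifier picks: " << candidates[classify(log_n)] << "\n";
    std::cout << "regressor picks:  " << std::pow(10.0, regress(log_n)) << "\n";
}
```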
[0] Khatami, Z., Troska, L., Kaiser, H., Ramanujam, J., & Serio, A. (2017, November). HPX Smart Executors. In Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware (p. 3). ACM.