Vishal Mehta vishalmehta1991

vishalmehta1991 / fruit_rf_training.csv
Last active October 30, 2019 17:07
Fruit RF Training data
Instance  Red  Green  Blue  Size (cm)  Fruit (Label)
0         1.0  0.0    0.0   7.0        Apple
1         0.0  1.0    0.0   20         Water Melon
2         1.0  0.0    0.0   1.0        Cherry
3         0.0  1.0    0.0   7.5        Apple
4         1.0  0.0    0.0   1.0        Strawberry
5         1.0  0.0    0.0   0.8        Cherry
vishalmehta1991 / fruit_subsampled_dataset_1.csv
Created October 30, 2019 17:10
Fruit subsampled dataset 1
Instance  Red  Green  Blue  Size (cm)  Fruit (Label)
5         1.0  0.0    0.0   0.8        Cherry
0         1.0  0.0    0.0   7.0        Apple
0         1.0  0.0    0.0   7.0        Apple
4         1.0  0.0    0.0   1.0        Strawberry
vishalmehta1991 / fruit_subsampled_dataset_2.csv
Created October 30, 2019 17:11
Fruit subsampled dataset 2
Instance  Red  Green  Blue  Size (cm)  Fruit (Label)
4         1.0  0.0    0.0   1.0        Strawberry
4         1.0  0.0    0.0   1.0        Strawberry
1         0.0  1.0    0.0   20         Water Melon
3         0.0  1.0    0.0   7.5        Apple
vishalmehta1991 / fruit_subsampled_dataset_3.csv
Created October 30, 2019 17:12
Fruit subsampled dataset 3
Instance  Red  Green  Blue  Size (cm)  Fruit (Label)
1         0.0  1.0    0.0   20         Water Melon
0         1.0  0.0    0.0   7.0        Apple
5         1.0  0.0    0.0   0.8        Cherry
2         1.0  0.0    0.0   1.0        Cherry
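The three subsampled datasets above each hold four rows drawn from the six training instances, and some instances repeat (instance 0 appears twice in dataset 1, instance 4 twice in dataset 2). That pattern is consistent with sampling rows with replacement, one sample per tree. The following is a minimal sketch under that assumption; the helper names `bootstrap_indices` and `subsample`, the seeds, and the sample size of four are illustrative and do not come from the original gists, so the rows drawn will not match the datasets above.

```python
import random

# Rows of fruit_rf_training.csv: (red, green, blue, size_cm, label)
TRAINING = [
    (1.0, 0.0, 0.0, 7.0, "Apple"),
    (0.0, 1.0, 0.0, 20.0, "Water Melon"),
    (1.0, 0.0, 0.0, 1.0, "Cherry"),
    (0.0, 1.0, 0.0, 7.5, "Apple"),
    (1.0, 0.0, 0.0, 1.0, "Strawberry"),
    (1.0, 0.0, 0.0, 0.8, "Cherry"),
]


def bootstrap_indices(n_rows, n_samples, seed):
    """Draw n_samples row indices uniformly at random, with replacement."""
    rng = random.Random(seed)
    return [rng.randrange(n_rows) for _ in range(n_samples)]


def subsample(rows, n_samples, seed):
    """Return (instance_id, row) pairs for one bootstrap sample."""
    return [(i, rows[i]) for i in bootstrap_indices(len(rows), n_samples, seed)]


if __name__ == "__main__":
    # One subsampled dataset per tree; a different seed gives a different draw.
    for tree_id in range(3):
        print(f"Fruit subsampled dataset {tree_id + 1}")
        for instance, row in subsample(TRAINING, 4, seed=tree_id):
            print(instance, *row)
```

Because each draw is independent, the same instance can land in one sample several times while other instances are left out entirely, which is what gives each tree a slightly different view of the data.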
vishalmehta1991 / pseudo_code
Created October 30, 2019 19:59
Pseudo tree builder code
(A) Initialize a bit mask indicating which samples are contained in each node
(B) Initialize a "node map" indicating which nodes are present at each level
(C) ForEach(tree_level)
    1. Find the node id of all data samples, using the bit mask
    2. Compute the possible splits for all bins, all columns and all nodes
    3. Find the best split for each node
    4. Update the bit mask and sparse node map to feed the next level
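The level-by-level loop above can be sketched on the CPU in plain Python. This is an illustration under several assumptions that are not in the original pseudocode: a per-sample list of node ids stands in for the bit mask, candidate thresholds are equal-width bins per column, splits are scored by weighted Gini impurity, and children of node `n` are numbered `2n + 1` and `2n + 2`. The names `build_tree`, `predict`, `bin_edges`, and `gini` are hypothetical helpers, not part of any library.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def bin_edges(X, n_bins):
    """Equal-width candidate thresholds per column (an assumed binning scheme)."""
    edges = []
    for col in zip(*X):
        lo, hi = min(col), max(col)
        step = (hi - lo) / n_bins
        edges.append([lo + step * b for b in range(1, n_bins)] if step > 0 else [])
    return edges


def build_tree(X, y, max_depth=3, n_bins=4):
    edges = bin_edges(X, n_bins)
    node_of = [0] * len(X)      # (A) which node currently holds each sample
    active = [0]                # (B) nodes present at the current level
    splits = {}                 # node id -> (column, threshold)
    for _ in range(max_depth):  # (C) one pass per tree level
        # 1. Group sample indices by the node they currently sit in.
        members = {n: [] for n in active}
        for i, n in enumerate(node_of):
            if n in members:
                members[n].append(i)
        next_active = []
        for n in active:
            rows = members[n]
            best = None  # (weighted impurity, column, threshold)
            # 2. Score every (column, bin) split for this node.
            for c, thresholds in enumerate(edges):
                for t in thresholds:
                    left = [y[i] for i in rows if X[i][c] <= t]
                    right = [y[i] for i in rows if X[i][c] > t]
                    if not left or not right:
                        continue
                    score = (len(left) * gini(left)
                             + len(right) * gini(right)) / len(rows)
                    # 3. Keep the split with the lowest weighted impurity.
                    if best is None or score < best[0]:
                        best = (score, c, t)
            if best is None or best[0] >= gini([y[i] for i in rows]):
                continue  # no split improves purity: the node stays a leaf
            _, c, t = best
            splits[n] = (c, t)
            # 4. Route samples to the children and record them for the next level.
            for i in rows:
                node_of[i] = 2 * n + 1 if X[i][c] <= t else 2 * n + 2
            next_active += [2 * n + 1, 2 * n + 2]
        active = next_active
        if not active:
            break
    # Each leaf predicts the majority label of the samples it ended up holding.
    leaf_labels = {}
    for i, n in enumerate(node_of):
        leaf_labels.setdefault(n, []).append(y[i])
    leaves = {n: Counter(ls).most_common(1)[0][0] for n, ls in leaf_labels.items()}
    return splits, leaves


def predict(tree, row):
    """Walk from the root until reaching a node that was never split."""
    splits, leaves = tree
    n = 0
    while n in splits:
        c, t = splits[n]
        n = 2 * n + 1 if row[c] <= t else 2 * n + 2
    return leaves[n]
```

Processing a whole level at once, rather than recursing node by node, is what makes steps 1 to 4 natural to batch: every node at a level goes through the same split search before any sample moves to the next level.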
vishalmehta1991 / sidebyside.markdown
Last active October 31, 2019 09:26
Side by side cuML vs Sklearn
|  ######cuML######                 |  ######Sklearn######                 |
|                                   |                                      |
|  from cuml import                 |  from sklearn.ensemble import        |
|  RandomForestClassifier as cuRF   |  RandomForestClassifier as sklRF     |
|                                   |  import multiprocessing as mp        |
|                                   |                                      |
|  # cuML Random Forest params      |  # sklearn Random Forest params      |
|  cu_rf_params = {                 |  skl_rf_params = {                   |
|    'n_estimators': 25,            |    'n_estimators': 25,               |
vishalmehta1991 / mg_rf_dask.py
Last active September 21, 2021 21:30
Multi GPU RF using DASK
from cuml.dask.ensemble import RandomForestClassifier as cuRF_mg
# cuML Random Forest params
cu_rf_params = {
    'n_estimators': 25,
    'max_depth': 13,
    'n_bins': 15,
    'n_streams': 8
}
# Start by setting up the CUDA cluster on the local host