Last active
August 21, 2016 17:48
Splitting an array into training and test sets for machine learning
# Splits total_set into a training set and a test set.
# test_set_size is the fraction of elements (0.0 to 1.0) that go to the test set.
def train_test_split(total_set, test_set_size = 0.25)
  # Clamp the fraction to the valid range.
  if test_set_size > 1.0
    test_set_size = 1.0
  elsif test_set_size < 0
    test_set_size = 0.0
  end
  test_set_count = (total_set.length * test_set_size).floor
  if test_set_count == 0
    raise StandardError, "Test size resulted in a test set of 0. Increase the test size."
  elsif test_set_count == total_set.length
    raise StandardError, "Test size resulted in a training set of 0. Decrease the test size."
  end
  # Note: shuffle! reorders the caller's array in place.
  total_set.shuffle!
  # Exclusive range (...) so the test set holds exactly test_set_count elements.
  test_set = total_set[0...test_set_count]
  # Everything after the test set becomes the training set.
  training_set = total_set[test_set_count..-1]
  return training_set, test_set
end
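A split like this is often easier to debug when it is reproducible and leaves the input untouched. The following is a minimal sketch of that idea, not part of the original gist: the helper name `seeded_train_test_split` and its `seed` parameter are assumptions introduced for illustration, and the size checks from the method above are omitted for brevity. It relies on the `random:` option of Ruby's `Array#shuffle`.

```ruby
# Hypothetical helper (not in the original gist): a non-mutating,
# reproducible variant. The name and the `seed` parameter are assumptions.
def seeded_train_test_split(total_set, test_set_size = 0.25, seed = 42)
  test_set_count = (total_set.length * test_set_size).floor
  # Shuffle a copy with a seeded generator, so the same seed yields the
  # same split and the caller's array keeps its order.
  shuffled = total_set.shuffle(random: Random.new(seed))
  test_set = shuffled.take(test_set_count)      # first test_set_count elements
  training_set = shuffled.drop(test_set_count)  # the remaining elements
  return training_set, test_set
end

data = (1..8).to_a
training_set, test_set = seeded_train_test_split(data, 0.25)
puts "training: #{training_set.length}, test: #{test_set.length}"
# prints "training: 6, test: 2"
```

Together the two returned arrays contain every element of `data` exactly once, and calling the helper again with the same seed returns the same split.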