# Generates the simulated hyperplane dataset used in many stream-learning papers.
# In addition to the dataset itself, it outputs the dimension weights over time.
# I used these dimension weights as a measure of true feature importance in a paper discussed here:
# http://www.ccri.com/2014/10/30/calculating-feature-importance-in-data-streams-with-concept-drift-using-online-random-forest/
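drift = function(start, numberToGenerate, magnitudeOfChange, probDirectionChange) {
  # One signed step per transition; every step starts as +magnitudeOfChange.
  directions = rep(magnitudeOfChange, numberToGenerate - 1)
  # seq_len() gives an empty loop when numberToGenerate <= 2,
  # whereas 2:(numberToGenerate-1) would count down and extend the vector.
  for (i in seq_len(max(numberToGenerate - 2, 0)) + 1) {
    if (runif(1) < probDirectionChange) {
      # Flip the direction with probability probDirectionChange.
      directions[i] = directions[i-1] * -1
    } else {
      # Otherwise carry the previous direction forward.
      directions[i] = directions[i-1]
    }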