Decision trees used in data mining are of two main types[1]:
- A Decision Tree Classifier predicts a discrete class label associated with a data point.
- A Decision Tree Regressor predicts a real/continuous outcome value (e.g. the price of a house, or a patient's length of stay in a hospital).
The regression problem addresses predicting a real value from a data point with many attributes. Hence, the Information Gain metric[2], which measures based on discrete-label distributions, may not be suitable for a continuous range of outcome values. Alternatively, one of the best-suited metrics for regression is Standard Deviation Reduction[3]. Intuitively, the Information Gain metric favors the splitting attribute that yields the lowest-entropy outcome distribution. Analogously, the Standard Deviation Reduction metric favors the splitting attribute that yields the lowest outcome-value variance, which has the same effect of decreasing the uncertainty in the outcome prediction.
Factors
For each splitting attribute value, these factors are calculated for the outcome:
- Average outcome: avg(T) = (1/n) * Σ y_i, over the n outcome values y_i in the current node T.
- Outcome standard deviation: S(T) = sqrt( (1/n) * Σ (y_i - avg(T))^2 ).
- Coefficient of Variation: CV(T) = S(T) / avg(T) * 100%, used to decide when to stop branching (e.g. typically stop when CV < 10%).
- Standard deviation for outcome and predictor attribute X: S(T, X) = Σ P(c) * S(c), summing over each value c of X, where P(c) is the fraction of data points with X = c and S(c) is the outcome standard deviation of that subset.
- Finally, the standard deviation reduction metric SDR(T, X) = S(T) - S(T, X) votes for the best attribute to split: the attribute with the highest SDR is chosen.
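These factors can be sketched in Python. This is a minimal illustration assuming a small in-memory dataset stored as a list of dicts; the function and variable names are my own, not part of any library:

```python
import math

def avg(ys):
    """Average outcome of a node."""
    return sum(ys) / len(ys)

def std(ys):
    """Population standard deviation of the outcome values."""
    m = avg(ys)
    return math.sqrt(sum((y - m) ** 2 for y in ys) / len(ys))

def cv(ys):
    """Coefficient of variation in percent: S / average * 100."""
    return std(ys) / avg(ys) * 100

def sdr(rows, attr, ys):
    """Standard deviation reduction for splitting on attribute `attr`.

    rows -- list of dicts mapping attribute names to values
    ys   -- outcome values, aligned with rows
    """
    # S(T, X): branch standard deviations weighted by each branch's share
    s_split = 0.0
    for value in {r[attr] for r in rows}:
        subset = [y for r, y in zip(rows, ys) if r[attr] == value]
        s_split += len(subset) / len(ys) * std(subset)
    # SDR(T, X) = S(T) - S(T, X)
    return std(ys) - s_split
```

An attribute that separates the outcomes perfectly drives every branch's standard deviation to zero, so its SDR equals the full S(T).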
Procedures
For each node (split), these steps are executed:
- Step 1: Calculate the SDR for each candidate splitting attribute of the current node.
- Step 2: Choose the attribute with the best (highest) SDR.
- Step 3: For each value of the chosen attribute, calculate the average outcome, the outcome standard deviation, and the CV value.
- Step 4: For each value of the chosen attribute, decide whether to stop branching based on the CV value or the number of data points. If not stopping, branch the tree on that attribute value. If stopping, output the calculated average outcome as the prediction of that leaf node.
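The four steps above can be sketched as a recursive procedure. This is a self-contained, simplified illustration; the 10% CV threshold and the minimum-leaf-size default are example choices, not prescribed values:

```python
from statistics import pstdev

def build_tree(rows, ys, attributes, cv_threshold=10.0, min_points=3):
    """Recursively build a regression tree following Steps 1-4.

    rows       -- list of dicts mapping attribute names to values
    ys         -- outcome values, aligned with rows
    attributes -- candidate splitting attribute names
    Returns a leaf (the average outcome, a float) or a branch node
    of the form {attribute: {value: subtree, ...}}.
    """
    avg = sum(ys) / len(ys)

    def sdr(attr):
        # SDR(T, X) = S(T) - sum over values c of P(c) * S(c)
        s_split = sum(
            len(sub) / len(ys) * pstdev(sub)
            for sub in (
                [y for r, y in zip(rows, ys) if r[attr] == v]
                for v in {r[attr] for r in rows}
            )
        )
        return pstdev(ys) - s_split

    cv = pstdev(ys) / avg * 100 if avg else 0.0
    # Step 4 stopping rule: too few points or homogeneous outcomes -> leaf
    if len(ys) < min_points or not attributes or cv < cv_threshold:
        return avg
    best = max(attributes, key=sdr)              # Steps 1-2
    rest = [a for a in attributes if a != best]
    return {best: {                              # Steps 3-4: one branch per value
        v: build_tree([r for r in rows if r[best] == v],
                      [y for r, y in zip(rows, ys) if r[best] == v],
                      rest, cv_threshold, min_points)
        for v in {r[best] for r in rows}
    }}
```

On a toy dataset where one attribute cleanly separates low outcomes from high ones, a single split is chosen and each branch immediately becomes a leaf holding its average outcome.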
For a worked example of predicting hours played based on the weather with a decision tree, please see [3].
In a classification problem, kNN chooses the majority class among the top k candidates under uniform weighting, or the class with the largest total weight under distance weighting. A simple modification for the regression problem is for kNN to output the distance-weighted average of the k most similar training target values:

y* = Σ w_i * y_i / Σ w_i, with weights w_i = 1 / d(x*, x_i),

where {x_i} is the set of k data points most similar to the new sample x* and {y_i} are the corresponding target values.
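A minimal sketch of this kNN regressor, assuming Euclidean distance and plain Python lists as feature vectors (the function name and the epsilon guard are illustrative choices):

```python
import math

def knn_regress(train_x, train_y, x_new, k=3):
    """Predict a continuous target as the distance-weighted average
    of the k nearest training targets.

    train_x -- list of feature vectors (lists of floats)
    train_y -- list of target values, aligned with train_x
    x_new   -- feature vector of the new sample
    """
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    # Sort training points by distance to the new sample, keep the k nearest
    nearest = sorted(zip(train_x, train_y), key=lambda p: dist(p[0], x_new))[:k]
    # Weight each neighbor by inverse distance; a near-exact match dominates
    eps = 1e-12  # guard against division by zero for zero-distance neighbors
    weights = [1.0 / (dist(x, x_new) + eps) for x, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)
```

Because the weights are normalized by their sum, the prediction is always a convex combination of neighbor targets, so it stays within the range of the k selected values.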
[1] Wikipedia contributors. (2020, May 18). Decision tree learning. In Wikipedia, The Free Encyclopedia. Retrieved 11:02, May 26, 2020, from https://en.wikipedia.org/w/index.php?title=Decision_tree_learning&oldid=957324010
[2] Prof. Dr. Steffen Staab. (2020, May 25). Machine Learning - 5 Decision Trees. In Machine Learning course, IPVS, University of Stuttgart.
[3] Saed Sayad. Decision Tree - Regression. Retrieved 12:01, May 26, 2020, from https://saedsayad.com/decision_tree_reg.htm