This exercise comes from the online graduate course "Artificial Intelligence" (CS 6601), offered by Georgia Tech through Udacity.
Lesson 7: Machine Learning
Lecture 33: Decision Trees
Information gain is the total system entropy minus the remainder, i.e. Gain(A) = Entropy(S) - Remainder(A), where Remainder(A) is the weighted average entropy of the subsets produced by splitting the data on attribute A.
Information gain is used when building decision trees: at each step, the attribute with the highest information gain is the most informative test available. That attribute is therefore placed highest in the tree, followed by attributes with successively lower gain. This greedy ordering tends to produce a more compact decision tree.
gain(outlook) = 0.246
gain(temperature) = 0.028
gain(humidity) = 0.151
gain(wind) = 0.047
The attribute selected to add to the decision tree first is Outlook, which has the largest information gain, 0.246.
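The gain figures above can be reproduced with a short sketch. This assumes the classic 14-example "play tennis" dataset (9 yes / 5 no) that these numbers are conventionally derived from; the per-value (yes, no) counts below are that dataset's, and are an assumption not stated in the lecture excerpt. Small differences in the last decimal place come from rounding.

```python
import math

def entropy(pos, neg):
    """Binary entropy (in bits) of a set with pos positive and neg negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def gain(splits):
    """Information gain: Entropy(S) - Remainder(A).

    splits maps each attribute value to its (yes, no) counts.
    """
    total = sum(p + n for p, n in splits.values())
    pos = sum(p for p, _ in splits.values())
    neg = total - pos
    # Remainder(A): entropy of each subset, weighted by the subset's size.
    remainder = sum((p + n) / total * entropy(p, n) for p, n in splits.values())
    return entropy(pos, neg) - remainder

# Assumed (yes, no) counts per attribute value from the standard play-tennis data.
data = {
    "outlook":     {"sunny": (2, 3), "overcast": (4, 0), "rain": (3, 2)},
    "temperature": {"hot": (2, 2), "mild": (4, 2), "cool": (3, 1)},
    "humidity":    {"high": (3, 4), "normal": (6, 1)},
    "wind":        {"weak": (6, 2), "strong": (3, 3)},
}

for name, splits in data.items():
    print(f"gain({name}) = {gain(splits):.3f}")
```

Running this prints the four gains, and Outlook comes out highest, matching the lecture's conclusion that it should be the root test.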