In the decision tree algorithm, two terms, entropy and information gain, play a major role, and it is sometimes hard to grasp what they mean. Let's decode them.
Let's assume we have 3 classes (namely A, B, C) in our dataset. Entropy (E) = -[p(A)·log2(p(A)) + p(B)·log2(p(B)) + p(C)·log2(p(C))], where p(X) is the probability of randomly selecting an example of class X. In general, E = -Σ p_i·log2(p_i) over all classes. It measures the total uncertainty of the data.
What do we get out of this?
- If E = 0, only a single class is present; there is no uncertainty left (a dataset like this is useless for training a classifier).
- E reaches its maximum, log2(k) for k classes, when the examples are spread evenly across the classes. For 2 classes that maximum is 1; for our 3 classes it is log2(3) ≈ 1.585.
- The more skewed the data is towards one class, the lower E gets.
It helps us determine whether the data is skewed (low E) or evenly distributed (E near its maximum). Basically, it tells us about the data spread; see the short sketch below.
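Here is a minimal sketch of the entropy calculation in Python (the `entropy` helper and the NumPy-based implementation are my own illustration, not from the linked article):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()           # class probabilities
    return -np.sum(probs * np.log2(probs))

print(entropy(["A"] * 10))             # 0.0    -> pure node, no uncertainty
print(entropy(["A"] * 5 + ["B"] * 5))  # 1.0    -> 2 classes, evenly split (max)
print(entropy(["A", "B", "C"] * 4))    # ~1.585 -> 3 classes, evenly split
```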
In a decision tree, the algorithm tries candidate conditions for splitting a branch (for example, thresholds on each feature). After a split, entropy is calculated again; E.child below stands for the weighted average of the child nodes' entropies, each child weighted by its share of the examples.
Information gain (IG) = E.parent - E.child
If the IG is high, our split is good. The decision tree evaluates the candidate splits, calculates the information gain for each, and greedily keeps the split with the highest information gain.
The more the entropy removed, the greater the information gain. The higher the information gain, the better the split.
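A sketch of the gain calculation under the same assumptions, reusing the `entropy` helper above (the perfectly separating split in the example is hypothetical):

```python
def information_gain(parent_labels, child_label_groups):
    """IG = E(parent) minus the weighted average entropy of the children."""
    n = len(parent_labels)
    e_child = sum(len(c) / n * entropy(c) for c in child_label_groups)
    return entropy(parent_labels) - e_child

# A split that separates 5 A's and 5 B's perfectly:
# parent has E = 1, both children have E = 0, so IG = 1 (all entropy removed).
parent = ["A"] * 5 + ["B"] * 5
print(information_gain(parent, [["A"] * 5, ["B"] * 5]))  # 1.0
```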
In a nutshell, we are trying to reduce the entropy (uncertainty) of the data as much as possible.
URL: https://www.section.io/engineering-education/entropy-information-gain-machine-learning/
#decisiontrees #machinelearning #entropy