In the decision tree algorithm, two terms, entropy and information gain, play a major role, and it is sometimes hard to grasp what they mean. Let's decode them.
Let's assume we have 3 classes (namely A, B, C) in our dataset. Entropy (E) = -[p(A)·log2(p(A)) + p(B)·log2(p(B)) + p(C)·log2(p(C))], where p(X) is the probability of randomly selecting an example of class X. In general, E = -Σ p_i·log2(p_i) over all classes. It measures the total uncertainty of the data.
What do we get out of this?
- If E = 0, only a single class is present; there is no uncertainty left (a dataset like this is useless for training a classifier).
- E reaches its maximum, log2(k) for k classes, when the examples are spread evenly across the classes. For 2 classes that maximum is 1; for our 3 classes it is log2(3) ≈ 1.585.
- The more skewed the data is towards one class, the lower E gets.
It helps us determine whether the data is skewed (low E) or evenly distributed (E near its maximum). Basically, it tells us about the data spread; see the short sketch below.
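Here is a minimal sketch of the entropy calculation in Python (the `entropy` helper and the NumPy-based implementation are my own illustration, not from the linked article):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()           # class probabilities
    return -np.sum(probs * np.log2(probs))

print(entropy(["A"] * 10))             # 0.0    -> pure node, no uncertainty
print(entropy(["A"] * 5 + ["B"] * 5))  # 1.0    -> 2 classes, evenly split (max)
print(entropy(["A", "B", "C"] * 4))    # ~1.585 -> 3 classes, evenly split
```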
In a decision tree, the algorithm tries candidate conditions for splitting a branch (for example, thresholds on each feature). After a split, entropy is calculated again; E.child below stands for the weighted average of the child nodes' entropies, each child weighted by its share of the examples.
Information gain (IG) = E.parent - E.child
If the IG is high, our split is good. The decision tree evaluates the candidate splits, calculates the information gain for each, and greedily keeps the split with the highest information gain.
The more the entropy removed, the greater the information gain. The higher the information gain, the better the split.
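A sketch of the gain calculation under the same assumptions, reusing the `entropy` helper above (the perfectly separating split in the example is hypothetical):

```python
def information_gain(parent_labels, child_label_groups):
    """IG = E(parent) minus the weighted average entropy of the children."""
    n = len(parent_labels)
    e_child = sum(len(c) / n * entropy(c) for c in child_label_groups)
    return entropy(parent_labels) - e_child

# A split that separates 5 A's and 5 B's perfectly:
# parent has E = 1, both children have E = 0, so IG = 1 (all entropy removed).
parent = ["A"] * 5 + ["B"] * 5
print(information_gain(parent, [["A"] * 5, ["B"] * 5]))  # 1.0
```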
In a nutshell, we are trying to reduce the entropy (uncertainty) of the data as much as possible.
URL: https://www.section.io/engineering-education/entropy-information-gain-machine-learning/
#decisiontrees #machinelearning #entropy