
Entropy vs Information gain

In the decision tree algorithm, these two terms play a major role and can be hard to grasp. Let's decode them.

Entropy - uncertainty/impurity

Let's assume we have 3 classes (namely A, B, C) in our dataset. Let p(A), p(B), p(C) be the probabilities of randomly selecting an example of each class. Entropy (E) = -[p(A)*log2 p(A) + p(B)*log2 p(B) + p(C)*log2 p(C)], i.e. the sum of -p*log2 p over all classes (the total uncertainty of the data).

What do we get out of this?

  • If E = 0, only a single class is present: the node is pure and useless for training.
  • E is at its maximum (log2 of the number of classes; 1 bit for two classes, ~1.585 for three) when the examples are spread evenly across the classes.
  • The more the data is skewed towards one class, the lower E falls towards 0.

This helps us determine whether the data is skewed towards one class (low E) or evenly distributed (high E). Basically, it tells us about the spread of the data.
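As a rough sketch of how that formula could be computed (assuming Python and NumPy, which the gist itself does not mention):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a list/array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

print(entropy(["A", "A", "A", "A"]))            # 0 bits: pure, single class
print(entropy(["A", "B", "A", "B"]))            # 1 bit: 2 classes, evenly spread
print(entropy(["A", "B", "C", "A", "B", "C"]))  # log2(3) ~ 1.585 bits: 3 classes, evenly spread
```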

Information gain

In a decision tree, the algorithm splits a node on a candidate condition. After the split, entropy is calculated again for the resulting children, and the size-weighted average of the children's entropies is E.child. Information gain (IG) = E.parent - E.child

If the IG is high, our split is good. The decision tree evaluates the candidate splits, calculates the information gain for each one, and takes the split with the highest information gain.

The more the entropy removed, the greater the information gain. The higher the information gain, the better the split.
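A minimal sketch of that calculation, reusing the entropy helper above; splitting the parent's labels into child groups by hand here is purely illustrative:

```python
def information_gain(parent_labels, child_label_groups):
    """IG = E.parent minus the size-weighted average entropy of the children."""
    n = len(parent_labels)
    weighted_child_entropy = sum(len(g) / n * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted_child_entropy

parent = ["A", "A", "B", "B"]
# A split that separates the classes perfectly removes all uncertainty: IG = 1 bit
print(information_gain(parent, [["A", "A"], ["B", "B"]]))
# A split that changes nothing leaves the children as mixed as the parent: IG = 0
print(information_gain(parent, [["A", "B"], ["A", "B"]]))
```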

In a nutshell, we are trying to reduce the entropy (uncertainty) of the data as much as possible with every split.
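In practice this bookkeeping is handled by the library; for example, scikit-learn's DecisionTreeClassifier accepts criterion="entropy" to select splits by information gain (a minimal sketch on the iris dataset, assuming scikit-learn is installed):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion="entropy" makes the tree pick the split with the highest information gain
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
```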

url : https://www.section.io/engineering-education/entropy-information-gain-machine-learning/

#decisiontrees #machinelearning #entropy
