This exercise comes from the online graduate course "Artificial Intelligence" (CS 6601), offered by Georgia Tech through Udacity.
Lesson 7: Machine Learning
Lecture 33: Decision Trees
Information gain is the total system entropy minus the remainder, i.e. Gain(A) = Entropy(S) - Remainder(A), where Remainder(A) is the weighted average entropy of the subsets produced by splitting the data on attribute A.
Information gain is used when building decision trees: at each step, the attribute with the highest information gain is the most informative test available. That attribute is therefore placed highest in the tree, followed by attributes with successively lower gain. This greedy ordering tends to produce a more compact decision tree.
gain(outlook) = 0.246
gain(temperature) = 0.028
gain(humidity) = 0.151
gain(wind) = 0.047
The attribute selected to add to the decision tree first is Outlook, which has the largest information gain, 0.246.
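The gain figures above can be reproduced with a short sketch. This assumes the classic 14-example "play tennis" dataset (9 yes / 5 no) that these numbers are conventionally derived from; the per-value (yes, no) counts below are that dataset's, and are an assumption not stated in the lecture excerpt. Small differences in the last decimal place come from rounding.

```python
import math

def entropy(pos, neg):
    """Binary entropy (in bits) of a set with pos positive and neg negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def gain(splits):
    """Information gain: Entropy(S) - Remainder(A).

    splits maps each attribute value to its (yes, no) counts.
    """
    total = sum(p + n for p, n in splits.values())
    pos = sum(p for p, _ in splits.values())
    neg = total - pos
    # Remainder(A): entropy of each subset, weighted by the subset's size.
    remainder = sum((p + n) / total * entropy(p, n) for p, n in splits.values())
    return entropy(pos, neg) - remainder

# Assumed (yes, no) counts per attribute value from the standard play-tennis data.
data = {
    "outlook":     {"sunny": (2, 3), "overcast": (4, 0), "rain": (3, 2)},
    "temperature": {"hot": (2, 2), "mild": (4, 2), "cool": (3, 1)},
    "humidity":    {"high": (3, 4), "normal": (6, 1)},
    "wind":        {"weak": (6, 2), "strong": (3, 3)},
}

for name, splits in data.items():
    print(f"gain({name}) = {gain(splits):.3f}")
```

Running this prints the four gains, and Outlook comes out highest, matching the lecture's conclusion that it should be the root test.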