@MichaelAquilina
Created November 30, 2013 15:01
// Calculate the TF-IDF value from the given input parameters.
private double TFIDF(int TermFrequency, int DocumentFrequency, int NormalizationValue, int NumberOfDocuments)
{
    // Cast one operand of each division to double. The parameters are ints, so without
    // the cast the compiler performs integer division and truncates the result.
    return ((double)TermFrequency / NormalizationValue) * Math.Log((double)NumberOfDocuments / DocumentFrequency, 2);
}
@MichaelAquilina

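Term Frequency, Inverse Document Frequency Equation:

$$\text{TFIDF} = \frac{\text{TermFrequency}}{\text{NormalizationValue}} \times \log_2\!\left(\frac{\text{NumberOfDocuments}}{\text{DocumentFrequency}}\right)$$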

@MichaelAquilina

Definitions from the equation above:

  • TermFrequency = frequency of a term in some document
  • NormalizationValue = usually the length of the document
  • NumberOfDocuments = number of documents in the corpus
  • DocumentFrequency = number of documents in the corpus that contain the term
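
As a quick sanity check, here is the calculation worked through with made-up numbers (they are illustrative assumptions, not data from any real corpus): TermFrequency = 3, NormalizationValue = 100, NumberOfDocuments = 1000, DocumentFrequency = 10.

$$\frac{3}{100} \times \log_2\!\left(\frac{1000}{10}\right) = 0.03 \times \log_2 100 \approx 0.03 \times 6.644 \approx 0.199$$

The rarer the term is across the corpus (smaller DocumentFrequency), the larger the logarithm and so the higher the score.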

@MichaelAquilina

This should be easy to convert to Java, since the C# syntax is nearly identical.
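
A minimal sketch of what that conversion might look like. The class name `TfIdf` and the camelCase parameter names are my own assumptions, not part of the gist. The one change that is not mechanical is the logarithm: Java's `Math.log` takes a single argument and returns the natural log, so the base-2 log is obtained by dividing by `Math.log(2)`.

```java
// Hypothetical Java port of the C# TFIDF method; the names are illustrative.
final class TfIdf {
    static double tfidf(int termFrequency, int documentFrequency,
                        int normalizationValue, int numberOfDocuments) {
        // Cast to double before dividing to avoid integer division.
        double tf = (double) termFrequency / normalizationValue;
        // Math.log is the natural log; divide by Math.log(2) to get log base 2.
        double idf = Math.log((double) numberOfDocuments / documentFrequency) / Math.log(2);
        return tf * idf;
    }
}
```

Calling `TfIdf.tfidf(1, 1, 1, 1)` returns 0.0, since a term that appears in every document has an inverse document frequency of log2(1) = 0. Like the C# original, this sketch does not guard against a zero `documentFrequency` or `normalizationValue`.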
