@henrygarner
Created February 2, 2011 11:34
TF-IDF measure in Ruby, based on the DefaultSimilarity implementation in Apache Lucene (http://lucene.apache.org/)
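# Scores a term as tf * idf, following Lucene's DefaultSimilarity.
def similarity(term_frequency, document_frequency, documents_count)
  # idf: the + 1.0 avoids division by zero and forces float division
  idf = Math.log(documents_count / (document_frequency + 1.0)) + 1
  # tf: the square root dampens the effect of high term counts
  Math.sqrt(term_frequency) * idf
end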
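A small worked example may help show how the two factors interact. The sketch below uses the same formula, split into separate `tf` and `idf` helpers; those helper names, the 1000-document corpus, and the term counts are assumptions made for illustration and are not part of the original gist.

```ruby
# Same formula as `similarity` above, split into its two factors.
# The helper names `tf` and `idf` are illustrative, not from the gist.
def tf(term_frequency)
  Math.sqrt(term_frequency)
end

def idf(document_frequency, documents_count)
  Math.log(documents_count / (document_frequency + 1.0)) + 1
end

def similarity(term_frequency, document_frequency, documents_count)
  tf(term_frequency) * idf(document_frequency, documents_count)
end

# Hypothetical corpus of 1000 documents; the term occurs 4 times in the
# document being scored in both cases.
rare   = similarity(4, 9, 1000)    # term appears in 9 documents
common = similarity(4, 999, 1000)  # term appears in 999 documents

puts rare.round(4)    # 2 * (ln(100) + 1) => 11.2103
puts common.round(4)  # 2 * (ln(1) + 1)   => 2.0
```

With the term count held fixed, the score is driven entirely by how rare the term is across the corpus: a term found in nearly every document contributes only its damped frequency, since the logarithm goes to zero and the trailing `+ 1` keeps the idf at 1.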