@rakasaka
Created August 24, 2011 21:45
Unsupervised topic modeling in Ruby using LDA
require 'lda-ruby'

# Build a corpus from three short documents.
corpus = Lda::Corpus.new
corpus.add_document(Lda::TextDocument.new(corpus, "a lion is a wild feline animal", []))
corpus.add_document(Lda::TextDocument.new(corpus, "a dog is a friendly animal", []))
corpus.add_document(Lda::TextDocument.new(corpus, "a cat is a feline animal", []))

lda = Lda::Lda.new(corpus)
lda.verbose = false
lda.num_topics = 2
lda.em('random')            # run EM estimation from a random starting point
topics = lda.top_words(3)   # three highest-weighted words per topic

# Example result (topic numbering and word order can differ between runs
# because of the random start)
# => {0=>["animal", "friendly", "dog"], 1=>["animal", "feline", "cat"]}
@rakasaka
Author

LDA is short for Latent Dirichlet Allocation, a topic model introduced by David Blei, Andrew Ng, and Michael Jordan in 2003 (Blei later taught at Princeton). It is an unsupervised method for surfacing topics and themes within a collection of text documents.
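
To make "topics" a bit more concrete: LDA treats each topic as a probability distribution over words and each document as a mixture of topics. The sketch below is plain Ruby and does not use lda-ruby at all. The numbers in `TOPIC_WORD_PROBS`, the document mixture, and the helper names `top_words_per_topic` and `word_probability` are all made up for illustration; they are not output from the gist or part of the gem's API.

```ruby
# Hypothetical topic-word probabilities (illustrative numbers only,
# not produced by lda-ruby). Each topic's probabilities sum to 1.
TOPIC_WORD_PROBS = {
  0 => { "animal" => 0.30, "friendly" => 0.26, "dog" => 0.24,
         "feline" => 0.10, "cat" => 0.10 },
  1 => { "animal" => 0.30, "feline" => 0.28, "cat" => 0.22,
         "lion" => 0.12, "wild" => 0.08 }
}

# Return the n highest-probability words for each topic.
def top_words_per_topic(topic_word_probs, n)
  topic_word_probs.transform_values do |word_probs|
    word_probs.sort_by { |_word, prob| -prob }.first(n).map(&:first)
  end
end

# Probability of a word in a document under the mixture assumption:
# sum over topics of (topic weight in the document) * (word weight in the topic).
def word_probability(doc_topic_mix, topic_word_probs, word)
  doc_topic_mix.sum do |topic, weight|
    weight * topic_word_probs.fetch(topic).fetch(word, 0.0)
  end
end

top = top_words_per_topic(TOPIC_WORD_PROBS, 3)
top.each { |topic, words| puts "#{topic}: #{words.join(', ')}" }

# A hypothetical document that is mostly about topic 0.
p_dog = word_probability({ 0 => 0.9, 1 => 0.1 }, TOPIC_WORD_PROBS, "dog")
puts "p(dog | document) = #{p_dog.round(3)}"
```

With these made-up numbers the top three words come out as animal, friendly, dog for topic 0 and animal, feline, cat for topic 1, which mirrors the shape of what `lda.top_words(3)` returns in the gist. What LDA actually does is the reverse direction: it estimates the topic-word and document-topic weights from the raw word counts.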