@arunreddy
Created December 22, 2015 18:22
Synthetic Data using LDA Generative Model
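using PyPlot
using Distributions

# Corpus dimensions
TOPIC_N = 5             # number of topics
VOCABULARY_SIZE = 1000  # number of distinct terms
DOC_NUM = 100           # number of documents
TERM_PER_DOC = 200      # tokens drawn per document

# Document-term count matrix: X[i, w] is the count of term w in document i
X = zeros(DOC_NUM, VOCABULARY_SIZE)

# Topic-word distributions: one symmetric Dirichlet(0.01) draw per topic
phi = [rand(Dirichlet(VOCABULARY_SIZE, 0.01)) for _ in 1:TOPIC_N]

for i in 1:DOC_NUM
    # Topic proportions for document i, drawn from a symmetric Dirichlet(0.8)
    theta = rand(Dirichlet(TOPIC_N, 0.8))
    for j in 1:TERM_PER_DOC
        # Categorical(p) returns the sampled index directly; it is the same
        # draw as locating the 1 in a one-hot Multinomial(1, p) sample.
        z = rand(Categorical(theta))   # topic assignment for this token
        w = rand(Categorical(phi[z]))  # term drawn from the chosen topic
        X[i, w] += 1
    end
end

# Display the document-term count matrix
matshow(X)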