Skip to content

Instantly share code, notes, and snippets.

@danstowell
Last active May 18, 2020 14:15
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save danstowell/151a3b68ceb9c5f4a51f to your computer and use it in GitHub Desktop.
Save danstowell/151a3b68ceb9c5f4a51f to your computer and use it in GitHub Desktop.
Implementations of non-negative matrix factorisation (NMF) in Stan
N <- 15
M <- 12
K <- 3
Wconc <- 100
Hconc <- 5
Winit <- structure(c(.056,.111,.056,.111,.056,.111,.056,.111,.056,.111,.056,.111,.111,.056,.111,.056,.111,.056,.111,.056,.111,.056,.111,.056,.063,.063,.125,.063,.063,.125,.063,.063,.125,.063,.063,.125
), .Dim=c(12,3))
X <- structure(c(.032,.032,.091,.032,.157,.264,.157,.139,.486,.257,.709,.934,.036,.036,.08,.036,.255,.192,.255,.1,.638,.171,1.228,.62,.039,.039,.226,.039,.119,.521,.119,.128,.809,.227,.475,1.537,.047,.047,.212,.047,.113,.589,.113,.242,.717,.458,.407,2.002,.02,.02,.058,.02,.07,.189,.07,.107,.251,.203,.293,.705,.033,.033,.062,.033,.128,.268,.128,.208,.325,.403,.549,1.142,.039,.039,.245,.039,.144,.516,.144,.081,.923,.128,.609,1.39,.05,.05,.189,.05,.326,.383,.326,.091,1.077,.136,1.55,1.024,.039,.039,.245,.039,.106,.548,.106,.112,.847,.194,.407,1.565,.023,.023,.134,.023,.072,.301,.072,.065,.486,.112,.291,.862,.04,.04,.145,.04,.205,.352,.205,.131,.726,.233,.938,1.107,.028,.028,.173,.028,.076,.397,.076,.093,.593,.164,.287,1.163,.028,.028,.058,.028,.14,.199,.14,.136,.36,.257,.636,.781,.033,.033,.081,.033,.108,.309,.108,.207,.347,.401,.441,1.245,.036,.036,.066,.036,.142,.285,.142,.221,.358,.428,.615,1.211
), .Dim=c(12,15))
data {
int<lower=1> N; // num frames
int<lower=1> M; // num bins
int<lower=1> K; // num factors
real<lower=0> Wconc; // concentration parameter on template priors - a large number means relative certainty, stick close to the prior
real<lower=0> Hconc; // concentration parameter on activations - a large number encourages sparsity
matrix<lower=0>[M,K] Winit; // prior spectral templates -- these should each be different (or else it'd be unidentifiable). avoid zeroes too.
matrix<lower=0>[M,N] X; // spectrogram
}
parameters {
simplex[M] W[K]; // inferred spectral templates
matrix<lower=0>[K,N] H; // activations
}
transformed parameters {
vector<lower=0>[M*N] Xnorm;
vector<lower=0>[M] Winitnorm[K];
Xnorm <- to_vector(X); // spectrogram, flattened, but in this case NOT normalised (since not PLCA)
for (k in 1:K) {
Winitnorm[k] <- col(Winit, k) * Wconc / sum(col(Winit, k)); // Winit, normalised and pre-multiplied by concn
}
}
model {
vector[M*N] Xest;
matrix[K,M] Wmat; // W, but indexed backwards and transposed later, since it seems that's the only way to assign the simplices into the matrix columns
for (k in 1:K) {
Wmat[k] <- W[k]';
}
Xest <- to_vector(Wmat' * H);
for (k in 1:K) {
W[k] ~ dirichlet(Winitnorm[k]); // our supplied spectral templates are used as centres for the priors on the spectral templates. each tpl has its own dirichlet.
}
to_vector(H) ~ exponential(Hconc); // a prior on the activations
Xnorm ~ normal(0, 0.5 * Xest); // IS-NMF -- the factor of 1/2 here comes from X being assumed a power-spectrogram where the underlying random variables are complex unit-norm
}
data {
int<lower=1> N; // num frames
int<lower=1> M; // num bins
int<lower=1> K; // num factors
real<lower=0> Wconc; // concentration parameter on template priors - a large number means relative certainty, stick close to the prior
real<lower=0> Hconc; // concentration parameter on activations - a large number encourages sparsity
matrix<lower=0>[M,K] Winit; // prior spectral templates -- these should each be different (or else it'd be unidentifiable). avoid zeroes too.
matrix<lower=0>[M,N] X; // spectrogram
}
parameters {
simplex[M] W[K]; // inferred spectral templates
matrix<lower=0>[K,N] H; // activations
real<lower=0> Xconc; // overall concentration of spectrogram, which means how closely it sticks to the dotproduct, i.e. relates to noise level
}
transformed parameters {
vector<lower=0>[M*N] Xnorm;
vector<lower=0>[M] Winitnorm[K];
Xnorm <- to_vector(X / sum(X)); // spectrogram, normalised and flattened
for (k in 1:K) {
Winitnorm[k] <- col(Winit, k) * Wconc / sum(col(Winit, k)); // Winit, normalised and pre-multiplied by concn
}
}
model {
vector[M*N] Xest;
matrix[K,M] Wmat; // W, but indexed backwards and transposed later, since it seems that's the only way to assign the simplices into the matrix columns
for (k in 1:K) {
Wmat[k] <- W[k]';
}
Xest <- to_vector(Wmat' * H);
for (k in 1:K) {
W[k] ~ dirichlet(Winitnorm[k]); // our supplied spectral templates are used as centres for the priors on the spectral templates. each tpl has its own dirichlet.
}
to_vector(H) ~ exponential(Hconc); // a prior on the activations
Xconc ~ lognormal(100, 10); // a high value of Xconc means low SNR (a more semantic parametrisation would be nice...)
Xnorm ~ dirichlet(Xest * Xconc); // our input data is modelled as a multinomial sampled from this overall mega-dirichlet
}
@Roopali24
Copy link

Hi!

I am trying to implement NMF in rstan and this code helps a lot. I just have a few questions, it would be great if you could help.

So, I understand you're trying to factorize X =WH where columns of W are dirichlet and H is exponential.

What is 'transformed parameters' block for here?

@danstowell
Copy link
Author

Hi. I'm afraid I don't know - this code is old and I'm not active in the same topics right now.

The transformed parameters: well, the W are dirichlet, as you say. Rather than passing the Dirichlet prior in directly as parameters, I allow the user to pass in a set of spectral templates and a concentration parameter, then we transform these into "Winitnorm" as a probabilistically meaningful Dirichlet prior.

It may also be useful to pre-transform some things (e.g. "Xnorm"), and it's more efficient to do it in this preprocessing block than inside the main model that Stan optimises.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment