
Last active September 26, 2016 12:48

kayhan-batmanghelich commented Jul 9, 2016

@taku-y Thanks for sharing. I have not tried your code, but reading it I think there is a bug here:

    def encode(self, xs):
        if 0 < self.p_corruption:
            # Randomly drop nonzero entries of xs (denoising corruption).
            dixs, vixs = xs.nonzero()
            mask = tt.set_subtensor(
                tt.zeros_like(xs)[dixs, vixs],
                self.rng.binomial(size=dixs.shape, n=1, p=1-self.p_corruption)
            )
            xs_ = xs * mask
        else:
            xs_ = xs

        w0 = self.w0.reshape((self.n_words, self.n_hidden))
        w1 = self.w1.reshape((self.n_hidden, 2 * (self.n_topics - 1)))
        hs = tt.tanh(xs_.dot(w0) + self.b0)
        zs = hs.dot(w1) + self.b1
        zs_mean = zs[:, :(self.n_topics - 1)]
        zs_std = zs[:, (self.n_topics - 1):]
        return zs_mean, zs_std

Don't you need to enforce non-negativity for zs_std? I might be wrong, or I may not have read the rest of the code carefully.


taku-y commented Jul 12, 2016

Thanks for your comment. zs_std is actually the log-transformed std, and tt.exp(zs_std) is used as the std in advi_minibatch(), the inference program. I will add a comment to the notebook about this.
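
To make that convention concrete, here is a minimal NumPy sketch (not the notebook's Theano code) of how an unconstrained encoder output can be split into a mean and a log-std, with the exponential guaranteeing a positive std. The array `zs`, the sizes, and the name `zs_log_std` are made up for illustration.

```python
import numpy as np

n_topics = 4  # hypothetical size; the encoder emits 2 * (n_topics - 1) columns

# Stand-in for the encoder output of 3 documents: unconstrained real numbers.
rng = np.random.RandomState(0)
zs = rng.randn(3, 2 * (n_topics - 1))

zs_mean = zs[:, :(n_topics - 1)]
zs_log_std = zs[:, (n_topics - 1):]  # can be negative, because it is a log
zs_std = np.exp(zs_log_std)          # strictly positive for any real input

print(zs_log_std.min(), zs_std.min())
```

Because the exponential maps every real number to a positive one, no explicit non-negativity constraint is needed on the second half of the output.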


akashgit commented Jul 14, 2016

@taku-y Thanks for sharing this. I've been working on this problem for a while myself, so it is a relief to see it work.
But when I tried the code I couldn't replicate the result. I had to make the following edits to get it to run, which might be causing the problem:

  1. In block 8 above, minibatches = [create_minibatch(docs_tr.toarray().astype('float32'))]: I had to remove the square brackets, otherwise _check_minibatches(minibatch_tensors, minibatches) was raising
     _value_error(isinstance(minibatches, types.GeneratorType), 'minibatches must be a generator.')

  2. I had to change e = f(*next(minibatches)) to e = f(next(minibatches)) in line 450 in, otherwise (after 1.) a "too many arguments" error was raised.

  3. Most importantly, I had to change the learning rate. I tried values in the range [1e-2, 1e-4], since keeping it at 5e-2 resulted in the ELBO becoming nan after ~40 iterations. For example, the plot is for 1e-3 with 4000 iterations. With 1e-2, on the runs that finish without nans, the result is slightly better but not even close to yours. For me 4e-3 with roughly 5000 iterations works best, but again not as well as in your case. It seems too sensitive to the optimiser and to the learning rate (what algorithm is being used behind the scenes, is it Adam?).

As such, my runs never reach the ELBO values that are reported here, and consequently my topics are much worse:

    Topic #0: like just don make does know people new think did
    Topic #1: people like just does new use good time know think
    Topic #2: like just people know does don good new use think
    Topic #3: people know just like don does good time use god
    Topic #4: people know time like use don does just good god
    Topic #5: just like don time think new say know people good
    Topic #6: don just time know good think use people like make
    Topic #7: just people know edu use like good don does think
    Topic #8: people new does time just don good use like know
    Topic #9: like don just new good people know does time use

Is there anything that pops out to you about my changes that might be causing this behaviour? In particular, it seems like most of the topics are quite similar in my case.
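
The two non-optimiser edits mentioned in this comment can be illustrated with a small pure-Python sketch. It is only one plausible reading of the errors, not the PyMC3 code: `create_minibatch` here is a hypothetical stand-in with an invented `wrap` flag, and `f` is a made-up one-argument function. It shows that a list wrapping a generator fails a `types.GeneratorType` check, and that whether `*next(...)` is correct depends on whether the generator yields tuples or bare batches.

```python
import types

def create_minibatch(data, batch_size=2, wrap=True):
    """Hypothetical stand-in for the notebook's helper: yields batches forever.

    With wrap=True each batch is wrapped in a one-element tuple, so the
    consumer must unpack it with *; with wrap=False the bare batch is yielded.
    """
    i = 0
    while True:
        start = (i * batch_size) % len(data)
        batch = data[start:start + batch_size]
        yield (batch,) if wrap else batch
        i += 1

def f(batch):
    """Hypothetical one-argument step function: total count in the batch."""
    return sum(sum(row) for row in batch)

data = [[1, 0], [0, 2], [3, 0], [0, 4]]

# A list that wraps a generator is not itself a generator.
as_list = [create_minibatch(data)]
as_gen = create_minibatch(data)
print(isinstance(as_list, types.GeneratorType))
print(isinstance(as_gen, types.GeneratorType))

# Tuple-yielding generator: unpack with *.
e_wrapped = f(*next(create_minibatch(data, wrap=True)))

# Bare-batch generator: pass the batch directly. f(*batch) would spread the
# two rows over two positional arguments and raise a TypeError.
e_bare = f(next(create_minibatch(data, wrap=False)))
```

Under this reading, removing the brackets and dropping the `*` are two sides of the same mismatch between what the generator yields and what the consumer expects.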


taku-y commented Jul 26, 2016

@akashgit Sorry for the late reply. I have been working on incorporating autoencoding VB into the PyMC3 repo.

For points 1. and 2. of your comment, I fixed the bug in the notebook. Your comments were helpful for the fix. Thanks.

  3. The sensitivity might be due to the LDA model and/or the optimization algorithm (Adagrad using the gradients of the latest 10 updates), although it's not clear to me. I also ran into the same problem of the ELBO becoming nan with a learning rate of 5e-2, so in the notebook I set the learning rate to 2e-2 and n=3000. Try running the updated notebook with the latest version of PyMC3. I will continue to examine this sensitivity issue.
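
As a rough illustration of the optimizer described above, the following NumPy sketch implements an Adagrad-style update that normalizes by the squared gradients of only the most recent 10 steps. It is not PyMC3's implementation; the function name `adagrad_window_step`, the `epsilon` value, and the toy objective are assumptions chosen for the example.

```python
from collections import deque

import numpy as np

def adagrad_window_step(param, grad, history, learning_rate=2e-2, epsilon=0.1):
    """One Adagrad-style step normalized by a sliding window of squared gradients.

    history is a deque with maxlen equal to the window size, so the oldest
    squared gradient is discarded automatically once the window is full.
    epsilon is an arbitrary illustrative value that keeps the divisor positive.
    """
    history.append(grad ** 2)
    scale = np.sqrt(np.sum(list(history), axis=0)) + epsilon
    return param - learning_rate * grad / scale

# Toy objective: f(x) = x ** 2, whose gradient is 2 * x.
x = np.array([3.0])
history = deque(maxlen=10)  # window of the latest 10 updates
for _ in range(1000):
    x = adagrad_window_step(x, 2.0 * x, history)

print(x)
```

Since the divisor depends only on recent gradients, the effective step size does not shrink monotonically as in plain Adagrad, which may be one reason the results are sensitive to the learning rate.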


akashgit commented Aug 14, 2016

Hi @taku-y, I tried using this notebook after updating to the latest pymc3 version (instead of running it from your fork) and I keep getting a test_compute_value related error on the line doc = pm.DensityDist('doc', logp_lda_doc(beta, theta), observed=doc_t), about a failure to downcast from float64 to float32 (my global floatX is set to float32).
I've double-checked my .theanorc and even tried manually setting the flag to ignore/off, but cannot seem to resolve it. Any suggestions? Just to clarify, this problem does not occur when using your old fork from my last comment with the edits, but I am still struggling with setting the learning rate and have not been able to reproduce topics of similar quality to yours.


taku-y commented Aug 18, 2016

Hi, thanks for your report. What versions of theano and numpy are you using? I use theano 0.8.2 and numpy 1.10.4. Updating these libraries to the latest versions might resolve the problem. Actually, I'm using a Docker container. If you want, I can prepare a Dockerfile to build a container image on which the notebook works.
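
For comparing environments, a small helper along these lines can print the installed versions in one go. This is only a sketch: `report_versions` is a hypothetical name, and packages that are not installed (or fail to import) are reported as None rather than raising.

```python
import importlib

def report_versions(names):
    """Return {package name: version string, or None if it cannot be imported}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:  # not installed, or broken in this environment
            versions[name] = None
        else:
            versions[name] = getattr(module, '__version__', 'unknown')
    return versions

for name, version in report_versions(['theano', 'numpy', 'pymc3']).items():
    print(name, version)
```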


akashgit commented Aug 18, 2016

Hi @taku-y, I was using theano 0.8.2 and numpy 1.10.4 before updating to the latest pymc3 (when things were working). After upgrading, while debugging the error, I ended up on theano 0.9.0.dev2 and numpy 1.11.1 (though it didn't help).

I found these two related issues mentioned on the pymc3 repo, though I'm not sure if they are resolved.

The interesting thing is that these problems only occur when I update to the latest pymc3 version. Running the notebook on your old fork (from the time of my first comment, with the suggested edits) doesn't cause any of those errors.

I haven't used Docker before, but I'm willing to try it, so it would be nice if you could send me the image. Thanks again.


taku-y commented Aug 23, 2016

I could not reproduce the test_compute_value error with the latest master branch of PyMC3. Could you show the error message? We might find something wrong.


akashgit commented Aug 26, 2016

Hi @taku-y, here is the actual error. Somehow float64 values are appearing even though my config file and flags are set appropriately.

Applied stickbreaking-transform to theta and added transformed theta_stickbreaking_ to model.

Applied stickbreaking-transform to beta and added transformed beta_stickbreaking_ to model.

TypeError Traceback (most recent call last)
in ()
10 beta = Dirichlet('beta', a=(1.0 / n_topics) * np.ones((n_topics, n_words)).astype('float32'),
11 shape=(n_topics, n_words), transform=t_stick_breaking(1e-9))
---> 12 doc = pm.DensityDist('doc', logp_lda_doc(beta, theta), observed=doc_t)

/IPC_MAP/lib/python2.7/site-packages/pymc3-3.0-py2.7.egg/pymc3/distributions/distribution.pyc in __new__(cls, name, *args, **kwargs)
24 data = kwargs.pop('observed', None)
25 dist = cls.dist(*args, **kwargs)
---> 26 return model.Var(name, dist, data)
27 elif name is None:
28 return # for pickle

/IPC_MAP/lib/python2.7/site-packages/pymc3-3.0-py2.7.egg/pymc3/model.pyc in Var(self, name, dist, data)
304 self.named_vars[] = v
305 else:
--> 306 var = ObservedRV(name=name, data=data, distribution=dist, model=self)
307 self.observed_RVs.append(var)
308 if var.missing_values:

/IPC_MAP/lib/python2.7/site-packages/pymc3-3.0-py2.7.egg/pymc3/model.pyc in __init__(self, type, owner, index, name, data, distribution, model)
579 self.missing_values = data.missing_values
--> 581 self.logp_elemwiset = distribution.logp(data)
582 self.model = model
583 self.distribution = distribution

in ll_docs_f(docs)
17 vfreqs = docs[dixs, vixs]
18 ll_docs = vfreqs * pm.math.logsumexp(
---> 19 tt.log(theta[dixs].astype('float32')) +
20 tt.log(beta.T[vixs].astype('float32')), axis=1).ravel().astype('float32')

/IPC_MAP/lib/python2.7/site-packages/theano/tensor/var.pyc in __getitem__(self, args)
502 TensorVariable, TensorConstant,
503 theano.tensor.sharedvar.TensorSharedVariable))):
--> 504 return self.take(args[axis], axis)
505 else:
506 return theano.tensor.subtensor.advanced_subtensor(self, *args)

/IPC_MAP/lib/python2.7/site-packages/theano/tensor/var.pyc in take(self, indices, axis, mode)
547 def take(self, indices, axis=None, mode='raise'):
--> 548 return theano.tensor.subtensor.take(self, indices, axis, mode)

/IPC_MAP/lib/python2.7/site-packages/theano/tensor/subtensor.pyc in take(a, indices, axis, mode)
2369 return advanced_subtensor1(a.flatten(), indices)
2370 elif axis == 0:
-> 2371 return advanced_subtensor1(a, indices)
2372 else:
2373 if axis < 0:

/IPC_MAP/lib/python2.7/site-packages/theano/gof/op.pyc in __call__(self, *inputs, **kwargs)
610 for i, ins in enumerate(node.inputs):
611 try:
--> 612 storage_map[ins] = [self._get_test_value(ins)]
613 compute_map[ins] = [True]
614 except AttributeError:

/IPC_MAP/lib/python2.7/site-packages/theano/gof/op.pyc in _get_test_value(cls, v)
547 # ensure that the test value is correct
548 try:
--> 549 ret = v.type.filter(v.tag.test_value)
550 except Exception as e:
551 # Better error message.

/IPC_MAP/lib/python2.7/site-packages/theano/tensor/type.pyc in filter(self, data, strict, allow_downcast)
138 '"function".'
139 % (self, data.dtype, self.dtype))
--> 140 raise TypeError(err_msg, data)
141 elif (allow_downcast is None and
142 type(data) is float and

TypeError: For compute_test_value, one input test value does not have the requested type.

The error when converting the test value to that variable type:

TensorType(float32, matrix) cannot store a value of dtype float64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to float32, or 2) set "allow_input_downcast=True" when calling "function".
[[ 0.09999998 0.1 0.1 ..., 0.1 0.10000001 0.10000001] [ 0.09999998 0.1 0.1 ..., 0.1 0.10000001 0.10000001] [ 0.09999998 0.1 0.1 ..., 0.1 0.10000001 0.10000001] ..., [ 0.09999998 0.1 0.1 ..., 0.1 0.10000001 0.10000001] [ 0.09999998 0.1 0.1 ..., 0.1 0.10000001 0.10000001] [ 0.09999998 0.1 0.1 ..., 0.1 0.10000001 0.10000001]]
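
The error message suggests explicitly casting to float32. As a hedged illustration of how float64 can slip in despite such casts, the NumPy sketch below (independent of Theano and PyMC3) shows that arrays are float64 by default and that combining a float32 array with a float64 array promotes the result to float64. The helper `check_float32` and the sizes are made up for the example.

```python
import numpy as np

n_topics, n_words = 10, 5  # hypothetical sizes

a64 = np.ones((n_topics, n_words)) / n_topics  # NumPy defaults to float64
a32 = a64.astype('float32')                    # explicit cast, as the error suggests
mixed = a32 + a64                              # float32 array + float64 array -> float64

def check_float32(**arrays):
    """Return the sorted names of the arrays whose dtype is not float32."""
    return sorted(name for name, arr in arrays.items()
                  if np.asarray(arr).dtype != np.float32)

print(check_float32(a64=a64, a32=a32, mixed=mixed))  # ['a64', 'mixed']
```

Running a check like this on every array that feeds the model (priors, test values, observed data) may help locate where the float64 value originates.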


taku-y commented Aug 29, 2016

Hi @akashgit, thanks for sending me detailed information. However, I have not been able to reproduce your result yet. I tested the notebook with theano 0.8.2 and 0.9.0dev2, on Python 2 and 3. ADVI worked in all of these environments.

Here is the Dockerfile for the environment in which the notebook ran.

    FROM jupyter/datascience-notebook
    MAINTAINER Taku Yoshioka <>

    ENV TERM xterm-color

    USER jovyan
    RUN pip install theano joblib

Save the above as Dockerfile, then you can build an image by typing

    docker build -t pymc3-test .

pymc3-test is the name of the created image. Finally, start a container from that image:

    docker run -d -v /your/home:/home/jovyan/work -p 8889:8888 --name pymc3 pymc3-test

You can access the notebook server via localhost:8889. When using VirtualBox (Docker Machine), you need to specify the IP address on which Docker Machine is running. I recommend Linux or Mac, both of which have native support for Docker, so you don't need to specify the IP address of the Docker Machine.

Here are the options passed to docker run:

  • -d: Run the container in the background.
  • -v: Mount the volume /your/home on your host to /home/jovyan/work in the container.
  • -p: Forward port 8889 on the host to port 8888 in the container.
  • --name: Name of the container.


Thanks for the file, @taku-y. I will try to see what is wrong with my current configuration.
