Thanks for your comment. `zs_std` is actually the log-transformed std, and `tt.exp(zs_std)` is used as the std in `advi_minibatch()`, the inference program. I will add a comment to the notebook about this.
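To illustrate roughly what that means (the variable names below are only for illustration, not the exact ones in the notebook):

```python
import numpy as np
import theano.tensor as tt

# zs_std lives on the log scale, so it is unconstrained during optimization
zs_std = tt.vector('zs_std')   # log of the standard deviation
sd = tt.exp(zs_std)            # exp() maps it back to a strictly positive std

# e.g. a diagonal-Gaussian log-density built with the positive std
zs = tt.vector('zs')
mu = tt.vector('mu')
logp = tt.sum(-0.5 * tt.log(2 * np.pi) - tt.log(sd)
              - 0.5 * ((zs - mu) / sd) ** 2)
```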
@taku-y Thanks for sharing this. I've been working on this problem for a while myself, so it is a relief to see it work. However, when I tried the code I couldn't replicate the result. I had to make the following edits to get it to run, which might be causing the problem:
1. In block 8 above, for `minibatches = [create_minibatch(docs_tr.toarray().astype('float32'))]` I had to remove the brackets, otherwise `_check_minibatches(minibatch_tensors, minibatches)` raised `_value_error(isinstance(minibatches, types.GeneratorType), 'minibatches must be a generator.')` (a sketch of the generator I ended up passing is below, after this list).
2. I had to change `e = f(*next(minibatches))` to `e = f(next(minibatches))` at line 450 of `advi_minibatch.py`, otherwise (after 1.) a "too many arguments" error was raised.
3. Most importantly, I had to change the learning rate. I tried values in the range [1e-2, 1e-4], since keeping it at 5e-2 resulted in the ELBO becoming NaN after ~40 iterations. For example, the plot is for 1e-3 over 4000 iterations. When I run with 1e-2, and sometimes when it finishes without NaNs, the result is slightly better but not even close to yours. For me, 4e-3 with roughly 5000 iterations works best, but again not as well as in your case. It seems very sensitive to the optimizer and to the learning rate (what algorithm is being used behind the scenes, is it Adam?).
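For reference, the generator I ended up passing looks roughly like this; the batch size and sampling scheme are just what I happened to use, and the toy data only stands in for `docs_tr.toarray()`:

```python
import numpy as np

def create_minibatch(data, batch_size=128, seed=0):
    """Yield random row-subsets of `data` forever, so the result is a generator."""
    rng = np.random.RandomState(seed)
    while True:
        ixs = rng.randint(0, data.shape[0], batch_size)
        yield data[ixs].astype('float32')

# in the notebook this would be passed directly (no surrounding brackets), e.g.
#   minibatches = create_minibatch(docs_tr.toarray().astype('float32'))
toy_docs = np.random.rand(1000, 50).astype('float32')
minibatches = create_minibatch(toy_docs)
print(next(minibatches).shape)   # (128, 50)
```

With a generator that yields a single array like this, `e = f(*next(minibatches))` would try to unpack the array row by row, which is presumably why I needed the change to `e = f(next(minibatches))`; a generator yielding a tuple of arrays might avoid editing `advi_minibatch.py`, but I have not checked that.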
As such, my runs never reach the ELBO values reported here, and consequently my topics are much worse:
```
Topic #0: like just don make does know people new think did
Topic #1: people like just does new use good time know think
Topic #2: like just people know does don good new use think
Topic #3: people know just like don does good time use god
Topic #4: people know time like use don does just good god
Topic #5: just like don time think new say know people good
Topic #6: don just time know good think use people like make
Topic #7: just people know edu use like good don does think
Topic #8: people new does time just don good use like know
Topic #9: like don just new good people know does time use
```
Is there anything that pops out to you about my changes that might be causing this behaviour? In particular, most of the topics end up being very similar to each other in my case.
@akashgit Sorry for the late reply; I have been working on incorporating autoencoding VB into the PyMC3 repo.
For 1. and 2. of your comments, I fixed the bug in the notebook. Your comments were helpful for the fix, thanks.
The sensitivity might be due to the LDA model and/or the optimization algorithm (adagrad using the gradients of the latest 10 updates), although it's not clear to me. I also encountered the problem of the ELBO becoming NaN with a learning rate of 5e-2, so in the notebook I set the learning rate to 2e-2 and n=3000. Try running the updated notebook with the latest version of PyMC3; I will continue to look into this sensitivity issue.
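For reference, the windowed variant I mean is roughly the following. This is only an illustrative numpy sketch, not the actual PyMC3 implementation; the epsilon value and the way the window is maintained are assumptions of the example:

```python
import numpy as np
from collections import deque

def windowed_adagrad_step(param, grad, sq_grads, learning_rate=2e-2, epsilon=1e-6):
    """One update; `sq_grads` is a deque holding the most recent squared gradients."""
    sq_grads.append(grad ** 2)
    accu = np.sum(list(sq_grads), axis=0)   # accumulate only over the window
    return param - learning_rate * grad / (np.sqrt(accu) + epsilon)

# one window (here, the latest 10 updates) per parameter
sq_grads = deque(maxlen=10)
w = np.zeros(5)
for _ in range(100):
    g = np.random.randn(5)                  # stand-in for a real ELBO gradient
    w = windowed_adagrad_step(w, g, sq_grads)
```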
Hi @taku-y, I tried using this notebook after updating to the latest PyMC3 version (instead of running it from your fork), and I keep getting a `test_compute_value` related error on the line `doc = pm.DensityDist('doc', logp_lda_doc(beta, theta), observed=doc_t)`, about a failure to downcast from float64 to float32 (my global `floatX` is set to float32). I've double-checked my `.theanorc` and even tried manually setting the flag to `ignore`/`off`, but cannot seem to resolve it. Any suggestions? Just to clarify, this problem does not occur when using your old fork from my last comment (with the edits), but I am still struggling with setting the learning rate and have not been able to reproduce topics of similar quality to yours.
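For context, this is the kind of up-front cast I have been trying on my side before building the model; the shapes and array names here are placeholders rather than the notebook's real data:

```python
import numpy as np
import theano

# the effective float type; with floatX = float32 in .theanorc this prints 'float32'
print(theano.config.floatX)

# cast everything that ends up as a Theano test value before building the model,
# so nothing arrives as float64
n_topics, n_words = 10, 1000
doc_array = np.random.rand(64, n_words).astype(theano.config.floatX)
alpha = (1.0 / n_topics) * np.ones((n_topics, n_words), dtype=theano.config.floatX)
print(doc_array.dtype, alpha.dtype)   # both float32
```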
Hi, thanks for your report. What versions of Theano and NumPy are you using? I use Theano 0.8.2 and NumPy 1.10.4. Updating these libraries to the latest versions might resolve the problem. Actually, I'm using a Docker container; if you want, I can prepare a Dockerfile to build a container image on which the notebook works.
Hi @taku-y, I was using Theano 0.8.2 and NumPy 1.10.4 before updating to the latest PyMC3 (when things were working). After upgrading, while debugging the error I ended up on 0.9.0.dev2 and NumPy 1.11.1 (though it wasn't helpful).
I found these two related issues on the PyMC3 repo:
pymc-devs/pymc#1246
pymc-devs/pymc#1253
though I'm not sure if they are resolved.
The interesting thing is that these problems only occur when I update to the latest PyMC3 version. Running on your old fork (from the time of my first comment, with the suggested edits) doesn't cause any of these errors.
I haven't used Docker before, but I'm willing to try it, so it would be nice if you could send me the image. Thanks again.
I could not reproduce the `test_compute_value` error with the latest `master` branch of PyMC3. Could you show the error message? We might find something wrong.
Hi @taku-y, here is the actual error. Somehow float64s are magically appearing even though my config file and flags are set appropriately.
```
Applied stickbreaking-transform to theta and added transformed theta_stickbreaking_ to model.
Applied stickbreaking-transform to beta and added transformed beta_stickbreaking_ to model.

TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
     10 beta = Dirichlet('beta', a=(1.0 / n_topics) * np.ones((n_topics, n_words)).astype('float32'),
     11                  shape=(n_topics, n_words), transform=t_stick_breaking(1e-9))
---> 12 doc = pm.DensityDist('doc', logp_lda_doc(beta, theta), observed=doc_t)

/IPC_MAP/lib/python2.7/site-packages/pymc3-3.0-py2.7.egg/pymc3/distributions/distribution.pyc in __new__(cls, name, *args, **kwargs)
     24             data = kwargs.pop('observed', None)
     25             dist = cls.dist(*args, **kwargs)
---> 26             return model.Var(name, dist, data)
     27         elif name is None:
     28             return object.__new__(cls)  # for pickle

/IPC_MAP/lib/python2.7/site-packages/pymc3-3.0-py2.7.egg/pymc3/model.pyc in Var(self, name, dist, data)
    304             self.named_vars[v.name] = v
    305         else:
--> 306             var = ObservedRV(name=name, data=data, distribution=dist, model=self)
    307             self.observed_RVs.append(var)
    308             if var.missing_values:

/IPC_MAP/lib/python2.7/site-packages/pymc3-3.0-py2.7.egg/pymc3/model.pyc in __init__(self, type, owner, index, name, data, distribution, model)
    579             self.missing_values = data.missing_values
    580
--> 581             self.logp_elemwiset = distribution.logp(data)
    582             self.model = model
    583             self.distribution = distribution

<ipython-input> in ll_docs_f(docs)
     17         vfreqs = docs[dixs, vixs]
     18         ll_docs = vfreqs * pm.math.logsumexp(
---> 19             tt.log(theta[dixs].astype('float32')) +
     20             tt.log(beta.T[vixs].astype('float32')), axis=1).ravel().astype('float32')
     21

/IPC_MAP/lib/python2.7/site-packages/theano/tensor/var.pyc in __getitem__(self, args)
    502                             TensorVariable, TensorConstant,
    503                             theano.tensor.sharedvar.TensorSharedVariable))):
--> 504                 return self.take(args[axis], axis)
    505             else:
    506                 return theano.tensor.subtensor.advanced_subtensor(self, *args)

/IPC_MAP/lib/python2.7/site-packages/theano/tensor/var.pyc in take(self, indices, axis, mode)
    546
    547     def take(self, indices, axis=None, mode='raise'):
--> 548         return theano.tensor.subtensor.take(self, indices, axis, mode)
    549
    550     # COPYING

/IPC_MAP/lib/python2.7/site-packages/theano/tensor/subtensor.pyc in take(a, indices, axis, mode)
   2369         return advanced_subtensor1(a.flatten(), indices)
   2370     elif axis == 0:
-> 2371         return advanced_subtensor1(a, indices)
   2372     else:
   2373         if axis < 0:

/IPC_MAP/lib/python2.7/site-packages/theano/gof/op.pyc in __call__(self, *inputs, **kwargs)
    610             for i, ins in enumerate(node.inputs):
    611                 try:
--> 612                     storage_map[ins] = [self._get_test_value(ins)]
    613                     compute_map[ins] = [True]
    614                 except AttributeError:

/IPC_MAP/lib/python2.7/site-packages/theano/gof/op.pyc in _get_test_value(cls, v)
    547             # ensure that the test value is correct
    548             try:
--> 549                 ret = v.type.filter(v.tag.test_value)
    550             except Exception as e:
    551                 # Better error message.

/IPC_MAP/lib/python2.7/site-packages/theano/tensor/type.pyc in filter(self, data, strict, allow_downcast)
    138                             '"function".'
    139                             % (self, data.dtype, self.dtype))
--> 140                         raise TypeError(err_msg, data)
    141                 elif (allow_downcast is None and
    142                         type(data) is float and

TypeError: For compute_test_value, one input test value does not have the requested type.
The error when converting the test value to that variable type:
TensorType(float32, matrix) cannot store a value of dtype float64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to float32, or 2) set "allow_input_downcast=True" when calling "function".
[[ 0.09999998  0.1  0.1 ...,  0.1  0.10000001  0.10000001]
 [ 0.09999998  0.1  0.1 ...,  0.1  0.10000001  0.10000001]
 [ 0.09999998  0.1  0.1 ...,  0.1  0.10000001  0.10000001]
 ...,
 [ 0.09999998  0.1  0.1 ...,  0.1  0.10000001  0.10000001]
 [ 0.09999998  0.1  0.1 ...,  0.1  0.10000001  0.10000001]
 [ 0.09999998  0.1  0.1 ...,  0.1  0.10000001  0.10000001]]
```
Hi @akashgit, thanks for sending the detailed information, but I could not reproduce your error yet. I tested the notebook on Theano 0.8.2 and 0.9.0.dev2, with Python 2 and 3; under all of these environments, ADVI worked.
Here is the Dockerfile with which the notebook ran:
```
FROM jupyter/datascience-notebook
MAINTAINER Taku Yoshioka <contact@google.com>
ENV TERM xterm-color
USER jovyan
RUN pip install theano joblib
```
Save the above as `Dockerfile`, then you can build an image by typing

`docker build -t pymc3-test .`

`pymc3-test` is the name of the created image. Finally, start a container from that image:

`docker run -d -v /your/home:/home/jovyan/work -p 8889:8888 --name pymc3 pymc3-test`

You can access the notebook server via `localhost:8889`. When using VirtualBox (Docker Machine), you need to specify the IP address on which the Docker Machine is running; see https://docs.docker.com/machine/reference/env/. I recommend Linux or Mac, both of which have native support for Docker, so you don't need to specify the IP address of the Docker Machine.
Here are the options of the `docker run` command:

- `-d`: run in the background
- `-v`: mount `/your/home` on your host to `/home/jovyan/work` in the container
- `-p`: port forwarding
- `--name`: name of the container
Thanks for the file @taku-y. I will try to see what is wrong with my current configuration.
@taku-y Thanks for sharing. I have not tried your code, but reading it I think there is a bug here: don't you need to enforce non-negativity for `zs_std`? I might be wrong, or may not have read the rest of the code carefully.