VAEs and GANs are among the few generative models, that have attracted a lot of attention in the past few years. A VAE is good at reconstruction but suffers from blurry images that are the immediate result of pixel-wise MSE in its cost function that imposes an implicit guassian prior. A GAN, on the other hand, replaces this pixel-wise similarity with representations that are learned by its discriminator which is an adversary for the generator. Because of the complex features learned by the discriminator, the generations are quite sharp, but since there is practically no way to know what the discriminator is learning, the generated images may be far from that in the real world. Generator only tries to match the representation of its generation (as seen by the discriminator) with that of the real images.
Keeping in view, the rather complementary features of VAEs and GANs, there have been several attempts at combining these two models to get th