Revisiting Variational Autoencoders Part 2 (Machine Learning 2024) | By Monodeep Mukherjee | May 2024

1. RAQ-VAE: Rate-Adaptive Vector Quantization Variational Autoencoder

Authors: Seo Ji-wan and Kang Jun-hyuk

Abstract: Vector quantized variational autoencoder (VQ-VAE) is a well-established technique in machine learning for learning discrete representations across different modalities. However, the need to retrain the model to tune the codebook to different data or model scales limits its scalability and applicability. We present the Rate-Adaptive VQ-VAE (RAQ-VAE) framework, which addresses this challenge with two novel codebook representation methods: a model-based approach that uses a clustering-based technique on an existing well-trained VQ-VAE model, and a data-driven approach that uses a sequence-to-sequence (Seq2Seq) model for variable-rate codebook generation. Our experiments show that RAQ-VAE achieves effective reconstruction performance at multiple rates and often outperforms traditional fixed-rate VQ-VAE models. This work enhances the adaptability and performance of VQ-VAE for broad applications in data reconstruction, generation, and computer vision tasks.
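The codebook is the object RAQ-VAE adapts, so it is worth recalling the quantization step at the heart of any VQ-VAE: each encoder output is snapped to its nearest codebook entry. The sketch below illustrates that lookup in NumPy; the codebook size `K`, latent dimension `D`, and the random stand-in latents are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 8, 4                          # illustrative codebook size and latent dim
codebook = rng.normal(size=(K, D))   # stand-in for learned embeddings e_k

def quantize(z, codebook):
    """Replace each latent vector z_i by its nearest codebook entry.

    Returns the chosen indices and the quantized vectors.
    """
    # Squared Euclidean distance from every latent to every code, shape (N, K).
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)           # nearest-code index per latent
    return idx, codebook[idx]

z = rng.normal(size=(3, D))          # stand-in encoder outputs
idx, z_q = quantize(z, codebook)
```

Changing the rate then amounts to changing `K`: a fixed-rate VQ-VAE must be retrained for each codebook size, which is the retraining cost RAQ-VAE's clustering-based and Seq2Seq codebook generation methods aim to avoid.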

2. Epanechnikov Variational Autoencoder

Authors: Tian Qin and Wei-Min Huang

Abstract: In this paper, we introduce a variational autoencoder (VAE) based on Kernel Density Estimation (KDE). We approximate the posterior distribution with KDEs and derive an upper bound on the Kullback-Leibler (KL) divergence in the evidence lower bound (ELBO). The flexibility of KDEs allows for the optimization of the posterior distribution in VAEs, which not only addresses the limitations of the Gaussian latent space in vanilla VAEs but also provides a new perspective on the estimation of the KL divergence in the ELBO. Under suitable conditions, we show that the Epanechnikov kernel is the asymptotically optimal choice for minimizing the derived upper bound on the KL divergence. Compared to the Gaussian kernel, the Epanechnikov kernel has compact support, which should result in less noisy and blurry generated samples. Implementing the Epanechnikov kernel in the ELBO is straightforward, as it belongs to the “location-scale” family of distributions, where the reparameterization trick can be applied directly. A series of experiments on benchmark datasets including MNIST, Fashion-MNIST, CIFAR-10, and CelebA further demonstrates that the Epanechnikov Variational Autoencoder (EVAE) outperforms the vanilla VAE in the quality of reconstructed images, as measured by FID score and sharpness.
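The location-scale property the abstract mentions can be made concrete: as in the Gaussian case, one samples a standard noise variable and shifts/scales it, so gradients flow through the mean and scale. A convenient way to draw standard Epanechnikov noise is the classical fact that the median of three independent Uniform(−1, 1) draws has the Epanechnikov density 3/4·(1 − u²) on [−1, 1]. The sketch below is a minimal NumPy illustration of that reparameterization; it is not the paper's implementation, and the function names are mine.

```python
import numpy as np

def sample_epanechnikov(shape, rng):
    """Draw standard Epanechnikov noise on [-1, 1].

    Uses the fact that the median of three iid Uniform(-1, 1)
    variables has density 3/4 * (1 - u^2) on [-1, 1].
    """
    u = rng.uniform(-1.0, 1.0, size=(3,) + tuple(shape))
    return np.median(u, axis=0)

def reparameterize(mu, log_sigma, rng):
    """Location-scale reparameterization: z = mu + sigma * eps."""
    eps = sample_epanechnikov(mu.shape, rng)
    return mu + np.exp(log_sigma) * eps

rng = np.random.default_rng(0)
mu = np.zeros((5, 2))                # stand-in encoder means
log_sigma = np.zeros((5, 2))         # stand-in encoder log-scales
z = reparameterize(mu, log_sigma, rng)
```

Note the compact support: with `mu = 0` and `sigma = 1`, every sample lies in [−1, 1], in contrast to the unbounded tails of Gaussian noise, which is the property the abstract credits for less noisy generated samples.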



