Variational Autoencoders (VAEs) are a class of generative models in machine learning designed to learn the underlying distribution of a dataset and generate new data points that resemble the training data. They are a type of autoencoder with a probabilistic approach to latent space representation, enabling the generation of diverse outputs from a given input distribution. VAEs are particularly useful in applications where controlled, interpretable, and continuous latent representations are desirable, such as image synthesis, anomaly detection, and data imputation.
VAEs are composed of two main components: an encoder and a decoder, each represented by a neural network, which jointly learn to map data into a latent space and reconstruct data from this space. Unlike standard autoencoders, which map inputs to a fixed point in latent space, VAEs introduce randomness by mapping inputs to a probability distribution over the latent space, allowing the generation of new data points through sampling.
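As an illustration, the sketch below shows one common way to set up these two networks, assuming PyTorch; the class names, layer sizes, and latent dimension are illustrative choices rather than anything prescribed above. The key point is that the encoder outputs the mean `μ(x)` and log-variance of a Gaussian over the latent space instead of a single point.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an input x to the parameters (mean, log-variance) of a Gaussian over latent space."""
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=16):
        super().__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        return self.mu(h), self.logvar(h)

class Decoder(nn.Module):
    """Maps a latent sample z back to a reconstruction of x."""
    def __init__(self, latent_dim=16, hidden_dim=256, output_dim=784):
        super().__init__()
        self.hidden = nn.Linear(latent_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, output_dim)

    def forward(self, z):
        h = torch.relu(self.hidden(z))
        return torch.sigmoid(self.out(h))  # outputs in [0, 1], e.g. pixel intensities
```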
The objective of VAEs is to maximize the evidence lower bound (ELBO), which lower-bounds the log-likelihood of the data under the model. The ELBO consists of two components: a reconstruction term, which measures how well the decoder reproduces the input from its latent code, and a regularization term, the Kullback-Leibler (KL) divergence between the encoder's distribution `q(z|x)` and the prior over the latent space, typically a standard normal distribution.
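In practice the ELBO is maximized by minimizing its negative as a training loss. The snippet below is a minimal sketch of that loss, assuming PyTorch, a diagonal-Gaussian encoder producing `mu` and `logvar` as in the sketch above, and a binary cross-entropy reconstruction term; the function name and these choices are illustrative, not the only valid ones.

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar):
    """Negative ELBO: reconstruction term plus KL divergence to the standard normal prior."""
    # Reconstruction term: how well the decoder reproduces x from its latent code.
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL term in closed form for a diagonal Gaussian q(z|x) against N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```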
A key feature of VAEs is the reparameterization trick, which allows backpropagation through the stochastic sampling process. Because `z` is drawn by a random sampling operation, this step is not differentiable on its own. The reparameterization trick addresses this by expressing `z` as a deterministic transformation of `μ(x)` and `σ(x)` with an auxiliary variable `ε` sampled from a standard normal distribution, namely `z = μ(x) + σ(x) ⊙ ε`. This formulation makes the sampling step differentiable with respect to the encoder's outputs, enabling gradient-based optimization of the VAE.
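A minimal sketch of the trick, assuming PyTorch and the same `mu`/`logvar` outputs as in the encoder sketch above (the function name is illustrative):

```python
import torch

def reparameterize(mu, logvar):
    """Return z = mu + sigma * eps with eps ~ N(0, I); gradients flow through mu and logvar."""
    std = torch.exp(0.5 * logvar)   # sigma(x) recovered from the log-variance
    eps = torch.randn_like(std)     # auxiliary noise; no gradient is needed through eps
    return mu + eps * std
```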
VAEs map input data into a continuous, structured latent space where similar data points are mapped to nearby points. This latent space representation is useful for generating novel samples, as interpolating between points in latent space results in smooth transitions in the generated outputs. Additionally, VAEs can disentangle different factors of variation in the data, making them valuable for tasks requiring interpretable and structured representations.
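To make the interpolation idea concrete, the sketch below decodes points along a straight line between two latent codes, assuming a trained decoder such as the one sketched earlier; the function and its arguments are illustrative.

```python
import torch

def interpolate(decoder, z_start, z_end, steps=8):
    """Decode evenly spaced points on the line between two latent codes."""
    alphas = torch.linspace(0.0, 1.0, steps)
    return [decoder((1 - a) * z_start + a * z_end) for a in alphas]
```

Because the latent space is continuous and structured, the decoded sequence typically shows a smooth transition between the two endpoint samples.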
Unlike traditional autoencoders, which use a deterministic encoding-decoding process, VAEs introduce a probabilistic approach to encode data as distributions, not points. This approach allows VAEs to generate realistic variations of the input by sampling from the learned latent distributions, a property not present in deterministic autoencoders. As generative models, VAEs also produce smoother, more continuous latent spaces than standard autoencoders, making them better suited for synthesis tasks where novel outputs are desirable.
VAEs are widely applied in various fields, including image and text generation, data augmentation, anomaly detection, and semi-supervised learning, due to their ability to create diverse, structured, and interpretable representations. The probabilistic framework of VAEs has positioned them as a foundational model in generative deep learning, contributing to advances in both theoretical understanding and practical applications of deep generative models.