Variational Autoencoders (VAEs) are a class of generative models in machine learning designed to learn the underlying distribution of data and generate new data points similar to the training set. They are a type of autoencoder with a probabilistic approach to latent space representation, enabling the generation of diverse outputs from a given input distribution. VAEs are particularly useful in applications where controlled, interpretable, and continuous latent representations are desirable, such as image synthesis, anomaly detection, and data imputation.

VAEs are composed of two main components, an encoder and a decoder, each represented by neural networks that jointly learn to map data into a latent space and reconstruct data from this space. Unlike standard autoencoders, which map inputs to a fixed point in latent space, VAEs introduce randomness by mapping inputs to a probability distribution over the latent space, allowing the generation of new data points through sampling.

- Encoder: The encoder maps an input `x` to a distribution in the latent space, typically a Gaussian distribution parameterized by a mean `μ` and standard deviation `σ`. The encoder network outputs these parameters instead of a deterministic point, which allows VAEs to learn distributions rather than fixed encodings.

Given input `x`, the encoder produces the mean `μ(x)` and log-variance `log σ^2(x)` as follows:

`μ, log σ^2 = f(x; θ)`

Here, `f` represents the encoder neural network, and `θ` are the learned parameters of the encoder. - Latent Space and Sampling: To generate new samples, the VAE samples a latent vector `z` from the distribution learned by the encoder. The latent variable `z` is sampled from the Gaussian distribution parameterized by `μ(x)` and `σ(x)`. This sampling step introduces stochasticity, making VAEs generative by allowing them to produce diverse outputs from sampled latent codes. The reparameterization trick is used to ensure differentiability in training by transforming `z` as:

`z = μ(x) + σ(x) * ε`

where `ε` is sampled from a standard normal distribution `N(0, 1)`. This transformation makes `z` differentiable with respect to the encoder parameters, enabling gradient-based optimization. - Decoder: The decoder maps the sampled latent vector `z` back into the original data space, reconstructing the input `x` as closely as possible. The decoder learns to transform `z` into a probability distribution over the output, often a Gaussian for continuous data or Bernoulli for binary data. The output of the decoder, `g(z; φ)`, represents the reconstruction of `x`, where `φ` are the parameters of the decoder network.

The objective of VAEs is to maximize the evidence lower bound (ELBO), which approximates the likelihood of the data under the model. The ELBO consists of two components:

- Reconstruction Loss: This term measures how closely the reconstructed data matches the original data, encouraging the decoder to accurately capture data features. For continuous data, the reconstruction loss is typically the mean squared error or negative log-likelihood between `x` and `x_hat`, where `x_hat` is the reconstructed output of the decoder:

`Reconstruction Loss = E_q(z|x) [log p(x | z)]` - KL Divergence: The Kullback-Leibler (KL) divergence term ensures that the latent distribution `q(z | x)` remains close to the prior distribution `p(z)`, typically chosen as a standard normal distribution `N(0, 1)`. This regularization term penalizes deviations of the learned latent distribution from the prior, imposing a structured latent space that encourages smoothness and continuity. The KL divergence between `q(z | x)` and `p(z)` is:

`KL[q(z | x) || p(z)] = 1/2 * Σ (1 + log(σ^2) - μ^2 - σ^2)`

The ELBO objective to be maximized, combining both terms, is:

`ELBO = E_q(z|x) [log p(x | z)] - KL[q(z | x) || p(z)]`

This objective ensures that the VAE generates high-quality reconstructions while maintaining a regularized latent space.

A key feature of VAEs is the reparameterization trick, which allows backpropagation through the stochastic sampling process. Since `z` is sampled from a Gaussian distribution, the sampling step is non-differentiable. The reparameterization trick addresses this by expressing `z` as a deterministic transformation of `μ(x)` and `σ(x)` with an auxiliary variable `ε` sampled from a standard normal distribution. This formulation makes the sampling step differentiable, enabling gradient-based optimization of the VAE.

VAEs map input data into a continuous, structured latent space where similar data points are mapped to nearby points. This latent space representation is useful for generating novel samples, as interpolating between points in latent space results in smooth transitions in the generated outputs. Additionally, VAEs can disentangle different factors of variation in the data, making them valuable for tasks requiring interpretable and structured representations.

Unlike traditional autoencoders, which use a deterministic encoding-decoding process, VAEs introduce a probabilistic approach to encode data as distributions, not points. This approach allows VAEs to generate realistic variations of the input by sampling from the learned latent distributions, a property not present in deterministic autoencoders. As generative models, VAEs also produce smoother, more continuous latent spaces than standard autoencoders, making them better suited for synthesis tasks where novel outputs are desirable.

VAEs are widely applied in various fields, including image and text generation, data augmentation, anomaly detection, and semi-supervised learning, due to their ability to create diverse, structured, and interpretable representations. The probabilistic framework of VAEs has positioned them as a foundational model in generative deep learning, contributing to advances in both theoretical understanding and practical applications of deep generative models.

Generative AI

Thank you! Your submission has been received!

Oops! Something went wrong while submitting the form.

September 4, 2024

20 min

September 4, 2024

18 min

September 26, 2024

12 min

July 11, 2023

13 min