type: Post
status: Published
date: Nov 5, 2025
tags: Embodied AI, Imitation Learning
category: Research
VAE
A variational autoencoder (VAE) is a generative model used for unsupervised learning.
It learns a probabilistic mapping from data to a latent space and is composed of two main parts:
- Encoder: Maps input data to a latent variable space.
- Decoder: Reconstructs data from the latent variable.
CVAE
A conditional VAE (CVAE) extends the VAE by adding conditional information y (such as a class label) to both the encoder and the decoder.
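One common way to add the condition is to concatenate it to the inputs of both networks. The sketch below assumes the condition is an integer class label that is one-hot encoded; the helper names `one_hot`, `encoder_input`, and `decoder_input` are hypothetical and only illustrate the idea, not any particular implementation.

```python
def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list of floats."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def encoder_input(x, y, num_classes):
    """Input to the CVAE encoder: the data point x joined with the condition y."""
    return list(x) + one_hot(y, num_classes)


def decoder_input(z, y, num_classes):
    """Input to the CVAE decoder: the latent code z joined with the same condition y."""
    return list(z) + one_hot(y, num_classes)


x = [0.2, 0.7, 0.1]  # a toy 3-dimensional data point
z = [0.5, -1.0]      # a toy 2-dimensional latent code
y = 1                # condition: class 1 out of 4 (assumed for illustration)

enc_in = encoder_input(x, y, num_classes=4)
dec_in = decoder_input(z, y, num_classes=4)
print(len(enc_in), len(dec_in))
```

Other conditioning schemes (for example, feeding the condition through its own embedding) are equally possible; concatenation is just the simplest to show.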
Structure of a CVAE
- Encoder:
  - The encoder network maps the input to a distribution over the latent space.
  - input: the data point x and the condition y (such as a label)
  - output: the mean and variance of the latent distribution (typically Gaussian)
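  - training stage: the encoder sees both x and y, and z is sampled from its output distribution q(z | x, y)
  - evaluating stage: x is not available, so the encoder is not used and z is drawn from the prior p(z) instead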
- Latent Variable:
  - The latent variable z is sampled from the distribution parameterized by the encoder. It captures the underlying structure of the data.
  - To allow backpropagation through the stochastic sampling step, VAEs use the reparameterization trick: $z = \mu + \sigma \odot \epsilon$, where μ and σ are the mean and standard deviation output by the encoder, and $\epsilon \sim \mathcal{N}(0, I)$ is a noise term.
- Decoder:
  - The decoder network tries to generate data similar to the original input, conditioned on both the latent variable z and the condition y.
  - input: the latent variable z and the condition y
  - output: the reconstructed data x
- Loss Function:
  - The CVAE is trained by maximizing the evidence lower bound (ELBO, also called the variational lower bound), which is equivalent to minimizing the sum of two terms:
    - Reconstruction Loss: measures how well the model reconstructs the input data, typically using binary cross-entropy for binary data or mean squared error for continuous data.
    - KL Divergence: measures the difference between the learned latent distribution q(z | x, y) and the prior distribution p(z), typically a standard Gaussian. This regularizes the model by encouraging the learned latent distribution to stay close to the known prior.
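The sampling step and the two loss terms described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the encoder is assumed to output a mean and a log-variance per latent dimension (a common parameterization of the variance), the prior is a standard Gaussian, and mean squared error stands in for the reconstruction loss. The function names and the `kl_weight` parameter are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), elementwise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def mse(x, x_hat):
    """Mean squared error, used here as the reconstruction loss."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def negative_elbo(x, x_hat, mu, log_var, kl_weight=1.0):
    """Loss to minimize: reconstruction loss plus the weighted KL term."""
    return mse(x, x_hat) + kl_weight * kl_to_standard_normal(mu, log_var)


rng = random.Random(0)            # fixed seed so the sketch is repeatable
mu, log_var = [0.3, -0.2], [0.0, -1.0]
z = reparameterize(mu, log_var, rng)

x = [1.0, 2.0, 3.0]               # toy input
x_hat = [1.1, 1.9, 3.2]           # toy reconstruction (a real decoder would produce this from z and y)
loss = negative_elbo(x, x_hat, mu, log_var)
print(z, loss)
```

Because the noise is drawn separately from `mu` and `log_var`, the sample z is a deterministic, differentiable function of the encoder outputs, which is what makes gradient-based training possible in an autodiff framework.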
Positional embedding
- PositionEmbeddingSine
generates fixed (non-learned) position encodings by applying sine and cosine functions at different frequencies to the normalized x and y coordinates
- PositionEmbeddingLearned
learns separate embeddings for the x (column) and y (row) indices and concatenates them for each position
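Both variants can be sketched in plain Python for a small feature-map grid. This is a simplified illustration, not the code of any specific library: the function names, the default `num_pos_feats` and `temperature` values, the normalization to (0, 2π], and the ordering of the concatenated halves are all assumptions, and the "learned" tables are random lists standing in for trainable parameters.

```python
import math
import random


def sine_position_embedding(height, width, num_pos_feats=4, temperature=10000.0):
    """Return a height x width grid of (2 * num_pos_feats)-dim sine/cosine encodings."""
    scale = 2.0 * math.pi
    # One frequency per sin/cos pair: channels 2k and 2k+1 share a divisor.
    dim_t = [temperature ** (2 * (i // 2) / num_pos_feats)
             for i in range(num_pos_feats)]

    def encode(coord):
        # Even channels use sine, odd channels use cosine.
        return [math.sin(coord / d) if i % 2 == 0 else math.cos(coord / d)
                for i, d in enumerate(dim_t)]

    grid = []
    for row in range(height):
        y = (row + 1) / height * scale      # normalized y coordinate in (0, 2*pi]
        grid_row = []
        for col in range(width):
            x = (col + 1) / width * scale   # normalized x coordinate in (0, 2*pi]
            grid_row.append(encode(y) + encode(x))
        grid.append(grid_row)
    return grid


def learned_position_embedding(height, width, row_table, col_table):
    """Concatenate the x (column) and y (row) embeddings for each position."""
    return [[col_table[col] + row_table[row] for col in range(width)]
            for row in range(height)]


rng = random.Random(0)
# Random tables stand in for parameters that would be learned during training.
row_table = [[rng.uniform(0.0, 1.0) for _ in range(4)] for _ in range(3)]
col_table = [[rng.uniform(0.0, 1.0) for _ in range(4)] for _ in range(5)]

sine = sine_position_embedding(3, 5)
learned = learned_position_embedding(3, 5, row_table, col_table)
print(len(sine), len(sine[0]), len(sine[0][0]))
```

In both cases every grid position ends up with a vector of size `2 * num_pos_feats`: half derived from its row and half from its column.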
- Author: Hatty
- Link: https://notion-next-ochre-one-47.vercel.app//article/2a258186-cead-80ff-a11d-f6d0e5df315d
- License: This article is licensed under CC BY-NC-SA 4.0. Please credit the source when reposting.
