type: Post
status: Published
date: Nov 5, 2025
slug:
summary:
tags: Embodied AI, Imitation Learning
category: Research
icon:
password:

VAE

A VAE is a type of generative model used for unsupervised learning. It learns a probabilistic mapping from data to a latent space and is composed of two main parts:
  1. Encoder: maps input data to a latent variable space.
  2. Decoder: reconstructs data from the latent variable.

CVAE

A CVAE extends the VAE by adding conditional information (such as a class label) to both the encoder and decoder.

Structure of a CVAE

  1. Encoder:
      • The encoder network maps the input to a distribution in the latent space.
        • input: the data point x and the condition y (such as a label)
        • output: a mean and variance for the distribution (typically Gaussian)
        • training stage: the latent variable z is sampled from the approximate posterior q(z | x, y) produced by the encoder
        • evaluating stage: the encoder is bypassed; z is sampled from the prior p(z), and generation is conditioned on y alone
  2. Latent Variable:
      • The latent variable z is sampled from the distribution parameterized by the encoder. This latent variable captures the underlying structure of the data.
      • To allow for backpropagation through the stochastic sampling process, VAEs use the reparameterization trick: z = μ + σ ⊙ ε, where μ and σ are the parameters learned by the encoder, and ε ~ N(0, I) is a noise term.
  3. Decoder:
      • The decoder network tries to generate data similar to the original input, conditioned on both the latent code z and the condition y.
        • input: latent variable z and the condition y
        • output: the reconstructed data x̂
  4. Loss Function:
      • The objective of a CVAE is to maximize the variational lower bound (ELBO), which decomposes into two main components:
          1. Reconstruction Loss: measures how well the model reconstructs the input data, typically using binary cross-entropy for binary data or mean squared error for continuous data.
          2. KL Divergence: measures the difference between the learned latent distribution q(z | x, y) and the prior distribution p(z), typically a standard Gaussian. This regularizes the model by encouraging the learned latent space to stay close to a known prior.
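The whole pipeline above — encode, reparameterize, decode, and score with the ELBO — can be sketched in plain NumPy. All dimensions, weight matrices, and function names here are illustrative assumptions (toy linear maps, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, y, W_mu, W_logvar):
    """Encoder: map (x, y) to Gaussian parameters (mu, log variance)."""
    h = np.concatenate([x, y])
    return W_mu @ h, W_logvar @ h

def reparameterize(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, y, W_dec):
    """Decoder: reconstruct x from the latent z and the condition y."""
    return W_dec @ np.concatenate([z, y])

def neg_elbo(x, x_hat, mu, logvar):
    """Negative ELBO = reconstruction (MSE) + KL(q(z|x,y) || N(0, I))."""
    recon = np.sum((x - x_hat) ** 2)
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))
    return recon + kl

# Toy dimensions: x in R^4, one-hot label y in R^3, latent z in R^2.
x = rng.standard_normal(4)
y = np.array([0.0, 1.0, 0.0])
W_mu = rng.standard_normal((2, 7)) * 0.1
W_logvar = rng.standard_normal((2, 7)) * 0.1
W_dec = rng.standard_normal((4, 5)) * 0.1

mu, logvar = encode(x, y, W_mu, W_logvar)
z = reparameterize(mu, logvar, rng)
x_hat = decode(z, y, W_dec)
loss = neg_elbo(x, x_hat, mu, logvar)
```

At evaluation time you would skip `encode` entirely: draw z from the standard Gaussian prior and call `decode(z, y, W_dec)` with the desired condition y.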
 

Positional Embedding

  1. PositionEmbeddingSine
generates position encodings by applying sine and cosine functions at different frequencies to the normalized x and y coordinates.
  2. PositionEmbeddingLearned
learns an embedding for each row and column index, then concatenates the x (column) and y (row) embeddings for each position.
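A minimal sketch of the sine variant, assuming a DETR-style layout: per-axis features split into sin/cos pairs at geometrically spaced frequencies, with the y and x halves concatenated per grid cell (the function name and defaults here are illustrative, not a specific library's API):

```python
import numpy as np

def position_embedding_sine(h, w, num_pos_feats=64, temperature=10000.0):
    """Sine/cosine position encoding over an (h, w) grid.

    Returns an array of shape (h, w, 2 * num_pos_feats): for each cell,
    sine/cosine features of the normalized y and x coordinates,
    concatenated along the channel axis.
    """
    # Normalized coordinates scaled to (0, 2*pi].
    y_embed = (np.arange(1, h + 1, dtype=np.float64) / h * 2 * np.pi)[:, None, None]
    x_embed = (np.arange(1, w + 1, dtype=np.float64) / w * 2 * np.pi)[None, :, None]

    # Geometrically increasing wavelengths, paired so each frequency
    # gets one sine channel and one cosine channel.
    i = np.arange(num_pos_feats, dtype=np.float64)
    dim_t = temperature ** (2 * (i // 2) / num_pos_feats)

    pos_y = np.broadcast_to(y_embed / dim_t, (h, w, num_pos_feats)).copy()
    pos_x = np.broadcast_to(x_embed / dim_t, (h, w, num_pos_feats)).copy()
    # Even channels take sine, odd channels take cosine.
    pos_y[..., 0::2], pos_y[..., 1::2] = np.sin(pos_y[..., 0::2]), np.cos(pos_y[..., 1::2])
    pos_x[..., 0::2], pos_x[..., 1::2] = np.sin(pos_x[..., 0::2]), np.cos(pos_x[..., 1::2])

    return np.concatenate([pos_y, pos_x], axis=-1)
```

The learned variant replaces the closed-form sin/cos features with two trainable embedding tables (one indexed by row, one by column) whose lookups are concatenated the same way.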