type: Post
status: Published
date: Nov 5, 2025
slug:
summary:
tags: Embodied AI, Imitation Learning
category: Research
icon:
password:

VAE

A VAE is a type of generative model used for unsupervised learning. It learns a probabilistic mapping from data to a latent space and is composed of two main parts:
  1. Encoder: maps input data to a latent variable space.
  2. Decoder: reconstructs data from the latent variable.

CVAE

A CVAE extends the VAE by adding conditional information (such as a class label) to both the encoder and decoder.

Structure of a CVAE

  1. Encoder:
      • The encoder network maps the input to a distribution in the latent space.
        • input: the data point x and the condition y (such as a label)
        • output: a mean and variance for the distribution (typically Gaussian)
        • training stage: the latent variable z is sampled from the approximate posterior q(z | x, y) produced by the encoder
        • evaluating stage: the encoder is bypassed; z is sampled from the prior p(z), and generation is conditioned on y alone
  2. Latent Variable:
      • The latent variable z is sampled from the distribution parameterized by the encoder. This latent variable captures the underlying structure of the data.
      • To allow for backpropagation through the stochastic sampling process, VAEs use the reparameterization trick: z = μ + σ ⊙ ε, where μ and σ are the parameters learned by the encoder, and ε ~ N(0, I) is a noise term.
  3. Decoder:
      • The decoder network tries to generate data similar to the original input, conditioned on both the latent code z and the condition y.
        • input: latent variable z and the condition y
        • output: the reconstructed data x̂
  4. Loss Function:
      • The objective of a CVAE is to maximize the variational lower bound (ELBO), which decomposes into two main components:
          1. Reconstruction Loss: measures how well the model reconstructs the input data, typically using binary cross-entropy for binary data or mean squared error for continuous data.
          2. KL Divergence: measures the difference between the learned latent distribution q(z | x, y) and the prior distribution p(z), typically a standard Gaussian. This regularizes the model by encouraging the learned latent space to stay close to a known prior.
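The whole pipeline above — encode, reparameterize, decode, and score with the ELBO — can be sketched in plain NumPy. All dimensions, weight matrices, and function names here are illustrative assumptions (toy linear maps, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, y, W_mu, W_logvar):
    """Encoder: map (x, y) to Gaussian parameters (mu, log variance)."""
    h = np.concatenate([x, y])
    return W_mu @ h, W_logvar @ h

def reparameterize(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, y, W_dec):
    """Decoder: reconstruct x from the latent z and the condition y."""
    return W_dec @ np.concatenate([z, y])

def neg_elbo(x, x_hat, mu, logvar):
    """Negative ELBO = reconstruction (MSE) + KL(q(z|x,y) || N(0, I))."""
    recon = np.sum((x - x_hat) ** 2)
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))
    return recon + kl

# Toy dimensions: x in R^4, one-hot label y in R^3, latent z in R^2.
x = rng.standard_normal(4)
y = np.array([0.0, 1.0, 0.0])
W_mu = rng.standard_normal((2, 7)) * 0.1
W_logvar = rng.standard_normal((2, 7)) * 0.1
W_dec = rng.standard_normal((4, 5)) * 0.1

mu, logvar = encode(x, y, W_mu, W_logvar)
z = reparameterize(mu, logvar, rng)
x_hat = decode(z, y, W_dec)
loss = neg_elbo(x, x_hat, mu, logvar)
```

At evaluation time you would skip `encode` entirely: draw z from the standard Gaussian prior and call `decode(z, y, W_dec)` with the desired condition y.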
 

Positional Embedding

  1. PositionEmbeddingSine
generates position encodings by applying sine and cosine functions at different frequencies to the normalized x and y coordinates.
  2. PositionEmbeddingLearned
learns an embedding for each row and column index, then concatenates the x (column) and y (row) embeddings for each position.
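A minimal sketch of the sine variant, assuming a DETR-style layout: per-axis features split into sin/cos pairs at geometrically spaced frequencies, with the y and x halves concatenated per grid cell (the function name and defaults here are illustrative, not a specific library's API):

```python
import numpy as np

def position_embedding_sine(h, w, num_pos_feats=64, temperature=10000.0):
    """Sine/cosine position encoding over an (h, w) grid.

    Returns an array of shape (h, w, 2 * num_pos_feats): for each cell,
    sine/cosine features of the normalized y and x coordinates,
    concatenated along the channel axis.
    """
    # Normalized coordinates scaled to (0, 2*pi].
    y_embed = (np.arange(1, h + 1, dtype=np.float64) / h * 2 * np.pi)[:, None, None]
    x_embed = (np.arange(1, w + 1, dtype=np.float64) / w * 2 * np.pi)[None, :, None]

    # Geometrically increasing wavelengths, paired so each frequency
    # gets one sine channel and one cosine channel.
    i = np.arange(num_pos_feats, dtype=np.float64)
    dim_t = temperature ** (2 * (i // 2) / num_pos_feats)

    pos_y = np.broadcast_to(y_embed / dim_t, (h, w, num_pos_feats)).copy()
    pos_x = np.broadcast_to(x_embed / dim_t, (h, w, num_pos_feats)).copy()
    # Even channels take sine, odd channels take cosine.
    pos_y[..., 0::2], pos_y[..., 1::2] = np.sin(pos_y[..., 0::2]), np.cos(pos_y[..., 1::2])
    pos_x[..., 0::2], pos_x[..., 1::2] = np.sin(pos_x[..., 0::2]), np.cos(pos_x[..., 1::2])

    return np.concatenate([pos_y, pos_x], axis=-1)
```

The learned variant replaces the closed-form sin/cos features with two trainable embedding tables (one indexed by row, one by column) whose lookups are concatenated the same way.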