$\log p_\theta(x \mid z)$, where $z = g_\phi(\epsilon, x)$ is computed from a single noise variable $\epsilon$ and the encoder output. The KL divergence can often be computed analytically, in which case obtaining its gradients is easy; when it cannot, the reparameterisation trick can also be used to obtain gradients for the KL term. The entire VAE is trained by maximising the ELBO with respect to $\phi$ and $\theta$ simultaneously, using stochastic gradient descent (SGD) or a related optimiser.

Looking at the two terms that make up the ELBO, we can see that the first term, $\mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)]$, acts as a negative expected reconstruction loss between an input data point $x$ and its predicted reconstruction according to the encoder and decoder networks, showing that the VAE indeed acts as an autoencoder. The KL divergence in the second term then acts as a kind of regularisation of the latent space, ensuring that the learned approximate posteriors stay close to the latent space prior. Together with the sampling procedure of the reparameterisation trick, this promotes a certain smoothness in the latent space.

A common choice for the various distributions in the VAE framework is Gaussians with diagonal covariance, which can also be interpreted as products of independent univariate Gaussians. In particular, the latent space prior is often a standard Gaussian, and the generative conditional distribution usually has a fixed variance. Put more formally, the distributions are then
\begin{align}
q_\phi(z \mid x) &= \mathcal{N}\bigl(z \mid \phi(x)\bigr) = \mathcal{N}\bigl(z \mid \mu_{\text{enc}}(x), \operatorname{diag}(\sigma_{\text{enc}}(x))\bigr), \tag{2.4} \\
p_\theta(x \mid z) &= \mathcal{N}\bigl(x \mid \theta(z)\bigr) = \mathcal{N}\bigl(x \mid \mu_{\text{dec}}(z), \sigma_{\text{dec}} \cdot I\bigr), \tag{2.5} \\
p(z) &= \mathcal{N}(z \mid 0, I), \tag{2.6}
\end{align}
where $\mu_{\text{enc}}$ and $\sigma_{\text{enc}}$ are outputs of the encoder network with input $x$, $\mu_{\text{dec}}$ is the output of the decoder network with input $z$, $\sigma_{\text{dec}}$ is a fixed scalar, and $I$ is the identity matrix (with size inferred from context). For such a Gaussian approximate posterior, a valid reparameterisation of $z \sim \mathcal{N}(\mu, \operatorname{diag}(\sigma))$ is $z = \mu + \sigma \odot \epsilon$ for $\epsilon \sim \mathcal{N}(0, I)$, where $\odot$ denotes the Hadamard (or element-wise) product. As explained before, we can then use single-sample Monte Carlo estimation, such that the reconstruction term of the ELBO is essentially $\log p_\theta(x \mid z)$. Then, for our chosen Gaussian $p_\theta(x \mid z)$, we can write this term, up to an additive constant, as a squared-error reconstruction loss between $x$ and $\mu_{\text{dec}}(z)$, scaled by the fixed $\sigma_{\text{dec}}$.
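Since the KL term is available in closed form for these choices, it may help to state it explicitly. The following is the standard identity for the KL divergence from a diagonal Gaussian to a standard normal, written under the assumption that $\sigma_{\text{enc}}$ collects per-dimension standard deviations (consistent with the reparameterisation $z = \mu + \sigma \odot \epsilon$ above):
\begin{equation*}
D_{\mathrm{KL}}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)
= \frac{1}{2} \sum_{i} \Bigl( \mu_{\text{enc},i}^2 + \sigma_{\text{enc},i}^2 - 1 - \log \sigma_{\text{enc},i}^2 \Bigr),
\end{equation*}
which is minimised when each marginal of $q_\phi(z \mid x)$ matches the standard-normal prior, making the regularising role of this term explicit.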
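To make the training procedure concrete, the following is a minimal sketch of one ELBO maximisation step in PyTorch, not the implementation used here: the MLP encoder and decoder, their sizes, and the names (encoder, decoder, sigma_dec) are illustrative assumptions, with the encoder predicting the log-variance of $q_\phi(z \mid x)$ for numerical stability.
\begin{verbatim}
import torch
import torch.nn as nn

latent_dim, data_dim, sigma_dec = 8, 784, 1.0  # illustrative sizes, not from the text

encoder = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU(),
                        nn.Linear(256, 2 * latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                        nn.Linear(256, data_dim))
optimiser = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def elbo(x):
    # Encoder outputs the mean and log-variance of q_phi(z|x), Eq. (2.4).
    mu_enc, log_var = encoder(x).chunk(2, dim=-1)
    sigma_enc = torch.exp(0.5 * log_var)
    # Reparameterisation trick: z = mu + sigma (.) eps, with eps ~ N(0, I).
    eps = torch.randn_like(sigma_enc)
    z = mu_enc + sigma_enc * eps
    # Single-sample Monte Carlo estimate of the reconstruction term:
    # log p_theta(x|z) for the fixed-variance Gaussian of Eq. (2.5),
    # dropping the additive normalisation constant.
    mu_dec = decoder(z)
    recon = -0.5 / sigma_dec * ((x - mu_dec) ** 2).sum(dim=-1)
    # Closed-form KL divergence between the diagonal-Gaussian posterior
    # and the standard-normal prior of Eq. (2.6).
    kl = 0.5 * (mu_enc ** 2 + sigma_enc ** 2 - 1.0 - log_var).sum(dim=-1)
    return (recon - kl).mean()

x = torch.rand(32, data_dim)   # dummy mini-batch, for illustration only
optimiser.zero_grad()
loss = -elbo(x)                # maximising the ELBO = minimising its negative
loss.backward()
optimiser.step()
\end{verbatim}
Because $z$ is a deterministic, differentiable function of $\mu_{\text{enc}}$, $\sigma_{\text{enc}}$, and the externally sampled $\epsilon$, the single backward pass propagates gradients through the sampling step into both $\phi$ and $\theta$, exactly as the reparameterisation trick intends.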