
5.3 Experimental Setup

LSBD-VAE uses a ∆VAE (Pérez Rey et al., 2019) to learn encoded (posterior) distributions on these latent subspaces. Like a regular VAE, the unsupervised ∆VAE model is trained by minimising the negative Evidence Lower Bound (ELBO), but the prior and approximate posterior (encoder) are defined on a (typically non-Euclidean) Riemannian manifold $Z$. The prior is uniform over this manifold, whereas the approximate posterior is defined by a location and a scaling parameter. To estimate the intractable terms of the negative ELBO, the reparameterisation trick is implemented via a random walk.

Transformation-supervised batches  The LSBD-VAE can be trained both on unsupervised images and on transformation-supervised batches $(x_1, x_2, \ldots, x_M)$ in which every sample can be expressed as a known transformation of the first, i.e. $x_m = g_m \cdot x_1$ for $m = 2, \ldots, M$. Each transformation $g_m$ corresponds to changes in the factor values, and can be represented with rotation matrices acting on each of the latent subspaces. Given such transformation-supervised batches, LSBD-VAE includes an additional loss term $\mathcal{L}_{\mathrm{LSBD}}$ that encourages learning LSBD representations. $\mathcal{L}_{\mathrm{LSBD}}$ measures the dispersion of the points $\rho(g_m^{-1}) \cdot z_m$ for $m = 1, \ldots, M$, where $z_m$ is the model's encoding of data point $x_m$. Since $x_1 = g_m^{-1} \cdot x_m$, all these points should ideally be equal to achieve LSBD, so the dispersion provides a term that encourages LSBD. Formally, it is defined as
$$
\mathcal{L}_{\mathrm{LSBD}} = \frac{1}{M} \sum_{m=1}^{M} \left\| \rho(g_m^{-1}) \cdot z_m - \Pi\!\left( \frac{1}{M} \sum_{m'=1}^{M} \rho(g_{m'}^{-1}) \cdot z_{m'} \right) \right\|^2,
$$
where $g_1 = e$ is the group identity. In our experiments we train only on transformation-supervised batches, without unsupervised training. This is a rather strong type of supervision, but our goal is to investigate how well-trained LSBD representations perform in OOD generalisation, not to find the most efficient way to train an LSBD-VAE model. We split the training set into transformation-supervised batches of size $M = 32$.
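As a minimal sketch of this dispersion loss for a single circular latent subspace, the following NumPy code aligns each encoding $z_m$ with $\rho(g_m^{-1})$, averages the aligned points, and measures the squared distances to the mean projected back onto the circle. It assumes $\Pi$ is projection of the Euclidean mean onto the unit circle; the function names are illustrative, not the authors' implementation:

```python
import numpy as np

def rotation(theta):
    # 2x2 rotation matrix representing rho(g) on a circular latent subspace S^1
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def lsbd_loss(z, thetas):
    """Dispersion of the aligned points rho(g_m^{-1}) . z_m.

    z:      (M, 2) array of latent encodings on the unit circle
    thetas: (M,) known transformation angles, with thetas[0] = 0 (identity)
    """
    # Undo each known transformation: all aligned points should coincide
    # when the representation is exactly LSBD.
    aligned = np.stack([rotation(-t) @ zm for t, zm in zip(thetas, z)])
    mean = aligned.mean(axis=0)
    # Pi: project the Euclidean mean back onto the unit circle (assumption).
    proj = mean / np.linalg.norm(mean)
    return np.mean(np.sum((aligned - proj) ** 2, axis=1))
```

If every $z_m$ really is the rotation of a common base point by the supervised angle, the aligned points coincide and the loss vanishes; any deviation from equivariance makes the loss strictly positive.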
Non-cyclic factors  Although dSprites and 3D Shapes contain factors that don't really have an underlying SO(2) structure, we can still use this formulation to train an LSBD-VAE on these datasets, by mapping the factor values to suitable angle values from 0 to 2π radians. The shape factors in both datasets can be represented as equally
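The angle mapping described above can be sketched as follows for a factor with K discrete values. The `max_angle` knob for non-cyclic factors is a hypothetical choice (not specified in the text) that spreads the values over a sub-interval of the circle so the first and last value are not identified:

```python
import numpy as np

def factor_to_angles(num_values, cyclic=True, max_angle=np.pi):
    """Map K discrete factor values to angles on the circle.

    Cyclic factors get K equally spaced angles covering the full circle,
    so value K would wrap around to value 0. Non-cyclic factors are
    instead spread over [0, max_angle], endpoints included, so that the
    extremes stay distinct (max_angle is an illustrative assumption).
    """
    if cyclic:
        return 2 * np.pi * np.arange(num_values) / num_values
    return np.linspace(0.0, max_angle, num_values)
```

For example, four cyclic values map to 0, π/2, π, and 3π/2, whereas three non-cyclic values with `max_angle=π` map to 0, π/2, and π.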
