in the embedding space: $\Pi(z) = z/|z|$. The scalar curvature of $S^1$ is 0. Alternatively, other methods to learn encodings on a specific latent manifold can be used, e.g. the Hyperspherical VAE (Davidson et al., 2018) for a hyperspherical latent manifold (such as $S^1$ in our example). The Hyperspherical VAE uses a von Mises-Fisher (vMF) distribution for the approximate posterior; the prior is uniform on the hypersphere, which is a special case of the vMF distribution. While this method should work just as well as the basis for LSBD-VAE in the case of hyperspherical sub-manifolds, we choose to use the ∆VAE as it should generalise more easily to different types of (Riemannian) sub-manifolds.

4.5.3 Semi-Supervised Learning with Transformation Labels

Caselles-Dupré et al. (2019) showed that LSBD representations cannot be inferred from a training set of unlabelled observations, but that access to the transformations between data points is needed. They therefore use a training set of observation pairs with a given transformation between them. However, we posit that only a limited amount of supervision is sufficient. Since obtaining supervision on transformations is typically more expensive than obtaining unsupervised observations, it is desirable to limit the amount of supervision needed. Therefore, we augment the unsupervised ∆VAE with a supervised method that makes use of transformation-labelled batches, i.e. batches $\{x_m\}_{m=1}^{M}$ such that $x_m = g_m \cdot x_1$ for $m = 2, \dots, M$, where the transformations $g_m$ (and thus their group representations $\rho(g_m)$) are known and are referred to as transformation labels.

The simplified version of the metric from Equation (4.7) can then be used for each batch as an additional loss term (with $x_0 = x_1$), as it is differentiable under the assumptions described above (using the Euclidean norm). We make a small adjustment to Equation (4.7) for the purpose of our method, since the mean computed there does not typically lie on the latent manifold $Z_G$. Thus, we use the projection $\Pi$ from the ∆VAE to project the mean onto $Z_G$. Writing the encodings as $z_m := h(x_m)$, the additional loss term for a transformation-labelled batch $\{x_m\}_{m=1}^{M}$ becomes

$$
\mathcal{L}_{\mathrm{LSBD}} = \frac{1}{M} \sum_{m=1}^{M} \left\lVert \rho(g_m^{-1}) \cdot z_m - \Pi\!\left( \frac{1}{M} \sum_{m=1}^{M} \rho(g_m^{-1}) \cdot z_m \right) \right\rVert^2 , \tag{4.25}
$$

where $g_1 = e$, the group identity. Moreover, instead of feeding the encodings $z_m$ to the decoder, we use $\rho(g_m) \cdot z$, where $z = \Pi\left( \frac{1}{M} \sum_{m=1}^{M} \rho(g_m^{-1}) \cdot z_m \right)$. This provides a kind of batch-wise
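To make the computation concrete, the following is a minimal sketch of the loss in Equation (4.25) and the decoder-input replacement for the running example $Z_G = S^1 \subset \mathbb{R}^2$, where $\Pi(z) = z/|z|$ and each $\rho(g_m)$ is a $2 \times 2$ rotation matrix. The PyTorch framing and the function names lsbd_loss and decoder_inputs are illustrative assumptions, not the thesis's implementation.

```python
import torch

def lsbd_loss(z, rho_inv, eps=1e-8):
    """Sketch of the loss in Eq. (4.25) for one transformation-labelled batch,
    assuming Z_G = S^1 embedded in R^2, so Pi(z) = z / |z|.

    z       : (M, 2) encodings z_m = h(x_m)
    rho_inv : (M, 2, 2) representations rho(g_m^{-1}) as rotation matrices,
              with rho(g_1^{-1}) the identity since g_1 = e
    """
    # Map each encoding back to the reference frame: rho(g_m^{-1}) . z_m
    aligned = torch.einsum('mij,mj->mi', rho_inv, z)        # (M, 2)
    # Euclidean mean of the aligned encodings (generally not on S^1) ...
    mean = aligned.mean(dim=0)                               # (2,)
    # ... so project it back onto the latent manifold with Pi
    z_proj = mean / (torch.linalg.norm(mean) + eps)
    # Average squared Euclidean distance to the projected mean
    return ((aligned - z_proj) ** 2).sum(dim=1).mean()

def decoder_inputs(z, rho, rho_inv, eps=1e-8):
    """Replace each encoding z_m by rho(g_m) . z before decoding, where z is
    the projected mean of the aligned encodings, as described in the text."""
    aligned = torch.einsum('mij,mj->mi', rho_inv, z)
    mean = aligned.mean(dim=0)
    z_proj = mean / (torch.linalg.norm(mean) + eps)
    return torch.einsum('mij,j->mi', rho, z_proj)            # (M, 2)

# Example: a batch of M = 4 observations related by rotations of known angles
angles = torch.tensor([0.0, 0.5, 1.0, 1.5])                  # g_1 = e -> angle 0
c, s = torch.cos(angles), torch.sin(angles)
rho = torch.stack([torch.stack([c, -s], -1),
                   torch.stack([s, c], -1)], dim=-2)          # (M, 2, 2)
rho_inv = rho.transpose(-1, -2)                               # for rotations, inverse = transpose
z = torch.nn.functional.normalize(torch.randn(4, 2), dim=1)   # encodings on S^1
loss = lsbd_loss(z, rho_inv)                                  # differentiable scalar
```

The projection of the batch mean onto $Z_G$ is what keeps both the loss and the decoder inputs on the latent manifold, which is the adjustment to Equation (4.7) motivated above.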