
Quantifying and Learning Linear Symmetry-Based Disentanglement (LSBD)

• TL: A single encoder with a triplet loss, corresponding to Equation (4.27) with λ_RE = λ_KL = λ_LSBD = 0. All images from a single object are passed through the encoder and the obtained representations are averaged, see Figure 4.17. Only for this variant, class weights are used to weigh the contribution of each object to the total loss. This weight is given by O / (C · O_c), where O is the total number of objects used for training, C is the number of classes, and O_c is the number of objects from a particular class c in the training dataset.

• AE-TL: A regular autoencoder with a triplet loss. In this case there is a decoder in addition to the encoder, which makes it possible to compute a reconstruction loss for each image, see Figure 4.17. This corresponds to Equation (4.27) with λ_KL = λ_LSBD = 0. During evaluation, an object's shape descriptor is obtained by averaging the low-dimensional representations of the object's images over all orientations.

• VAE-TL: A Variational Autoencoder (VAE) with a triplet loss. Similar to the AE-TL variant, but with a VAE instead of a regular autoencoder. This corresponds to Equation (4.27) where only λ_LSBD equals zero. We use a standard Gaussian prior, and the encoder represents a parametric Gaussian posterior with a diagonal covariance matrix. See Section 2.1 for more details about regular VAEs.

• ∆VAE-TL: A ∆VAE with a triplet loss. Similar to the VAE-TL variant, but here the latent space is a hypersphere. For this we use a ∆VAE (Pérez Rey et al., 2019), see also Section 4.5.2.

• LSBD-VAE-TL: An LSBD-VAE with a triplet loss, see Figure 4.17. This corresponds to the full loss function in Equation (4.27) with all positive weight parameters λ. In this case, the latent space Z is decomposed into two parts, i.e. Z = Z1 ⊕ Z2, where Z1 = S^1 is a 1-sphere that should represent the orientation of the object, and Z2 = R^d is a d-dimensional Euclidean space that should represent the object's shape descriptor.
To compute the triplet loss, only the average of the Z2 representations of the images is used. The LSBD loss L_LSBD is computed over the Z1 space, similar to our LSBD-VAE experiments on COIL-100 and ModelNet40 (see Section 4.6.2), but here we have M = 12.
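The per-object class weight O / (C · O_c) used in the TL variant can be sketched as follows; this is a minimal illustration assuming the object-to-class assignment is available as a list of class labels, one per training object:

```python
# Sketch of the class weights O / (C * O_c) from the TL variant.
# `object_classes` holds one class label per training object.
from collections import Counter

def class_weights(object_classes):
    O = len(object_classes)           # total number of training objects
    counts = Counter(object_classes)  # O_c: number of objects per class c
    C = len(counts)                   # number of classes
    # Weight for each object: O / (C * O_c), so that every class
    # contributes equally to the total loss regardless of its size.
    return [O / (C * counts[c]) for c in object_classes]
```

With this weighting, classes with fewer objects receive proportionally larger per-object weights, so each class contributes the same total weight O / C.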
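The shape-descriptor averaging and the triplet loss over the averaged Z2 representations can be illustrated as below. The specific form of the triplet loss (squared Euclidean distances with a margin) is an assumption for illustration, as the text does not spell out its exact definition:

```python
import numpy as np

def shape_descriptor(z2_reprs):
    # Average the Z2 representations of one object's images over all
    # orientations to obtain a single shape descriptor.
    return np.mean(z2_reprs, axis=0)

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Margin-based triplet loss on shape descriptors (assumed form):
    # pull the positive towards the anchor, push the negative away.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + margin, 0.0)
```

Here the anchor and positive would be shape descriptors of objects from the same class, and the negative a descriptor from a different class; the loss is zero once the negative is more than the margin further away than the positive.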

RkJQdWJsaXNoZXIy MjY0ODMw