
120 Out-of-Distribution Generalisation with LSBD Representations

x-positions are reconstructed in different positions, often with a different shape, and in one case the reconstruction even shows two objects. Similarly, for 3D Shapes the unseen floor colours are not reconstructed well at all; instead, they are substituted with floor colours that were seen during training.

Figure 5.9: LSBD-VAE reconstructions of OOD data from various splits of dSprites (left) and 3D Shapes (right). (a) Recombination-to-element (RTE). (b) Recombination-to-range (RTR). (c) Extrapolation (EXTR).

5.4.4 Equivariance of OOD Combinations

So far, the evaluations we showed rely mostly on reconstruction performance, which is the main component of the ELBO in a VAE-based model. Such evaluations focus heavily on the generalisation of the decoder. Yet in a representation learning setting we are typically most interested in the behaviour of the encoder, which is the model that actually learns representations.

In Symmetry-Based Disentanglement (SBD), we can use the notion of equivariance to evaluate the generalisation of the encoder. In particular, for Linear SBD (LSBD) we can use our D_LSBD metric (see Section 4.4) to quantify the equivariance with respect to the transformations in the full dataset, including the left-out OOD combinations. This gives us a measure of how well a model can represent (in a linear manner) the underlying structure of the data, even if it hasn’t observed certain parts of this structure.

Figure 5.10 shows the D_LSBD scores for all models on the Square and Arrow datasets; lower scores are better (0 is optimal). dSprites and 3D Shapes contain
