
Out-of-Distribution Generalisation with LSBD Representations

We investigate the combinatorial generalisation capabilities of our previously introduced LSBD model (LSBD-VAE) compared to traditional VAE-based disentanglement models. We observe that both types of models struggle with generalisation in more challenging settings, and that LSBD shows no obvious improvement over traditional disentanglement. However, we also observe that even if LSBD-VAE assigns low likelihood to OOD combinations, the encoder may still generalise well by learning a meaningful mapping that reflects the underlying group structure.

5.1 Introduction

Learning representations that disentangle the underlying factors of variation in the data has been suggested as an important step towards better generalisation (Bengio et al., 2012). This is particularly apparent for combinatorial generalisation: the ability to generalise to novel combinations of previously seen factors. Ideally, a model should be able to disentangle the underlying factors of a data point even if that particular combination of factors was never observed during training, as long as each individual factor value has been seen before.

Disentanglement models are typically trained on data that covers most factor combinations (Locatello et al., 2018), but the number of combinations scales exponentially with the number of factors, which quickly becomes unmanageable for realistic scenarios with more than a few factors. Thus, it is beneficial if models can learn the underlying mechanisms behind the factors without seeing all possible combinations; a minimal sketch of such a combinatorial train/test split is given at the end of this section. However, recent studies have shown that current disentanglement methods do not deliver on this promise in the so-called out-of-distribution (OOD) generalisation setting (Montero et al., 2021; Schott et al., 2022).

Correlations between the factors of variation in the observed data are reflected in the learned latent representations of disentanglement models (Träuble et al., 2021), since the training methods are designed for independent and identically distributed (i.i.d.) data. For example, two factors that correlate strongly with each other may be represented in a single latent dimension, even if they describe two fundamentally different properties. There is clearly a misalignment between disentangling the underlying factors of variation, which may not be independently distributed in the data, and learning to model the distribution of the data. Whereas most disentanglement methods aim to uncover the former, their methodology mostly focuses on the latter. One promising direction to resolve this misalignment is Symmetry-Based Disentanglement.
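As a concrete illustration of the combinatorial setting described above, the following is a minimal Python sketch of how an OOD evaluation split can be constructed: enumerate all factor combinations, hold a subset out for testing, and verify that every individual factor value still appears during training. The function name make_ood_split and its parameters are illustrative assumptions, not the experimental code used in this chapter.

```python
import itertools
import random

# Minimal sketch of a combinatorial OOD split (illustrative; not the
# chapter's actual experimental setup).
def make_ood_split(factor_values, holdout_fraction=0.2, seed=0):
    """Split all factor combinations into in-distribution (training)
    and out-of-distribution (test) combinations, keeping every
    individual factor value present in the training set."""
    combos = list(itertools.product(*factor_values))
    rng = random.Random(seed)
    rng.shuffle(combos)
    n_holdout = int(holdout_fraction * len(combos))
    ood, train = combos[:n_holdout], combos[n_holdout:]
    # Combinatorial generalisation requires only the *combinations* to be
    # novel: every individual factor value must still occur in training.
    for i, values in enumerate(factor_values):
        missing = set(values) - {combo[i] for combo in train}
        assert not missing, f"factor {i} lost values {missing} from training"
    return train, ood

# The number of combinations scales exponentially with the number of
# factors: 5 factors with 10 values each already give 10**5 combinations.
factors = [range(10) for _ in range(5)]
train, ood = make_ood_split(factors)
print(len(train), len(ood))  # 80000 20000
```

Training a model on the train combinations and evaluating likelihood or encoding quality on the ood combinations then probes exactly the generalisation behaviour discussed above: the held-out combinations are novel, but none of their individual factor values are.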

RkJQdWJsaXNoZXIy MjY0ODMw