
Out-of-Distribution Generalisation with LSBD Representations

does not seem to improve OOD generalisation much, even in relatively simple settings. Moreover, we show that LSBD representations also generalise poorly to unseen factor combinations, despite the more suitable perspective of modelling transformations. However, we also observe that this partially depends on how we measure OOD generalisation: for VAE-based models, unseen factor combinations may be encoded fairly well (reflecting the underlying factor structure), even if they are decoded poorly. Our results suggest that more work is needed to learn representations that generalise better to unseen factor combinations, even when equivariance with respect to disentangled transformations is used as a learning signal. We expose the limitations of LSBD representations in generalising to unseen factor combinations, even when the transformation mechanisms are captured well for the observed data. We hope that our results provide a basis for further research on how to design methods that generalise better to unseen factor combinations, and on how to evaluate such generalisation.

5.2 Related Work

In this section we briefly summarise the relevant background for this chapter, as well as related work. More detailed background information about variational autoencoders (VAEs) and disentanglement can be found in Chapter 2, whereas (Linear) Symmetry-Based Disentanglement (SBD or LSBD) is covered in detail in Chapter 4.

Traditional Disentanglement. Since the suggestion that disentangling the underlying factors of variation in data is important for better generalisation (Bengio et al., 2012), various methods and metrics to learn and evaluate disentangled representations have been proposed (Burgess et al., 2018; Higgins et al., 2017; Kim and Mnih, 2018; Kumar et al., 2017).
Most methods focus on extending a Variational Autoencoder (VAE) (Kingma and Welling, 2013; Rezende et al., 2014) with a regularisation term that encourages disentanglement, and they assume the data is independent and identically distributed. However, these methods have shown only limited success, mostly on toy problems. Moreover, disentanglement in general has been shown to be impossible without some form of inductive bias (Locatello et al., 2018).
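To make the regularisation idea concrete, the sketch below shows a β-VAE-style objective (Higgins et al., 2017), one of the methods cited above: the standard negative ELBO with the KL term up-weighted by a factor β > 1. This is a minimal NumPy illustration, not the implementation used in this thesis; the function names and the Gaussian reconstruction term are illustrative assumptions.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL divergence between the diagonal Gaussian posterior N(mu, exp(logvar))
    and the standard normal prior N(0, I), summed over latent dimensions and
    averaged over the batch."""
    return np.mean(0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1))

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Negative ELBO with the KL term scaled by beta. Setting beta > 1 pushes
    the posterior towards the factorised prior, which is meant to encourage
    disentangled latents at the cost of reconstruction quality."""
    # Squared error stands in for the Gaussian log-likelihood (up to constants).
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))
    return recon + beta * kl_to_standard_normal(mu, logvar)
```

With mu = 0 and logvar = 0 the posterior matches the prior, the KL term vanishes, and the loss reduces to the reconstruction error alone; increasing β trades reconstruction fidelity for a more prior-like (and hopefully more disentangled) latent space.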
