[Figure 5.4: Mean negative ELBOs for all datasets and models. Panels: (a) Square, (b) dSprites, (c) Arrow, (d) 3D Shapes. The horizontal axis shows the different OOD splits and the vertical axis the negative ELBO; dots and crosses show the mean negative ELBO for the training set and the OOD test set, respectively. Compared models: VAE, BetaVAE, DIP-VAE-I, DIP-VAE-II, FactorVAE, cc-VAE, LSBD-VAE.]

we observe that LSBD-VAE only shows improved generalisation on the Square dataset, whereas on the other datasets it is mostly outperformed by the traditional methods. This is particularly surprising for the Arrow dataset, where the symmetry-based paradigm is most suitable. We suspect that the increased model capacity of the traditional models, with 7 latent dimensions, contributes more to generalisation than the strong regularisation of the LSBD-VAE. To illustrate this, we compare the reconstructions of OOD samples from the Arrow 0.625 split produced by DIP-VAE and LSBD-VAE; see Figure 5.6. Both models reconstruct training data similarly well, but their behaviour on OOD data is quite different. DIP-VAE reconstructs OOD samples as darkened images with uneven colour that nevertheless capture the underlying factors (orientation and hue) fairly well, whereas LSBD-VAE reconstructs them into sharper images with an incorrect factor combination (one that has been seen during training). Therefore, DIP-VAE achieves a lower pixel-wise reconstruction error, which is a main component of the ELBO computation.
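To make the role of the reconstruction term explicit, the following is a minimal sketch of how a per-batch negative ELBO of the kind reported in Figure 5.4 can be computed. It assumes a Bernoulli decoder (pixel-wise binary cross-entropy) and a diagonal Gaussian posterior, and omits any model-specific weightings (such as the beta in BetaVAE or the extra regularisers of DIP-VAE and FactorVAE); the function name and interface are illustrative, not taken from the models above.

import torch
import torch.nn.functional as F

def negative_elbo(x, x_recon, mu, logvar):
    """Negative ELBO for one batch of images in [0, 1].

    Assumes a Bernoulli decoder and a diagonal Gaussian posterior
    q(z|x) = N(mu, diag(exp(logvar))); prior is N(0, I).
    """
    # Pixel-wise reconstruction error, summed over all pixels per image.
    recon = F.binary_cross_entropy(x_recon, x, reduction="none")
    recon = recon.flatten(start_dim=1).sum(dim=1)
    # KL divergence between the Gaussian posterior and the standard normal prior.
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).flatten(start_dim=1).sum(dim=1)
    # Negative ELBO = reconstruction term + KL term, averaged over the batch.
    return (recon + kl).mean()

Evaluating such a quantity separately on the training set and on each held-out OOD split yields the dots and crosses of Figure 5.4. Because the reconstruction term sums pixel-wise errors, a model whose OOD reconstructions stay close to the input pixels (as with DIP-VAE's darkened but factor-correct images) obtains a lower negative ELBO than one producing sharp images of the wrong factor combination.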