5.4 Experiments and Results

[Figure 5.8 plots: AUROC (vertical axis) for VAE, BetaVAE, DIP-VAE-I, DIP-VAE-II, FactorVAE, cc-VAE and LSBD-VAE. Panels: (a) Square and (c) Arrow, with OOD splits 0.125 to 0.875 on the horizontal axis; (b) dSprites and (d) 3D Shapes, with OOD splits RTE, RTR and EXTR on the horizontal axis.]

Figure 5.8: AUROC scores for detecting OOD from train data, for all datasets and models. The horizontal axis shows different OOD splits.

5.4.3 Reconstructions of OOD Combinations

We can better understand the generalisation of LSBD-VAE on dSprites and 3D Shapes by inspecting reconstructions of OOD data from the different splits, as shown in Figure 5.9. For RTE we see that OOD data is reconstructed fairly well for both datasets, although dSprites images are sometimes reconstructed with the wrong shape. This seems to be mostly the effect of good interpolation, since for RTE only a limited number of combinations are left out during training. For RTR we see clear failure cases. The dSprites squares in unseen x-positions are reconstructed as the wrong shapes in the correct x-positions, so only one factor from the missing combination is inferred correctly. For 3D Shapes we see similar behaviour: oblong shapes in unseen colours are reconstructed as incorrect (mostly spherical) shapes but with the correct colour. For EXTR we see that the unseen factor values are not reconstructed well at all; dSprites images in unseen