5.4 Experiments and Results

[Figure omitted: four panels, (a) Square, (b) dSprites, (c) Arrow, (d) 3D Shapes, each plotting the ELBO difference per OOD split for VAE, BetaVAE, DIP-VAE-I, DIP-VAE-II, FactorVAE, cc-VAE, and LSBD-VAE.]

Figure 5.5: Differences between train and OOD ELBO for all datasets and models. The horizontal axis shows different OOD splits.

[Figure omitted: four panels, (a) DIP-VAE, training samples; (b) LSBD-VAE, training samples; (c) DIP-VAE, OOD samples; (d) LSBD-VAE, OOD samples.]

Figure 5.6: Examples of training and OOD samples (top rows) and their reconstructions (bottom rows) by two different models, for the Arrow 0.625 split.

5.4.2 OOD Detection: Area Under ROC Curve (AUROC)

Another way to evaluate OOD generalisation is to inspect OOD detection, i.e. to investigate how well OOD data can be distinguished from training data using
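As a minimal sketch of how such an AUROC could be computed, the snippet below uses the rank-based (Mann-Whitney) formulation: given a per-sample score where OOD samples are expected to score higher (e.g. the negative of the per-sample ELBO, since OOD samples typically receive a lower ELBO), the AUROC is the probability that a randomly chosen OOD sample outscores a randomly chosen training sample. The function and variable names here are illustrative, not taken from the thesis code.

```python
def auroc(ood_scores, train_scores):
    """AUROC for separating OOD from training samples.

    Each score is a scalar where higher means "more likely OOD"
    (for example, the negative per-sample ELBO). Ties count as 0.5,
    following the Mann-Whitney U statistic.
    """
    wins = 0.0
    for o in ood_scores:
        for t in train_scores:
            if o > t:
                wins += 1.0
            elif o == t:
                wins += 0.5
    return wins / (len(ood_scores) * len(train_scores))


# Hypothetical per-sample ELBO values for illustration only.
elbo_train = [-95.2, -98.1, -96.7]
elbo_ood = [-110.4, -104.9, -99.5]

# Score = negative ELBO, so OOD samples (lower ELBO) score higher.
score = auroc([-e for e in elbo_ood], [-e for e in elbo_train])
```

An AUROC of 1.0 would mean the ELBO separates OOD from training data perfectly, while 0.5 indicates chance-level separation. This pairwise formulation is O(n·m); for large sample counts a sorted-rank implementation (or `sklearn.metrics.roc_auc_score`) is the usual choice.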