
While 2 factors could technically be disentangled in 2 latent dimensions, at least 4 dimensions are needed to represent the cyclic topology as well (Higgins et al., 2018; Pérez Rey et al., 2019). Nevertheless, we observed that the extra capacity of 7 latent dimensions led to better OOD generalisation results, so we report only those experiments in Section 5.4. For dSprites and 3D Shapes (with 5 and 6 factors, respectively) we only trained with 7 latent dimensions.

5.4 Experiments and Results

In this section we evaluate OOD generalisation by inspecting the likelihood that the models assign to training and OOD data, reconstructions of OOD data, and how well the encoders can learn equivariant representations for OOD data.

5.4.1 Likelihood Ratio: Training vs. OOD ELBO

We follow an evaluation protocol similar to that of Montero et al. (2021): we compute the mean negative log-likelihood (approximated with the negative Evidence Lower Bound, or ELBO) for the training data as well as the OOD test data. A small difference between the training and OOD ELBOs indicates good generalisation, provided that the model has learned to represent the training set well. Note that since the ELBO is an approximation of the log-likelihood, the difference between ELBOs represents a log-likelihood ratio.

Figure 5.4 shows the mean negative ELBOs for all models and datasets. For the traditional methods, we show the results for 7 latent dimensions, which gave the best OOD generalisation performance. Overall we observe that the training sets obtain similar ELBOs for a given dataset, regardless of the OOD split (with the exception of cc-VAE on the Arrow dataset, which failed to learn a good model), while the OOD negative ELBOs increase for larger combinations of left-out factors, as expected. LSBD-VAE shows some advantage over the other methods on the Square dataset, but not on the other datasets. For dSprites and 3D Shapes, we confirm the findings of Montero et al. (2021) that OOD generalisation occurs mostly in limited cases (RTE and, to a lesser extent, RTR), that it seems largely independent of disentanglement, and that extrapolation (EXTR) essentially does not occur.

To investigate generalisation, the difference between the training and OOD ELBOs is more informative than the absolute values. Figure 5.5 shows these differences for all models and datasets.
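For reference, the relationship underlying this comparison can be made explicit. The following is the standard formulation of the ELBO and of the interpretation of the ELBO gap as an approximate log-likelihood ratio (written out here for clarity, not quoted from this chapter):

\log p_\theta(x) \;\geq\; \mathrm{ELBO}(x) \;=\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big),

\mathrm{ELBO}(x_{\mathrm{train}}) - \mathrm{ELBO}(x_{\mathrm{OOD}}) \;\approx\; \log \frac{p_\theta(x_{\mathrm{train}})}{p_\theta(x_{\mathrm{OOD}})}.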

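To illustrate the evaluation protocol, the sketch below computes the mean negative ELBO over a data loader and reports the train-vs-OOD gap. This is a minimal sketch for a standard Gaussian-prior VAE with a Bernoulli decoder, not the implementation used in this chapter; `model` (with hypothetical `encode`/`decode` methods), `train_loader`, and `ood_loader` are assumed stand-ins.

```python
# Minimal sketch of the train-vs-OOD ELBO comparison, assuming a PyTorch VAE
# with a standard normal prior, Gaussian encoder q(z|x), and a decoder whose
# output is a sigmoid (pixel values in [0, 1]). All names are placeholders.
import torch
import torch.nn.functional as F

@torch.no_grad()
def mean_negative_elbo(model, loader, device="cpu"):
    """Average negative ELBO (reconstruction + KL), in nats per sample."""
    total, count = 0.0, 0
    for x in loader:  # assumes the loader yields batches of images
        x = x.to(device)
        mu, logvar = model.encode(x)  # parameters of q(z|x)
        # Single-sample Monte Carlo estimate via the reparameterisation trick
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        x_hat = model.decode(z)
        # Bernoulli reconstruction term, summed over pixels, per sample
        recon = F.binary_cross_entropy(x_hat, x, reduction="none")
        recon = recon.flatten(1).sum(dim=1)
        # Closed-form KL( q(z|x) || N(0, I) ), per sample
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)
        total += (recon + kl).sum().item()
        count += x.shape[0]
    return total / count

neg_elbo_train = mean_negative_elbo(model, train_loader)
neg_elbo_ood = mean_negative_elbo(model, ood_loader)
# The gap approximates a mean log-likelihood ratio between the two sets
gap = neg_elbo_ood - neg_elbo_train
```

In this sketch the gap is in nats per sample; a gap near zero, together with a low training negative ELBO, corresponds to the good-generalisation regime described above.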
RkJQdWJsaXNoZXIy MjY0ODMw