
6.2 Limitations

approach, e.g. in situations with occlusions, where a particular transformation may not always have a visible effect in the data space. Another limitation is that transformation labels are needed to compute DLSBD. Although it is common for disentanglement metrics to require full information on the underlying factors of the test set, as they are mostly intended for evaluation in research settings, this still limits the usability of the metric in more practical settings. Similarly, LSBD-VAE requires access to at least some transformation labels. Although we argue that transformation labels may be easier to obtain than exact factor values in certain cases (e.g. agent-environment settings), this still somewhat limits the applicability of the method.

Furthermore, our results on the SHREC Object Retrieval Challenge show that it is difficult to learn good LSBD representations in more complicated real-world settings, especially in the presence of non-symmetric variation. This also highlights the limitation that LSBD is only helpful in cases where the underlying mechanisms can indeed be accurately described with symmetry groups.

Our work in Chapter 5 on OOD generalisation shows that both LSBD and traditional disentanglement methods struggle to generalise to unseen combinations of the factors they aim to disentangle. This already exposes some limitations of these models, including our own LSBD-VAE. However, there are some limitations to these evaluations as well. Our evaluations focus only on controlled settings with toy datasets and known underlying mechanisms. This allows us to properly evaluate the behaviour of the models, and the negative results in this relatively simple setting already suggest that models will struggle in more realistic settings as well. However, the approach of leaving out a full range of factor combinations is somewhat strict and unnatural; more realistic datasets may not have such large ranges of missing factor combinations.
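The evaluation protocol of leaving out a full range of factor combinations can be sketched as follows. This is a minimal, hypothetical illustration: the factor grid, the axis being held out, and the range are stand-ins, not the actual datasets or splits used in Chapter 5.

```python
import numpy as np

def ood_split(factors, holdout_axis=0, holdout_range=(0.5, 1.0)):
    """Split sample indices into in-distribution (train) and OOD (test).

    Every sample whose value along `holdout_axis` falls inside
    `holdout_range` is withheld from training, so the model never
    observes any factor combination from that entire range.
    """
    vals = factors[:, holdout_axis]
    ood_mask = (vals >= holdout_range[0]) & (vals < holdout_range[1])
    return np.where(~ood_mask)[0], np.where(ood_mask)[0]

# Toy grid of two generative factors, each taking 10 values in [0, 1)
g = np.stack(
    np.meshgrid(np.linspace(0, 1, 10, endpoint=False),
                np.linspace(0, 1, 10, endpoint=False)),
    axis=-1,
).reshape(-1, 2)

train_idx, ood_idx = ood_split(g)  # half the grid is withheld entirely
```

Because the held-out region spans a full range of one factor (combined with all values of the other), the test set contains only combinations the model has never seen, which is what makes the split strict compared to scattering missing combinations across the grid.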
On the other hand, it is often harder to train models that disentangle well on more realistic datasets, which is necessary to be able to evaluate whether such disentangled models generalise better to unseen combinations. Overall, our evaluations are therefore too limited to conclude strongly whether disentangled models help with generalisation in more realistic settings.

The conclusions in Chapter 5 are mostly drawn from the likelihood estimates of the VAE-based models. But as discussed above for the anomaly detection case, the likelihoods assigned by a model may not always be the most suitable measure to assess how "normal" a data point is. This is further emphasised by our conclusion that the LSBD-VAE encoder may still learn a decently equivariant mapping for OOD data, even if the likelihood of OOD data is lower.
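The distinction between likelihood and equivariance as measures of OOD behaviour can be made concrete with a small sketch. The `encode` function and the 2D rotation representation below are illustrative stand-ins for a trained LSBD-VAE encoder and the dataset's transformation, not the thesis's actual implementation.

```python
import numpy as np

def rotate2d(z, angle):
    """Apply a 2D rotation, i.e. the group representation on the latent space."""
    c, s = np.cos(angle), np.sin(angle)
    return z @ np.array([[c, -s], [s, c]]).T

def equivariance_error(encode, x, transform, angle):
    """Mean distance between encode(g . x) and rho(g) . encode(x).

    A small error means the encoder maps transformed inputs to
    correspondingly transformed latents, regardless of how much
    likelihood the model assigns to x.
    """
    z_transformed = encode(transform(x, angle))
    z_expected = rotate2d(encode(x), angle)
    return np.mean(np.linalg.norm(z_transformed - z_expected, axis=-1))

# Toy check: data on the unit circle with an identity-like encoder, for
# which the mapping is perfectly equivariant and the error is ~0.
x = np.random.default_rng(0).normal(size=(100, 2))
x /= np.linalg.norm(x, axis=-1, keepdims=True)
err = equivariance_error(lambda a: a, x, rotate2d, np.pi / 3)
```

Evaluating such an error directly on held-out factor combinations separates the question "does the encoder still respect the group structure?" from "does the density model consider this point likely?", which is the gap the text above points at.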
