Therefore, better representations in Z should lead to a more reliable anomaly detection method, whose decisions are based on real-world properties. We motivated earlier that disentanglement is a good strategy to improve representations in Z, by disentangling the underlying factors of variation of the real world W. In the context of anomaly detection, this is particularly helpful to prevent false negatives: cases where data is flagged as anomalous but should be considered normal, and thus well-represented by the model.

Most traditional disentanglement methods assume that factors are statistically independent and that datasets contain examples for all possible combinations of factor values. In practice, however, factors may be correlated, yet they can still be disentangled (Träuble et al., 2021). For example, a dataset of human bodies would contain the underlying factors body height and foot size, which are clearly not independent. Nevertheless, we can identify these two factors as independent mechanisms, and unlikely combinations of factor values should still be modelled: short humans with big feet can exist even if they are not observed in a particular dataset due to their lower likelihood. Furthermore, as the number of factors grows, the number of possible combinations of factor values grows exponentially, so it becomes unrealistic to expect a dataset to cover them all. It is therefore useful for models to be able to disentangle factors without needing to see all possible combinations and without assuming statistical independence between factors.

A model that generalises well to unseen combinations is then less likely to flag such unseen cases as anomalous. Particular combinations of factor values may be out-of-distribution (OOD) from a probabilistic point of view, but should still be considered “normal” given that these factors represent underlying mechanisms of the world. Generalising to such unseen combinations of factor values is a type of out-of-distribution generalisation (Shen et al., 2021), more specifically combinatorial generalisation. LSBD models are a sensible candidate to handle such generalisation, since they focus on modelling underlying mechanisms and should thus be capable of modelling unseen combinations that result from applying these mechanisms. Since LSBD defines disentanglement with respect to real-world symmetries, rather than statistical properties of the data and its underlying factors, it provides a suitable framework for generalising to unseen combinations even if they are empirically OOD.

Q: How well do LSBD models generalise to unseen observations that are the result of mechanisms observed during training? I.e., how well do LSBD models perform out-of-distribution (OOD) generalisation?
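To make the notion of unseen factor combinations concrete, the following minimal Python sketch shows one way to construct a combinatorial-generalisation split: enumerate all combinations of factor values, then hold some out so that they are OOD at training time even though every individual factor value may still be observed. The factor names, values, and holdout scheme are illustrative assumptions, not the evaluation protocol used in this work.

```python
import itertools
import random

def split_factor_combinations(factor_values, holdout_fraction=0.2, seed=0):
    """Enumerate all combinations of factor values and hold out a random
    subset. Held-out combinations are unseen (empirically OOD) during
    training; for a fair test one would additionally check that every
    individual factor value still occurs in the training split, which
    this sketch does not enforce."""
    combos = list(itertools.product(*factor_values.values()))
    rng = random.Random(seed)
    rng.shuffle(combos)
    n_holdout = max(1, int(len(combos) * holdout_fraction))
    heldout = set(combos[:n_holdout])
    train = [c for c in combos if c not in heldout]
    return train, sorted(heldout)

# Two toy, correlated factors (hypothetical values). With k factors of v
# values each there are v**k combinations, which is why covering them all
# in a dataset quickly becomes unrealistic.
factors = {
    "height": ["short", "medium", "tall"],
    "foot_size": ["small", "average", "large"],
}
train_combos, ood_combos = split_factor_combinations(factors)
print(f"train: {len(train_combos)} combinations, held out (OOD): {ood_combos}")
```

A model is then trained only on observations whose factor values lie in the training combinations, and its reconstruction or detection quality on the held-out combinations measures combinatorial generalisation.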