properties, and how to quantify this disentanglement with D_LSBD. We show the utility of D_LSBD by quantifying LSBD in a number of settings, for a variety of datasets with underlying SO(2) symmetries and other non-symmetric properties (Sections 4.6 & 4.7). First, we evaluate traditional VAE-based disentanglement methods and show that most don't learn LSBD representations. Second, we evaluate LSBD-VAE and other recent methods that specifically target LSBD, showing that they can obtain much better D_LSBD scores while needing only limited supervision on transformations. Third, we compare D_LSBD with existing disentanglement metrics, showing that various desirable properties expressed with these metrics are also achieved by LSBD representations. Furthermore, we show a practical application of LSBD-VAE in the SHREC 2021 3D Object Retrieval Challenge (Sipiran et al., 2021), where we augment the LSBD-VAE with a triplet loss to accommodate the given retrieval task (Section 4.8). By rendering 2D images from the 3D objects under various orientations, we obtain a suitable setting for LSBD-VAE to disentangle the rotations (i.e. SO(2) symmetries) from object-specific information. However, our results show that achieving good disentanglement in this realistic setting is still challenging, which limits the performance on the retrieval task.

4.2 LSBD Definition

Higgins et al. (2018) provide formal definitions of disentangled representations and linear disentangled representations that connect symmetry transformations affecting the real world (from which data is observed) to the internal representations of a model. We refer to these definitions as Symmetry-Based Disentanglement (SBD) and Linear Symmetry-Based Disentanglement (LSBD). The definitions are grounded in concepts from group theory; see Section 2.3 for a more detailed description of these concepts. In this section, we explain the underlying setting and assumptions for these definitions. Furthermore, we provide a slightly different version of the definitions, which we show to be equivalent to the original definitions under mild conditions.

Setting and notation
The exact setting and notation we use for the definitions are as follows. Let W be the set of world states. We assume there is a generative process b : W → X that leads to observations in the data space X. A model's internal representation of data is modelled with the encoding function h : X → Z that maps to the