3. The group representation $\rho : G \to \mathrm{GL}(Z)$ acts on $Z$ as $\rho(g) \cdot z = (\rho_1(g_1) \cdot z_1, \ldots, \rho_K(g_K) \cdot z_K)$, for $g = (g_1, \ldots, g_K) \in G$ and $z = (z_1, \ldots, z_K) \in Z$, with $g_k \in G_k$ and $z_k \in Z_k$ for $k = 1, \ldots, K$. In other words, each subspace $Z_k$ is affected only by the subgroup representation $\rho_k$, and remains fixed by all other subgroup representations $\rho_j$ with $j \neq k$.

4. The function $h$ is equivariant with respect to the actions of $G$ on $X$ and $Z$, i.e. for all $x \in X$ and $g \in G$ it holds that $h(g \cdot x) = \rho(g) \cdot h(x)$.

Furthermore, we say that a group representation $\rho$ is linearly disentangled with respect to the group decomposition $G = G_1 \times \ldots \times G_K$ if it satisfies criteria 1 to 3 from the LSBD definition above; a small numerical sketch of such a representation is given at the end of this section.

4.3 Related Work

Other methods have previously focused on capturing transformations of the data outside the context of disentanglement (Cohen and Welling, 2015; Sosnovik et al., 2020; Worrall et al., 2017). Here, however, we focus specifically on capturing transformations in the context of symmetry-based disentanglement. Although many works have focused on quantifying and learning traditional disentanglement (as discussed in Section 2.2), most of this work does not follow the formal framework of SBD or LSBD (Higgins et al., 2018). Some recent works, however, do follow a symmetry-based, group-theoretic view of disentanglement. As we've argued before, none of these approaches includes a general metric to quantify LSBD, but some do propose metrics that measure certain aspects of LSBD. In this section, we outline some of these symmetry-based disentanglement works. In particular, we'll consider various methods to learn (L)SBD representations, metrics to quantify the results of these methods, and an alternative formulation of symmetry-based disentanglement that differs slightly from LSBD.

Learning SBD representations

Caselles-Dupré et al. (2019) show that interaction with environments is required for learning (L)SBD representations, and propose both a decoupled and an end-to-end approach to do so. They argue that to learn (L)SBD representations, one should not use a training set of still samples $\{x_n\}_{n=1}^N$, but rather transitions
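To make the linear disentanglement criteria concrete, the following is a minimal numerical sketch, not an implementation from this thesis. It assumes the toy choices $G = SO(2) \times SO(2)$ and $Z = \mathbb{R}^2 \oplus \mathbb{R}^2$; the function names `rot` and `rho` are illustrative.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix: a representation of SO(2) acting on R^2."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def rho(g):
    """Toy linearly disentangled representation of g = (theta_1, theta_2):
    block-diagonal, so each subgroup G_k acts only on its own subspace Z_k."""
    theta_1, theta_2 = g
    block = np.zeros((4, 4))
    block[:2, :2] = rot(theta_1)  # rho_1 acts on Z_1
    block[2:, 2:] = rot(theta_2)  # rho_2 acts on Z_2
    return block

# z = (z_1, z_2) with z_1, z_2 in R^2.
z = np.array([1.0, 0.0, 0.0, 1.0])

# Criterion 3: acting with g = (pi/2, e) moves only Z_1; Z_2 remains fixed.
z_new = rho((np.pi / 2, 0.0)) @ z
assert np.allclose(z_new[2:], z[2:])

# rho is a group homomorphism: rho(g g') = rho(g) rho(g').
g, g_prime = (0.3, 1.1), (0.5, -0.2)
g_product = (g[0] + g_prime[0], g[1] + g_prime[1])  # product in SO(2) x SO(2)
assert np.allclose(rho(g_product), rho(g) @ rho(g_prime))
```

The block-diagonal structure of `rho` is exactly what criterion 3 expresses: the representation decomposes into independent sub-representations, one per subgroup, each fixing all other subspaces.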