
$\{(x_n, g_n, x_{n+1})\}_{n=1}^{N-1}$, where $g_n$ describes the transition from observation $n$ to the next. Then, in a decoupled approach, they first use a variant of a traditional disentanglement model (cc-VAE (Burgess et al., 2018)) to learn a disentangled representation, after which they use the transitions $g_n$ to learn the group action on $Z$. This action is not linear, so at best this approach can find SBD representations, but not LSBD representations. Additionally, Caselles-Dupré et al. (2019) propose an end-to-end approach, which they call Forward-VAE, that modifies a traditional VAE by adding a loss component $L_{\text{forward}} = \lVert A(g_n) \cdot z_n - z_{n+1} \rVert^2$, where $A(g_n)$ is a learned transition matrix. The transitions $g_n$ come from a small set of possible values; in their case, they represent a circle moving either up, down, left, or right. In other words, Forward-VAE is trained on pairs of consecutive observations, where the transition between these observations is known and can only attain a few different values. For each possible transition, a matrix $A$ with trainable parameters is learned that represents the action of this transition in the latent space (see the first code sketch at the end of this section).

Following the idea that interaction with environments is required for learning (L)SBD representations, Quessard et al. (2020) consider a dataset with multiple trajectories $(x_1, g_1, x_2, g_2, \ldots)$, interpreting each transformation $g$ as an element of a symmetry group $G$ of the environment. This group is unknown a priori, so they propose representing $G$ by a group of matrices belonging to $SO(K)$, and parameterise any transformation $g$ as the product of $K(K-1)/2$ two-dimensional rotations
$$g(\theta_{1,2}, \theta_{1,3}, \ldots, \theta_{K-1,K}) = \prod_{i=1}^{K-1} \prod_{j=i+1}^{K} R_{i,j}(\theta_{i,j}),$$
where $R_{i,j}$ denotes the rotation in the $(i,j)$ plane embedded in the $K$-dimensional representation. The parameters $\theta_{i,j}$ are learned jointly with the parameters of an encoder and decoder model (see the second code sketch at the end of this section). The encoder's output is normalised so that it always maps observations to unit-norm latent vectors; the latent space is thus essentially a hypersphere with radius 1, which means that the action of $G$ on the latent space is transitive. The training procedure consists of encoding an observation $x_n$ and then transforming the resulting latent vector with the representation matrices of the next $m$ transformations in the dataset, $g_n, \ldots, g_{n+m-1}$; the results are then decoded again. The training objective is the minimisation of the reconstruction loss between the true observations $x_{n+1}, \ldots, x_{n+m}$ and their reconstructions obtained from this procedure. Furthermore, they include a loss component $L_{\text{ent}}$ that aims to reduce entanglement in an unsupervised manner; we briefly explain this quantity below. Note that this model is not a VAE, as there is no probabilistic part.

Painter et al. (2020) confirm empirically that SBD representations are not generally found in traditional VAE-based disentanglement models, but they pro-
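To make the Forward-VAE loss component concrete, the first sketch below shows one way it could be implemented. This is a minimal illustration only, not the authors' implementation: it assumes PyTorch, hypothetical names (ForwardTransition, forward_loss), and that the discrete transitions are encoded as integer indices; the actual Forward-VAE combines this term with the usual VAE objective.

import torch
import torch.nn as nn

class ForwardTransition(nn.Module):
    """One trainable matrix A(g) per discrete transition g (e.g. up, down, left, right)."""
    def __init__(self, num_transitions: int, latent_dim: int):
        super().__init__()
        # One (latent_dim x latent_dim) matrix per possible transition, initialised to identity.
        self.A = nn.Parameter(torch.stack(
            [torch.eye(latent_dim) for _ in range(num_transitions)]))

    def forward(self, z: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        # z: (batch, latent_dim) latent codes; g: (batch,) integer transition indices.
        return torch.einsum('bij,bj->bi', self.A[g], z)

def forward_loss(z_n: torch.Tensor, z_next: torch.Tensor, g: torch.Tensor,
                 transition: ForwardTransition) -> torch.Tensor:
    """L_forward = || A(g_n) z_n - z_{n+1} ||^2, averaged over a batch of consecutive pairs."""
    pred = transition(z_n, g)
    return ((pred - z_next) ** 2).sum(dim=1).mean()

# Hypothetical usage: z_n and z_next are the encodings of x_n and x_{n+1}, and the
# total objective adds this term to the standard VAE loss.
# transition = ForwardTransition(num_transitions=4, latent_dim=4)
# loss = vae_loss + forward_loss(z_n, z_next, g, transition)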

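Similarly, the rotation parameterisation of Quessard et al. (2020) can be sketched in a few lines. Again this is only an illustrative sketch with hypothetical function names (plane_rotation, representation_matrix), assuming PyTorch; it shows how K(K-1)/2 learnable angles yield an element of SO(K), but omits the encoder, decoder, and entanglement loss.

import torch

def plane_rotation(theta: torch.Tensor, i: int, j: int, K: int) -> torch.Tensor:
    """Rotation by angle theta in the (i, j) plane of R^K, identity elsewhere."""
    R = torch.eye(K, dtype=theta.dtype)
    c, s = torch.cos(theta), torch.sin(theta)
    R[i, i], R[j, j] = c, c
    R[i, j], R[j, i] = -s, s
    return R

def representation_matrix(thetas: torch.Tensor, K: int) -> torch.Tensor:
    """Product of all K(K-1)/2 plane rotations R_{i,j}(theta_{i,j}); the result lies in SO(K)."""
    g = torch.eye(K, dtype=thetas.dtype)
    idx = 0
    for i in range(K - 1):
        for j in range(i + 1, K):
            g = g @ plane_rotation(thetas[idx], i, j, K)
            idx += 1
    return g

# Hypothetical usage: the angles are trainable parameters learned jointly with the
# encoder and decoder, and the resulting matrix acts on unit-norm latent vectors.
# thetas = torch.nn.Parameter(torch.zeros(4 * 3 // 2))  # K = 4
# z_next_pred = representation_matrix(thetas, K=4) @ z_n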