pose a novel way of biasing models without relying on labelled action-transition pairs, as the aforementioned methods do. They achieve this through unsupervised action estimation: they propose a model called Reinforced GroupVAE (RGrVAE) that, alongside a standard VAE, learns a model that infers a distribution over a set of possible (learnable) internal transformation matrices $A_i$ given a pair of input images. These matrices $A_i$ represent the transformation between the two input images. Their method requires no labelled information about the transformations between input image pairs, but it does assume that these transformations come from a small set of possible actions (e.g. moving one step up, down, left, or right in a two-dimensional grid).

Quantifying SBD representations

Caselles-Dupré et al. (2019) evaluate their methods on the task of learning an inverse model, i.e. predicting the transition $g_n$ from two consecutive observations $(x_n, x_{n+1})$. Results show that SBD and LSBD representations perform better at this task. Although they call this a downstream task, such transitions $g_n$ were in fact involved in the training protocol for both approaches (decoupled and Forward-VAE), so the task is still related to the training objective. Moreover, this kind of evaluation does not directly quantify LSBD according to its exact definition.

Although the $L_{\text{ent}}$ loss component from Quessard et al. (2020) is presented as a metric, it applies specifically to their parameterisation of $G = SO(K)$ as a product of two-dimensional rotations. Therefore, it cannot be used as a general metric for LSBD. The idea behind $L_{\text{ent}}$ is to minimise the number of rotations used in their parameterisation, to encourage each subgroup of $G$ to act on a specific subspace of the latent space only. Thus, for each $g$ with rotation parameters $\theta_{i,j}$, the loss component is defined as $L_{\text{ent}}(g) = \sum_{(i,j) \neq (\alpha,\beta)} |\theta_{i,j}|^2$ with $\theta_{\alpha,\beta} = \max_{i,j}(|\theta_{i,j}|)$.
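As a minimal sketch of the $L_{\text{ent}}$ definition given above, the following computes the loss component for a single group element. It assumes the rotation angles are available as a flat array with one entry per pair $(i,j)$. The function name `entanglement_loss` and the NumPy formulation are illustrative assumptions and are not taken from Quessard et al. (2020).

```python
import numpy as np


def entanglement_loss(theta):
    """Sum of squared rotation angles, excluding the largest-magnitude angle.

    `theta` holds one rotation angle per pair (i, j) of latent dimensions
    (hypothetical layout: a flat sequence of angles).
    """
    abs_theta = np.abs(np.asarray(theta, dtype=float)).ravel()
    if abs_theta.size == 0:
        return 0.0
    # Index of the dominant rotation, i.e. the pair (alpha, beta).
    dominant = int(np.argmax(abs_theta))
    # Sum |theta_ij|^2 over all pairs except the dominant one.
    return float(np.delete(abs_theta ** 2, dominant).sum())


# A group element that rotates in a single plane incurs no penalty.
single_plane = entanglement_loss([0.0, 1.5, 0.0])

# Additional non-zero angles are penalised: here 0.3**2 + 0.4**2.
mixed = entanglement_loss([0.3, 1.5, -0.4])
```

The loss is zero exactly when at most one rotation angle is non-zero, which matches the stated aim of having each group element act through a single two-dimensional rotation.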
Without a general LSBD metric, the evaluation of Quessard et al. (2020) relies mostly on qualitative inspection of the results. One quantitative evaluation they perform inspects the reconstruction performance after predicting multiple steps (i.e. actions) in latent space and decoding the results. This can give indicative results, but it is not a direct quantification of LSBD and only works if the data is given as trajectories of consecutive observations.

Painter et al. (2020) mention two metrics to quantify LSBD. The first, Independence Score, measures whether the actions of the subgroups have effects on independent vector spaces. Similar to their RGrVAE method, they assume that all transformations between pairs of data points come from a small set of possible actions, specifically taking one step up, down, left, or right on a two-dimensional