4.6 Experimental Setup

from 0 (corresponding to a ∆VAE) to N/2 (in which case each data point is involved in exactly one labelled pair). We set the weight γ of the supervised loss component to γ = 100 for all experiments. We choose M = 2 for our experiments, since it is the most limited setting for LSBD-VAE. Higher values of M would provide stronger supervision, so successful results with M = 2 imply that good results can also be achieved for higher values of M (but not necessarily vice versa).

For the COIL-100 and ModelNet40 datasets, we train LSBD-VAE on batches containing images of one particular object from all different angles (72 and 64 angles for COIL-100 and ModelNet40, respectively). Each batch is labelled with transformations (g1, e), . . . , (gM, e), where the gm represent rotations and the unit transformation e indicates that the object is unchanged. To represent the rotations we use an S1 latent space as in the ∆VAE, whereas for the object identity we use a 5-dimensional Euclidean latent space with a standard Gaussian prior, as in a regular VAE. LSBD is measured as the disentanglement of rotations in the latent space. For these experiments we used γ = 1.

4.6.3 LSBD-VAE with Paths of Consecutive Observations

It is often cheap to obtain transformation labels in settings where we can apply simple transformations and observe their effects, such as an agent navigating its environment. By registering actions (e.g. rotating left over a given angle) and the resulting observations, we can construct a path of consecutive views with known in-between transformations. We can then use these paths to train an LSBD-VAE. For the datasets with G = G1 × G2 = SO(2) × SO(2) (i.e. Square, Arrow, and Airplane), we generate random paths by consecutively applying one randomly chosen transformation from {g1, g1^-1, g2, g2^-1}, where gk ∈ Gk for k ∈ {1, 2}, starting from randomly chosen observations.
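A minimal sketch of such a random-walk path generator, for intuition only. It assumes the state is a pair of angles on the torus SO(2) × SO(2) and that each step applies one transformation chosen uniformly from {g1, g1^-1, g2, g2^-1}; all parameter names and defaults are illustrative, not the thesis implementation.

```python
import numpy as np

def generate_paths(num_paths=50, path_length=100,
                   step=3 / 64 * 2 * np.pi, rng=None):
    """Random walks on SO(2) x SO(2), represented as angle pairs.

    Each step applies one transformation chosen uniformly from
    {g1, g1^-1, g2, g2^-1}, where g_k rotates the k-th factor by `step`.
    Returns the visited angle pairs and the known in-between
    transformation labels. (Defaults are illustrative assumptions.)
    """
    rng = np.random.default_rng(rng)
    # The four candidate transformations as (factor index, signed angle).
    actions = [(0, step), (0, -step), (1, step), (1, -step)]
    paths, labels = [], []
    for _ in range(num_paths):
        angles = rng.uniform(0.0, 2 * np.pi, size=2)  # random start
        path, lab = [angles.copy()], []
        for _ in range(path_length - 1):
            k, delta = actions[rng.integers(len(actions))]
            angles[k] = (angles[k] + delta) % (2 * np.pi)
            path.append(angles.copy())
            lab.append((k, delta))  # transformation between observations
        paths.append(np.stack(path))
        labels.append(lab)
    return paths, labels
```

In a real pipeline the angle pairs would index into the dataset to fetch the corresponding images, while the recorded (factor, angle) labels supply the transformation supervision between consecutive observations.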
In our experiments, we generate 50 paths of length 100, and each gk corresponds to an SO(2) transformation over an angle of (3/64) · 2π radians. Figure 4.5 shows some example paths for these three datasets. For the COIL-100 and ModelNet40 datasets there is only one group to disentangle, so similar random walks are not very meaningful there, and we do not evaluate them for these datasets.

4.6.4 Other Disentanglement Methods

We furthermore test a number of known disentanglement methods for comparison, including traditional disentanglement methods as well as methods that