
5.4 Experiments and Results

factors that cannot clearly be mapped to LSBD symmetries, so we do not evaluate DLSBD on those datasets. We emphasise that LSBD-VAE is the only model that attempts to disentangle from an LSBD point of view, so the traditional models are included as indicative results rather than as a fair comparison. Indeed, we observe that LSBD-VAE achieves better DLSBD scores overall, though some traditional models perform fairly well on the Square dataset. The DLSBD score of LSBD-VAE hardly suffers for the smaller OOD splits (up until the 0.5 split), and remains fairly good for the larger OOD splits. This indicates that even though OOD generalisation seems poor when inspecting ELBO values, the encoder appears to represent the underlying structure of the data quite well.

Figure 5.10: DLSBD scores (lower is better) for various OOD splits (0.125 to 0.875), shown for VAE, BetaVAE, DIP-VAE-I, DIP-VAE-II, FactorVAE, cc-VAE, and LSBD-VAE. (a) Square. (b) Arrow.

To visualise this more clearly, we show 2D latent embeddings and traversals for LSBD-VAE on the Arrow dataset in Figure 5.11, for increasingly large OOD splits. From the top row we see that until the 0.5 split, the underlying structure is captured quite well even for unseen OOD combinations. For the 0.5 split we can clearly identify where the unseen OOD combinations are encoded, since they break the axis-alignment, but the overall topology is still intact. For larger OOD splits we see that this topology starts to break and that OOD encodings start overlapping with training encodings, so the model starts failing to represent these OOD factor combinations.

The bottom row of Figure 5.11 shows how the decoder fails for OOD combinations. For the 0.5 split, orientation and hue are easily recognisable and disentangled in the generated images.
For larger splits, shapes become more disfigured, and eventually orientation and hue are no longer well-represented.
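
This page does not restate how the OOD splits are constructed, so the following is only an illustrative sketch under a stated assumption: a combination of two factors (for example orientation and hue) is held out when both normalised factor values fall in the lower `split` fraction of their range. The function name `ood_mask` and the grid size are hypothetical and are not taken from the experiments above.

```python
import numpy as np


def ood_mask(n_values: int, split: float) -> np.ndarray:
    """Return an (n_values, n_values) boolean grid; True marks held-out combinations.

    Assumption (not confirmed by the text): a factor combination is OOD
    only if BOTH factors lie in the lower `split` fraction of their range.
    """
    # Normalised factor values in [0, 1), e.g. orientation and hue indices.
    values = np.arange(n_values) / n_values
    low = values < split
    # Broadcast the 1D condition to a 2D grid over both factors.
    return low[:, None] & low[None, :]


# Hypothetical 64 x 64 grid of factor combinations with a 0.5 split.
mask = ood_mask(n_values=64, split=0.5)
train_idx = np.argwhere(~mask)  # combinations seen during training
ood_idx = np.argwhere(mask)     # unseen combinations used for OOD evaluation
```

Under this assumption the held-out share of combinations grows quadratically with the split value, which is one way to read why the larger splits leave the model with far fewer training combinations.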
