Quantifying and Learning Linear Symmetry-Based Disentanglement (LSBD)

embedding space $Z$. The original definitions of SBD and LSBD furthermore consider the composition $f : W \to Z$, $f = h \circ b$. Moreover, we assume there is a group $G$ of symmetries acting on $W$ via a group action $\cdot : G \times W \to W$. We would like to find a group action¹ of $G$ on $Z$, i.e. $\cdot : G \times Z \to Z$, such that the symmetry structure of $W$ is reflected in $Z$. This is achieved if $f$ is an equivariant function with respect to the actions of $G$ on $W$ and $Z$, i.e. if

$$g \cdot f(w) = f(g \cdot w) \quad \forall g \in G,\ w \in W. \tag{4.1}$$

Intuitively, this says that it doesn't matter in which order we apply $f$ and the action of $G$ (on either $W$ or $Z$); the result should be the same.

In particular, we assume that the symmetries of the world decompose as a direct product $G = G_1 \times \dots \times G_K$, for some natural number $K$. For disentanglement, we want the action of $G$ on $Z$ to be such that the subgroups of $G$ affect only specific subspaces of $Z$.

Changing the perspective from world states to the data space

The original definitions of SBD and LSBD are given in terms of equivariance of the function $f$ with respect to the actions of $G$ on the world states $W$ and the embedding space $Z$. However, we believe it's more practical to define disentanglement with respect to the encoding function $h$ and the data space $X$, since $h$ corresponds directly to a model, and $X$ is where data is actually observed. Under one mild condition, we can rewrite the action of $G$ on $W$ as an action on $X$, and define SBD and LSBD in terms of equivariance of the encoding function $h$ with respect to the actions of $G$ on $X$ and $Z$. This new definition is exactly the same as the original definition as long as one assumption holds.

Specifically, to rewrite the action of $G$ on $W$ as an action on $X$, we require that the observation function $b : W \to X$ be injective. This is typically the case, but not always. If $b$ is not injective, then two different world states can lead to the same observation, e.g. because of occlusion.
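
To make the equivariance condition (4.1) and the role of injectivity concrete, the following is a minimal numerical sketch rather than anything taken from the text. It assumes a toy setting chosen purely for illustration: world states are angles on a circle, $G$ is the group of planar rotations, and the functions `b`, `h`, `b_inverse` and `occluded_b` are hypothetical stand-ins for the observation and encoding functions. The induced action on $X$ is built as $g \cdot x = b(g \cdot b^{-1}(x))$, which is only well defined when $b$ is injective.

```python
import math

TWO_PI = 2.0 * math.pi

def act_on_W(alpha, theta):
    """Action of g = rotation by alpha on a world state theta (an angle)."""
    return (theta + alpha) % TWO_PI

def act_on_Z(alpha, z):
    """Linear action of the same g on Z = R^2 (a 2D rotation applied to z)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return (c * z[0] - s * z[1], s * z[0] + c * z[1])

def b(theta):
    """Hypothetical injective observation function b : W -> X (circle of radius 2)."""
    return (2.0 * math.cos(theta), 2.0 * math.sin(theta))

def b_inverse(x):
    """Inverse of b on its image; it exists only because b is injective."""
    return math.atan2(x[1], x[0]) % TWO_PI

def h(x):
    """Hypothetical encoder h : X -> Z that rescales an observation to unit norm."""
    norm = math.hypot(x[0], x[1])
    return (x[0] / norm, x[1] / norm)

def f(theta):
    """Composition f = h o b : W -> Z."""
    return h(b(theta))

def act_on_X(alpha, x):
    """Action on X induced from the action on W: g . x = b(g . b^{-1}(x))."""
    return b(act_on_W(alpha, b_inverse(x)))

def occluded_b(theta):
    """Hypothetical non-injective observation: only the first coordinate is seen."""
    return 2.0 * math.cos(theta)

def close(u, v, tol=1e-9):
    """Componentwise comparison of two tuples up to a small tolerance."""
    return all(abs(a - c) <= tol for a, c in zip(u, v))

def is_equivariant(alphas, thetas):
    """Check Eq. (4.1) for f, and the analogous condition for h on X = b(W)."""
    for alpha in alphas:
        for theta in thetas:
            # g . f(w) == f(g . w)
            if not close(act_on_Z(alpha, f(theta)), f(act_on_W(alpha, theta))):
                return False
            # g . h(x) == h(g . x), using the action induced on X
            x = b(theta)
            if not close(act_on_Z(alpha, h(x)), h(act_on_X(alpha, x))):
                return False
    return True

alphas = [0.0, 0.7, math.pi, 5.1]
thetas = [0.0, 1.0, 2.5, 4.0, 6.0]
print(is_equivariant(alphas, thetas))
```

In this toy setting, `occluded_b(1.0)` and `occluded_b(TWO_PI - 1.0)` coincide up to rounding although the two world states differ, so no inverse exists and the induced action on $X$ cannot be constructed in the same way.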
Without injectivity, there is no unique world state $w \in W$ associated with a given data observation $x \in X$. However, this can be solved in practice by making $b$ injective through active sensing (Soatto, 2011).

¹ Note that the notation $\cdot$ is used for the action of $G$ on both $W$ and $Z$; the correct group action should be inferred from context.

Thus, under the assumption that the observation function $b$ is injective, we