3.5 Results

3.5.1 MNIST

Table 3.3 shows the auROC scores of our anomaly detection framework evaluated on MNIST, with each experiment aiming to detect one anomalous digit against the remaining digits. For each experiment we evaluate two models, with latent space dimensions 2 and 32, respectively. The 2-dimensional latent space almost always outperforms the 32-dimensional one. In particular, the 32-dimensional model sometimes hardly outperforms random guessing (which would correspond to an auROC score of 0.5). Moreover, in one case (anomalous digit 1) we even see the opposite of what we expect in our anomaly framework, i.e. an auROC close to 0, implying that most anomalous images get a lower anomaly score than normal images. The auPRC scores, shown in Table 3.4, lead us to similar conclusions (note that here scores are not expected to remain above 0.5, and are thus lower in general).

Table 3.3: auROC scores for MNIST.

Latent   Anomalous digit
dim.     0     1     2     3     4     5     6     7     8     9
2d       0.96  0.73  0.97  0.94  0.73  0.92  0.95  0.77  0.92  0.66
32d      0.81  0.09  0.92  0.87  0.78  0.86  0.87  0.51  0.89  0.58

Table 3.4: auPRC scores for MNIST.

Latent   Anomalous digit
dim.     0     1     2     3     4     5     6     7     8     9
2d       0.75  0.21  0.76  0.64  0.23  0.45  0.69  0.27  0.47  0.15
32d      0.27  0.06  0.57  0.38  0.24  0.31  0.46  0.11  0.45  0.17

We conjecture that the reason for this is that the 32-dimensional model has too much capacity, and implicitly also learns to represent and reconstruct the anomalous digit even though it was not trained on it. In a sense this corresponds to underfitting of the likelihood estimation, assigning too high a density to digits not present in the training data.
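To make the two metrics concrete, the following is a minimal sketch of how auROC and auPRC could be computed from per-image anomaly scores. It is not the evaluation code used for the tables: the function names, the toy scores, and the use of average precision as the auPRC estimate are all assumptions made for illustration. It also shows why an auROC close to 0 means the ranking is inverted: negating the scores gives 1 minus the original auROC.

```python
def auroc(scores, labels):
    """Probability that a random anomaly (label 1) receives a higher score
    than a random normal sample (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve, with anomalies as
    the positive class (an assumed estimator for auPRC)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recovered anomaly
    return total / n_pos


# Hypothetical toy data: 8 normal images and 2 anomalous ones.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 0.8, 1.0]
inverted = [-s for s in scores]  # a model that ranks anomalies last

print("auROC:", auroc(scores, labels), "inverted:", auroc(inverted, labels))
print("auPRC:", average_precision(scores, labels),
      "inverted:", average_precision(inverted, labels))
```

Unlike auROC, the auPRC of an uninformative ranking depends on the fraction of anomalies in the test set rather than being fixed at 0.5, which is consistent with the generally lower values in Table 3.4.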