
Figure 4.17: Diagrams with the architectures used in the Triplet Loss (TL), Autoencoder with Triplet Loss (AE-TL) and LSBD-VAE with Triplet Loss (LSBD-VAE-TL) submissions.

Architectures and hyperparameters

Each encoder consists of a ResNet50v2 backbone (He et al., 2016) pre-trained on ImageNet with non-trainable layers, followed by a final average-pooling layer and an extra trainable dense layer with 1000 units. Each decoder consists of a simple network with three dense layers of 256, 512, and 512 units with ReLU activations and batch normalisation after each layer. The final layer is a dense layer with 256×256×3 neurons and a sigmoid activation, whose output is reshaped to the reconstructed image size.

Further hyperparameters for each variant are the number of training epochs, the dimension d of the low-dimensional representation (shape descriptor), and the weighting factors for the loss. The hyperparameters were tuned by splitting the datasets of training 3D models into train (70%) and validation (30%) sets, and performance was evaluated with the relevant metrics. The hyperparameters that gave the best results were used, see Table 4.10.

4.8.3 Results and Conclusions

We now briefly summarise the results of the challenge, for our submitted variants as well as for methods submitted by others. For further details on all the other methods and for more results we refer to Sipiran et al. (2021). All methods are evaluated using five metrics:

• NN: Nearest Neighbour. Given a query, NN measures the precision of the first retrieved object (a minimal computation is sketched below).
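To make the NN metric concrete, the following is a minimal sketch of how it could be computed for a query set against a target collection. The function name, the precomputed distance matrix, and the label arrays are illustrative assumptions, not taken from the thesis:

```python
import numpy as np

def nearest_neighbour_precision(dist, query_labels, target_labels):
    """Sketch of the NN metric: the fraction of queries whose single
    closest retrieved object shares the query's class.

    dist          -- (n_queries, n_targets) pairwise distance matrix
    query_labels  -- (n_queries,) integer class labels
    target_labels -- (n_targets,) integer class labels
    """
    nearest = np.argmin(dist, axis=1)  # index of the closest target per query
    return float(np.mean(target_labels[nearest] == query_labels))
```

If the query set and the target collection coincide, the diagonal of the distance matrix should be set to infinity first, so that a query cannot retrieve itself.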

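Returning to the encoder and decoder described under "Architectures and hyperparameters" above, a minimal Keras sketch is given below. The TensorFlow/Keras framework, the 256×256×3 input resolution (inferred from the decoder's output layer), the ReLU activation on the 1000-unit layer, and the placement of the final projection to the d-dimensional descriptor are assumptions of this sketch, not details fixed by the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_encoder(d: int) -> Model:
    # ResNet50V2 backbone pre-trained on ImageNet, with non-trainable layers.
    backbone = tf.keras.applications.ResNet50V2(
        include_top=False, weights="imagenet", input_shape=(256, 256, 3))
    backbone.trainable = False
    inputs = layers.Input(shape=(256, 256, 3))
    x = backbone(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)        # final average-pooling layer
    x = layers.Dense(1000, activation="relu")(x)  # extra trainable dense layer (activation assumed)
    # Projection to the d-dimensional shape descriptor (placement assumed).
    outputs = layers.Dense(d)(x)
    return Model(inputs, outputs, name="encoder")

def build_decoder(d: int) -> Model:
    inputs = layers.Input(shape=(d,))
    x = inputs
    # Three dense layers of 256, 512, 512 units, with ReLU activations
    # and batch normalisation after each layer.
    for units in (256, 512, 512):
        x = layers.Dense(units, activation="relu")(x)
        x = layers.BatchNormalization()(x)
    # Final dense layer with 256*256*3 neurons and sigmoid activation,
    # reshaped to the reconstructed image size.
    x = layers.Dense(256 * 256 * 3, activation="sigmoid")(x)
    outputs = layers.Reshape((256, 256, 3))(x)
    return Model(inputs, outputs, name="decoder")
```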