Thesis

246 Chapter 9

Table 1 – Overview of clustering analysis steps

Core analysis steps:
1. EU-NN database. In the data preparation phase, we explain how the EU-NN database was prepared for the clustering algorithm and how variable weightings were determined.
2. Clustering algorithm. In this step, we explain how the clustering algorithm works and how similarity between individuals is assessed by calculating distances between them.
3. Number of clusters. To determine at which number of clusters to stop the clustering algorithm, we combined standard clustering evaluation metrics (i.e., clustering quality scores) with visual inspection of the grouping of individuals without cataplexy.
4. Clustering outcome. Once the number of clusters was determined, the cluster characteristics were visualized as barcodes per cluster per variable.
5. Differentiating variables. To identify the distinguishing variables, we visualized how distinct the clusters were per variable from the entire EU-NN database. We post hoc statistically compared the clusters containing mainly individuals without cataplexy on all variables.
6. Current diagnosis and centres of inclusion. After running the clustering algorithm, we identified the distributions of current diagnoses per cluster. We also checked the possible influence of centre of inclusion on cluster formation.

Advanced analysis steps (Appendix A):
1. Clusterability of the EU-NN database. The intrinsic clusterability of the EU-NN database was assessed to test whether its entries show sufficient tendency (similarities) to be clustered. To do so, we compared the coefficient of determination of the clustering results in the EU-NN database with that of similarly shaped but randomly generated datasets.
2. Cluster distinctness. To test which clusters were most distinctly grouped, we calculated the silhouette coefficients per cluster. This metric of distinctness relates the mean distance to individuals in the same cluster to the mean distance to individuals in the nearest other cluster.
3. Cluster reproducibility. To test whether the EU-NN database had sufficient entries to ensure that sample size variations are unlikely to change the clustering results, we repeated the clustering algorithm on random subsets of 80% of the EU-NN database and quantified similarity to the original clustering through mixing of individuals between clusters.

Clustering algorithm

In clustering, similarity is measured by calculating a distance between individuals; similar values on the input variables result in a smaller distance. Each individual is initially a cluster of their own, and the closest individuals (or clusters) are then sequentially combined into larger clusters until there is only one cluster left. Details on the distance calculations are reported in Appendix C.
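The merging procedure described above can be illustrated with a minimal sketch. It is not the implementation used in this thesis: the Euclidean distance, the average-linkage rule, the function names and the toy data are all assumptions made for illustration only, and the distance calculations actually used are those reported in Appendix C.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between two clusters of point indices.

    Assumption: Euclidean distance and average linkage are illustrative
    choices, not the distance definition of Appendix C.
    """
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(points):
    """Merge the two closest clusters repeatedly until one cluster is left.

    Returns the merge history as (sorted member indices, merge distance).
    """
    clusters = [[i] for i in range(len(points))]  # each individual starts alone
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], points),
        )
        d = average_linkage(clusters[x], clusters[y], points)
        merged = clusters[x] + clusters[y]
        history.append((sorted(merged), d))
        # Replace the two merged clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return history


# Hypothetical toy data: four individuals described by two variables.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
history = agglomerate(points)
for members, distance in history:
    print(members, round(distance, 3))
```

In this sketch the full merge history is kept, so a solution with any chosen number of clusters can be read off afterwards by stopping before the last merges.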
