Thesis

Clustering in Central Disorders of Hypersomnolence

Appendix A – Advanced analyses

Methods

Clusterability of the EU-NN database

Intrinsic dataset clusterability can be measured by comparing the coefficient of determination of the actual dataset with that of randomly generated, uniformly distributed datasets. If a dataset is composed of uniformly distributed subjects, clustering can still be performed, but it will result in a low coefficient of determination because the clusters are not distinct. To test the intrinsic clusterability of the EU-NN database, we repeated the clustering with 20 randomly generated datasets and compared the coefficients of determination. These random datasets had the same layout as the EU-NN database: 1078 inclusions, the same number of variables, the same outcome options per variable, and the same missing values. We replaced all original values with random numbers drawn from uniform distributions over the outcome options, respecting the categorical or continuous nature of each variable.

Cluster distinctness

Silhouette coefficients were calculated for individuals and displayed per cluster to help identify which clusters were most distinctly grouped. The silhouette coefficient compares the mean distance between an individual and the others in the same cluster with the mean distance between that individual and the individuals in the nearest neighbouring cluster. If a particular cluster has large, consistently positive silhouette values, the cluster is distinctly grouped.

Cluster reproducibility

Jack-knife resampling was implemented as a cross-validation method to test whether the EU-NN database had sufficient entries to ensure that sample size variations are unlikely to change the clustering results. We repeated the clustering algorithm 100 times with random 80% selections of the individuals in the EU-NN database. We calculated the chance that two individuals who were in different clusters in the original clustering were grouped together in the resampling iterations. This highlights the clusters between which mixing is more frequent. A higher mixing probability between two clusters, however, does not necessarily mean that these clusters are not robust: it can also mean that, for the chosen number of clusters, they are frequently merged into one larger cluster in the resampling iterations, while remaining separate when a larger number of clusters is used. For more information, see Appendix B.
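The construction of the random reference datasets described under "Clusterability of the EU-NN database" can be illustrated with a minimal Python sketch. It is not the code used for the thesis. The function name `randomize_uniform`, the `schema` layout, the two example variables and their ranges are all hypothetical, and treating the "outcome options" of a continuous variable as a closed numeric range is an assumption.

```python
import random


def randomize_uniform(rows, schema, seed=0):
    """Replace every observed value with a uniform random draw.

    schema maps a column name to ("categorical", [options]) or
    ("continuous", (low, high)). Missing values (None) stay missing,
    so the random dataset keeps the layout of the original one.
    """
    rng = random.Random(seed)
    randomized = []
    for row in rows:
        new_row = {}
        for column, (kind, spec) in schema.items():
            if row[column] is None:
                new_row[column] = None  # preserve the missing-value pattern
            elif kind == "categorical":
                new_row[column] = rng.choice(spec)
            else:
                low, high = spec
                new_row[column] = rng.uniform(low, high)
        randomized.append(new_row)
    return randomized


# Hypothetical two-variable layout, for illustration only.
schema = {
    "symptom_present": ("categorical", [0, 1]),
    "sleep_latency_min": ("continuous", (0.0, 20.0)),
}
rows = [
    {"symptom_present": 1, "sleep_latency_min": 3.5},
    {"symptom_present": None, "sleep_latency_min": 12.0},
    {"symptom_present": 0, "sleep_latency_min": None},
]

# One reference dataset per seed; the text above uses 20 of them.
reference_sets = [randomize_uniform(rows, schema, seed=s) for s in range(20)]
```

Each reference dataset would then be clustered in the same way as the real data, and its coefficient of determination compared with that of the real clustering.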
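For the silhouette coefficient described under "Cluster distinctness", the standard definition is s(i) = (b - a) / max(a, b), where a is the mean distance from individual i to the other members of its own cluster and b is the mean distance to the members of the nearest other cluster. The sketch below computes it for a toy example. Euclidean distance is an assumption made for illustration; the distance measure used for the EU-NN database is not stated in this section. The function name and the example points are hypothetical.

```python
import math


def silhouette_values(points, labels):
    """Return s(i) = (b - a) / max(a, b) for every point.

    Requires at least two clusters. Uses Euclidean distance,
    which is an illustrative assumption.
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    values = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            values.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        values.append((b - a) / denominator if denominator > 0 else 0.0)
    return values


# Two well-separated toy clusters: every value is close to 1.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = [0, 0, 1, 1]
scores = silhouette_values(points, labels)
```

Values near 1 indicate an individual that sits well inside its cluster, values near 0 an individual on the border between two clusters, and negative values an individual that is closer to a neighbouring cluster than to its own.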
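The mixing probability described under "Cluster reproducibility" can be sketched as follows. This is one possible reading of the procedure, not the thesis implementation: the probability for a pair of original clusters is estimated here as the fraction of co-sampled cross-cluster pairs of individuals that end up in the same cluster after re-clustering a random 80% subsample. The normalisation, the function names, and the toy clustering function `split_at_midrange` (a stand-in for the real clustering algorithm) are all assumptions.

```python
import random
from collections import defaultdict
from itertools import combinations


def mixing_probabilities(items, original_labels, cluster_fn,
                         n_iter=100, fraction=0.8, seed=0):
    """Estimate how often members of two different original clusters
    are grouped together when random subsamples are re-clustered."""
    rng = random.Random(seed)
    n = len(items)
    size = round(fraction * n)
    together = defaultdict(int)  # cross-cluster pairs sharing a new cluster
    seen = defaultdict(int)      # cross-cluster pairs drawn in the same subsample

    for _ in range(n_iter):
        sample = rng.sample(range(n), size)
        new_labels = cluster_fn([items[i] for i in sample])
        for (p, i), (q, j) in combinations(enumerate(sample), 2):
            a, b = original_labels[i], original_labels[j]
            if a == b:
                continue  # only pairs from different original clusters count
            key = tuple(sorted((a, b)))
            seen[key] += 1
            if new_labels[p] == new_labels[q]:
                together[key] += 1
    return {key: together[key] / seen[key] for key in seen}


def split_at_midrange(values):
    """Toy stand-in for a clustering algorithm: two groups split at the midrange."""
    middle = (min(values) + max(values)) / 2
    return [0 if v < middle else 1 for v in values]


# Two clearly separated toy groups: they are never mixed.
values = [0.0, 0.1, 0.2, 0.3, 0.4, 9.6, 9.7, 9.8, 9.9, 10.0]
original = split_at_midrange(values)
mixing = mixing_probabilities(values, original, split_at_midrange)
```

With this toy data the two groups are never mixed, so the estimated probability for the cluster pair (0, 1) is 0. A value close to 1 would instead indicate that the two original clusters are usually merged in the resampled clusterings.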
