
Clustering in Central Disorders of Hypersomnolence

Number of clusters

The number of clusters was determined by combining two techniques. We first calculated multiple standard clustering evaluation metrics (i.e., clustering quality scores) that describe how well the clustering algorithm performs with different numbers of clusters (Appendix C and E). These metrics are normally high when individuals are similar to others in their own cluster and distinct from the individuals in other clusters. The main aim of this study was to see whether data-driven algorithms would segregate narcolepsy type 1 and identify a more reliable subgrouping of individuals without cataplexy, because current diagnostic criteria struggle most with this subpopulation. We therefore also focused on the subgrouping of individuals without cataplexy by visually inspecting the clustering steps of the full dataset from 15 to two clusters, to better understand how people without cataplexy were subdivided and when these clusters were merged. The final model is usually a compromise between the evaluation metrics and the clinical aim of the study.

Clustering outcome

Clustering results were visualized as barcodes representing the mean normalized values per cluster on all variables (also called means barcodes). Variables were ordered according to the aforementioned categories, and clustering mean values were left blank when <10 values were present within a cluster.

Differentiating variables

Two methods were used to quantify differentiating variables between clusters. First, we used a resampling technique to test how different the clustering means were from the entire EU-NN database. We then also formally compared clusters dominated by individuals without cataplexy on all variables.

A resampling technique was implemented to test whether the clustering results deviated from the entire EU-NN database.
The resampling technique enabled us to deduce the extent to which the means barcodes were different from what would be expected if the same number of clusters with the same sizes were randomly drawn from the entire EU-NN database. We generated 10,000 such draws and calculated the mean and standard deviation (SD) of each draw per cluster per variable. For each variable, we divided the difference between the resampled mean and the corresponding original clustering mean by the SD of the resampled means. This was done per variable per cluster and visualized as the significances barcodes. Only values with >25 observations
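The resampling score described above can be illustrated with a minimal sketch for a single variable and a single cluster. It is not the code used in the study. The function name `resampling_z_score`, drawing without replacement, the sign convention (cluster mean minus resampled mean), and the synthetic data are all assumptions made for illustration, and missing values and the minimum-observation rule are not handled.

```python
import random
import statistics


def resampling_z_score(all_values, cluster_values, n_draws=10_000, seed=0):
    """Standardized deviation of one cluster mean from random same-size draws.

    all_values: values of one variable for every individual in the database.
    cluster_values: values of that variable for the individuals in one cluster.
    """
    rng = random.Random(seed)
    size = len(cluster_values)
    observed_mean = statistics.fmean(cluster_values)

    # Mean of the variable in each random draw of `size` individuals
    # (assumption: drawn without replacement from the whole database).
    resampled_means = [
        statistics.fmean(rng.sample(all_values, size)) for _ in range(n_draws)
    ]

    expected_mean = statistics.fmean(resampled_means)
    spread = statistics.stdev(resampled_means)
    # Assumed sign convention: positive when the cluster mean is above
    # what random draws of the same size would give.
    return (observed_mean - expected_mean) / spread


# Synthetic example data (hypothetical, not from the EU-NN database).
rng = random.Random(1)
database = [rng.gauss(0.0, 1.0) for _ in range(500)]
high_cluster = sorted(database)[-50:]      # cluster with unusually high values
random_cluster = rng.sample(database, 50)  # cluster no different from chance

z_high = resampling_z_score(database, high_cluster, n_draws=2000)
z_random = resampling_z_score(database, random_cluster, n_draws=2000)
```

In this sketch a cluster made up of the highest values yields a large positive score, whereas a randomly chosen cluster yields a score near zero. Repeating the calculation for every variable and every cluster would give the grid of values that the text visualizes as significances barcodes.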
