Thesis

Clustering in Central Disorders of Hypersomnolence

outliers to separate clusters. Complete linkage was therefore chosen, because it tended to group individuals with outlying values most inclusively.

Clustering evaluation metrics

Standard clustering evaluation metrics were used to determine how well the clustering algorithm performed with different numbers of clusters. Metrics included the coefficient of determination (R²), mean silhouette, the inter- and intracluster mean distances and their ratio, and Dunn’s index [344, 345]. The larger the values of Dunn’s index, mean silhouette, and the inter-/intracluster mean distance ratio, the better the model fits the dataset. As a rule of thumb, models with smaller numbers of clusters and R² higher than 0.3 are generally preferred [346].

Mathematical definitions of distances, clustering evaluation metrics & mixing probability

In the definitions below, we consider M individuals (or, more technically, their parameter vectors) {p_1, p_2, …, p_M} distributed over N different clusters {C_1, …, C_N}. The outcome measures can be calculated for any step in the clustering process; therefore, N can be any integer between 2 and M (inclusive). Let |C_i| denote the size of cluster i, and let C_{i,j} be its j-th individual. For example, if cluster 2 contains the individuals with IDs 2, 42 and 100, then C_2 = (p_2, p_42, p_100) and |C_2| = 3. Moreover, C_{2,1} = p_2, C_{2,2} = p_42 and C_{2,3} = p_100.

Distance between individuals: Gower’s distance

In this work, we define distances between individuals via Gower’s metric. Gower’s metric is a weighted Manhattan metric that accounts for missing data. If an individual has a missing value on one of the dimensions, then that dimension is ignored in all distance calculations involving that individual. The dimension is, however, used whenever it is known for both individuals in a pair. Let there be L dimensions, with weights {w_1, w_2, …, w_L}.
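The missing-data rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name `gower_distance`, the encoding of missing values as `None`, the assumption that every dimension has already been scaled to a comparable range, and the normalization by the summed weights of the jointly observed dimensions (the usual form of Gower’s distance) are all assumptions made for the example.

```python
import math


def gower_distance(a, b, weights):
    """Weighted Manhattan distance over the dimensions observed for both individuals.

    a, b    : sequences of numbers, with None marking a missing value (assumed encoding)
    weights : one non-negative weight per dimension
    Values are assumed to be pre-scaled so that dimensions are comparable.
    """
    weighted_sum = 0.0   # sum of w_l * |a_l - b_l| over jointly observed dimensions
    weight_total = 0.0   # sum of w_l over jointly observed dimensions
    for x, y, w in zip(a, b, weights):
        if x is None or y is None:
            continue  # missing for at least one individual: skip this dimension
        weighted_sum += w * abs(x - y)
        weight_total += w
    if weight_total == 0.0:
        return math.nan  # no dimension is observed for both individuals
    return weighted_sum / weight_total


# Hypothetical individuals with three dimensions; the third is missing for p_a.
p_a = [0.0, 1.0, None]
p_b = [1.0, 1.0, 0.5]
d = gower_distance(p_a, p_b, weights=[1.0, 1.0, 2.0])
```

In this example the third dimension is dropped for the pair even though it carries the largest weight, so only the first two dimensions contribute to the distance.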
Let O_{ij} be a Boolean variable denoting whether the j-th dimension has been observed for the i-th individual (1: observed, 0: missing). Gower’s distance d_P between individuals p_i and p_j is then
