Thesis

Chapter 9

Coefficient of determination

Interpreting the centre of a cluster as the prototypical individual for that cluster, the coefficient of determination R² (between 0 and 100%) quantifies the degree to which the set of N typical individuals summarizes the dataset as a whole. The measure is high if most individuals are similar to the typical individual of the cluster they are assigned to, and dissimilar from the centre of the whole dataset.

Specifically, R² measures the goodness of fit of the clustering, when viewed as a model with the N typical patient vectors as its inferred parameters. The distance between individual p_i = C_jk and the centre ⟨C_j⟩ of the corresponding cluster C_j is a fitting residual, and R² is conventionally defined as one minus the ratio between the sum of squares of these residuals and the sum of squares of the residuals of the simplest model. The simplest model is the 1-cluster model (i.e., the dataset is not partitioned), whose residuals are the distances to the centre of the whole dataset.

Note that, as the number of clusters increases, R² is bound to increase. However, an increased model complexity (and thus an increased goodness of fit) does not imply increased reproducibility or predictive power. Although the choice of linkage type affects how clusters form, it does not affect this definition.

Jack-knife resampling – mixing probability

If the clustering is robust against addition or removal of data, individuals that were in different clusters in the original run should remain in different clusters in jack-knife runs. If members of the i-th original cluster are indeed seldom grouped with members of the j-th original cluster during jack-knife runs, the mixing probability M_ij is low; if such groupings occur often, M_ij is high. After an original clustering run yielding clusters {C_1, …, C_N}, we consider K jack-knife runs, each incorporating only a fraction f of all individuals, yielding a total of fM included individuals per jack-knife run.
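The two ingredients described so far, the R² of a given clustering and the random subsets used for the jack-knife runs, can be illustrated with a minimal sketch. It rests on assumptions that the text does not state explicitly: distances are Euclidean, a cluster centre is the arithmetic mean of its members, M is taken to be the total number of individuals, fM is rounded to the nearest integer, and subsets are drawn uniformly without replacement. The function names `r_squared` and `jackknife_subsets` are hypothetical and chosen for illustration only.

```python
import numpy as np


def r_squared(points, labels):
    """Return 1 - SS_within / SS_total for a hard clustering (assumes Euclidean distance)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Residuals of the 1-cluster model: distances to the centre of the whole dataset.
    ss_total = np.sum((points - points.mean(axis=0)) ** 2)
    # Residuals of the N-cluster model: distances to each individual's own cluster centre.
    ss_within = 0.0
    for lab in np.unique(labels):
        members = points[labels == lab]
        ss_within += np.sum((members - members.mean(axis=0)) ** 2)
    return 1.0 - ss_within / ss_total


def jackknife_subsets(m, f, k, seed=0):
    """Draw k index subsets, each holding round(f * m) of the m individuals (assumed scheme)."""
    rng = np.random.default_rng(seed)
    size = int(round(f * m))
    # Sampling without replacement, so no individual appears twice within a run.
    return [np.sort(rng.choice(m, size=size, replace=False)) for _ in range(k)]


# Toy data: two well-separated groups of two individuals each.
toy_points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
toy_labels = [0, 0, 1, 1]
print(r_squared(toy_points, toy_labels))
print(jackknife_subsets(m=10, f=0.8, k=3))
```

In this sketch the 1-cluster model gives R² = 0 by construction, and placing every individual in its own cluster gives R² = 1, consistent with the remark that R² grows with the number of clusters regardless of reproducibility. Each jack-knife subset would then be re-clustered with the same procedure as the original run.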
Let Jkij be a Boolean variable denoting whether the j-th individual of the i-th cluster from the original run
