4.5. Results

significant features. This technique is commonly used for dimensionality reduction and data analysis. We apply PCA using the Python PCA package (Taskesen, 2020). This process was applied to all case studies listed in Table 4.2, compiling a feature dataset for each case study with the 17 metrics in Table 4.1. Metrics were calculated for every FT in each generation to identify the most uncorrelated metrics. This is crucial, as uncorrelated metrics enhance convergence in multi-objective evolutionary algorithms by improving diversity and preventing biased searches. The column titled “All FTs” in Table 4.2 specifies the total number of data samples available for PCA analysis in each case study.

Data pre-processing. To prevent any case study from dominating the analysis due to differences in data volume, we randomly sample each case down to the size of the smallest dataset (i.e., case COVID19). Similarly, to avoid one feature dominating others due to magnitude disparities, we normalise each feature by subtracting the mean and scaling to unit variance, using the StandardScaler function from the preprocessing module in scikit-learn (Pedregosa, Varoquaux, Gramfort, et al., 2011).

Principal Component Analysis. We use PCA to identify the most informative metrics. We examine the explained variance percentage of each Principal Component (PC), depicted in the scree plot in Figure 4.2. This plot reveals that the first 7 of the 17 PCs account for 99.78% of the variance in the feature dataset, suggesting that only 7 of the 17 PCs are informative.

[Figure 4.2: Scree plot: cumulative explained variance per principal component (x-axis: Principal Component 1–17; y-axis: percentage and cumulative explained variance). The first 7 principal components explain 99.7% of the variance.]
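The pre-processing and variance analysis above can be sketched as follows. This is an illustrative sketch, not the thesis code: it uses scikit-learn's PCA directly rather than the pca package of Taskesen (2020), and the case-study names, sizes, and random feature matrices are placeholders for the real datasets of Table 4.2.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Hypothetical stand-ins for the per-case-study feature matrices
# (rows = fault trees, columns = the 17 metrics of Table 4.1).
datasets = {name: rng.normal(size=(n, 17))
            for name, n in [("caseA", 500), ("caseB", 300), ("COVID19", 120)]}

# Down-sample every case study to the size of the smallest one
# (here "COVID19") so that no case dominates the pooled dataset.
n_min = min(len(X) for X in datasets.values())
sampled = [X[rng.choice(len(X), size=n_min, replace=False)]
           for X in datasets.values()]
features = np.vstack(sampled)

# Zero-mean, unit-variance scaling, then PCA over all 17 components.
X_std = StandardScaler().fit_transform(features)
model = PCA(n_components=17).fit(X_std)

# Number of PCs needed to reach 99.7% cumulative explained variance,
# mirroring the scree-plot reading in Figure 4.2.
cumvar = np.cumsum(model.explained_variance_ratio_)
n_informative = int(np.searchsorted(cumvar, 0.997) + 1)
```

On real, correlated metric data the cumulative curve flattens early, which is what justifies keeping only the leading PCs.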
The analysis of loadings reflects the magnitude of each feature’s contribution to a particular PC and is crucial for identifying the most informative metrics for FT inference. Table 4.3 presents the loadings of each metric across the seven main PCs. A higher absolute loading value indicates a stronger contribution to the respective PC, and the sign of the loading indicates the direction of the correlation. This analysis reveals
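The loadings analysis can be sketched as follows. This is again an assumption-laden illustration: the metric names and data are placeholders, and the loadings are derived from scikit-learn's PCA attributes (the pca package reports them directly) using the standard definition for standardized data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 17))                       # placeholder for the pooled metric data
metric_names = [f"metric_{i + 1}" for i in range(17)]  # hypothetical metric labels

model = PCA(n_components=7).fit(StandardScaler().fit_transform(X))

# Loadings: weights of each original metric on each PC, scaled by the
# PC's standard deviation (valid since the input is standardized).
loadings = model.components_.T * np.sqrt(model.explained_variance_)

# For each of the 7 retained PCs, the metric with the largest absolute
# loading is its strongest contributor; the sign gives the correlation direction.
top = {f"PC{j + 1}": metric_names[int(np.argmax(np.abs(loadings[:, j])))]
       for j in range(loadings.shape[1])}
```

A table like Table 4.3 is then just this `loadings` matrix with metrics as rows and the seven PCs as columns.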