CHAPTER 3

3.2 Machine learning

The three prediction models for the three thresholds of 100%, 95%, and 90% of a player's average physical match performance differed in accuracy and F1 scores for both the tree-based and the baseline models. These differences were primarily due to the reduced number of underperformers in the 90% category: while the split between over- and underperformers is roughly 50/50 at the 100% threshold, the share of underperformers drops to about 1% at the 90% threshold (Table 2). This naturally favours the correct prediction of performers and impedes the minority class (underperformers).

Random Forest and Decision Tree outperformed Naïve Bayes in precision and recall for all three variables (distance covered, distance covered in speed category, energy expenditure in power category) and all three thresholds (Table 3). Overall, the Random Forest approach showed the best performance for all variables. Comparing the three variables, energy expenditure in power category achieved the highest precision at every threshold and therefore provided the best prediction models. The precision of classifying underperforming players increased over the course of the match. After 15 minutes, applying either Random Forest or Decision Tree to distance in speed category and energy expenditure in power category yielded precisions of 0.91, 0.88, and 0.92 for the 100%, 95%, and 90% thresholds, respectively. The baseline model, Naïve Bayes, was less precise than Decision Tree and Random Forest (Figure 7).

Table 2. Distribution of performing and underperforming players per variable and threshold.

Variable                                  Threshold 100%   Threshold 95%   Threshold 90%
Distance covered
  Underperforming (n)                     38490            60820           68347
  Performing (n)                          30590             8260             733
Distance in speed category
  Underperforming (n)                     42014            64340           69520
  Performing (n)                          27866             5540             360
Energy expenditure in power category
  Underperforming (n)                     34416             7912            1604
  Performing (n)                          35463            61967           68275
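The effect of class imbalance described above can be made concrete with a small numeric sketch. The counts below are hypothetical, not the study's data; they only illustrate why overall accuracy looks strong at the 90% threshold (where the minority class is near 1%) while precision and recall for the underperformers tell a different story:

```python
# Illustrative only: why accuracy is misleading under heavy class imbalance.
# The confusion-matrix counts are hypothetical, not taken from Table 2.

def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 for the minority (positive) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical split with ~1% underperformers (the positive class).
# A classifier that labels almost every observation "performing" still
# achieves very high accuracy while detecting few underperformers.
tp, fp, fn, tn = 200, 100, 500, 69200

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

print(f"accuracy={accuracy:.3f}")    # dominated by the majority class
print(f"precision={precision:.3f}, recall={recall:.3f}, f1={f1:.3f}")
```

Accuracy here exceeds 0.99 even though fewer than a third of the actual underperformers are found, which is why the section reports precision and recall for the minority class rather than accuracy alone.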