Thesis

3 MACHINE LEARNING SUPPORTING SUBSTITUTIONS IN SOCCER

Table 1. Continued.
Variable: Distance Covered

Feature: Percentage maximal power energy expenditure versus average percentage maximal power energy expenditure
Explanation: The percentage of the energy expenditure in the maximal power category in the specific five-minute period versus the average percentage of energy expenditure in the maximal power category in that five-minute period.

Feature: Percentage summed maximal power energy expenditure versus average percentage summed maximal power energy expenditure
Explanation: The percentage of the summed energy expenditure in the maximal power category versus the average percentage of summed energy expenditure in the maximal power category, up to and including the specific five-minute period.

To predict whether a player would underperform during the match, underperformance was defined as not achieving 100%, 95%, or 90% of the player's own average over the entire season. The outcome measures were distance (m) for distance covered and distance in speed category, and energy expenditure (kJ·kg⁻¹) for energy expenditure in power category.

The machine learning process is visualized in Figure 1. The tracking data were used to calculate the physical performance variables per individual player, as described above, and each observation was labelled as underperforming or not. The data set was then split into a 70% training set and a 30% test set. Subsequently, the training set was resampled with the SMOTE method [24] so that performing and underperforming labels were equally represented. Models were generated with the learning algorithms on the resampled training set and then applied to the test set to classify the physical performance of the individual player. Because physical performance during a soccer match does not follow a linear relation, tree-based algorithms, namely Random Forest and Decision Tree, which can capture non-linear relations, were applied.
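The labelling rule and the ratio-type features described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the thesis code: the function names `underperformance_labels` and `max_power_features` are hypothetical, "versus" is read as a ratio (the table does not say whether a ratio or a difference is meant), and the player's average percentages per five-minute period are assumed to be precomputed from earlier matches.

```python
import numpy as np

# Thresholds from the text: 100%, 95% and 90% of the player's season average.
THRESHOLDS = (1.00, 0.95, 0.90)


def underperformance_labels(match_value, season_average, thresholds=THRESHOLDS):
    """Hypothetical helper: 1 = underperforming, 0 = performing, per threshold.

    Both arguments use the same unit: m for distance covered and distance in
    speed category, kJ/kg for energy expenditure in power category.
    """
    return {t: int(match_value < t * season_average) for t in thresholds}


def max_power_features(ee_max_power, ee_total, avg_pct, avg_summed_pct):
    """Hypothetical per-period features for the maximal power category.

    ee_max_power, ee_total: energy expenditure per five-minute period (kJ/kg).
    avg_pct, avg_summed_pct: the player's average percentages per period,
    ASSUMED to be precomputed. 'Versus' is ASSUMED to mean a ratio.
    """
    ee_max_power = np.asarray(ee_max_power, dtype=float)
    ee_total = np.asarray(ee_total, dtype=float)
    # Percentage within the specific five-minute period only.
    pct = 100.0 * ee_max_power / ee_total
    # Summed percentage: up to and including the specific five-minute period.
    summed_pct = 100.0 * np.cumsum(ee_max_power) / np.cumsum(ee_total)
    return (pct / np.asarray(avg_pct, dtype=float),
            summed_pct / np.asarray(avg_summed_pct, dtype=float))


if __name__ == "__main__":
    print(underperformance_labels(10200, 10500))
    print(max_power_features([1, 1], [10, 30], [10, 5], [10, 5]))
```

For example, a player with a season average of 10,500 m who covers 10,200 m is labelled as underperforming at the 100% threshold but not at the 95% or 90% thresholds, so the three thresholds yield three different classification tasks from the same match.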
Model training was combined with hyperparameter tuning using randomized search and cross-validation [25]. As is common practice when evaluating machine learning approaches, a simple Naïve Bayes classifier served as the baseline model: to demonstrate their validity, the Random Forest and Decision Tree models should outperform this baseline. The following overall performance measures were calculated for each model: accuracy, precision, recall, F1-score, and area under the curve (AUC). The scikit-learn package (version 0.23.1) in Python 3.7.2 was used to build and evaluate the machine learning models. The source code, access to the data, and the corresponding Jupyter notebooks of the machine learning procedure are available as open-source software on GitHub (https://github.com/dijkhuist/Early-Performance-Prediction-Machine-Learning-inSoccer).

Variable: Energy Expenditure in Power Category
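The modelling procedure described in this section (70/30 split, SMOTE on the training set only, randomized hyperparameter search with cross-validation, a Naïve Bayes baseline, and five performance measures) can be sketched with scikit-learn. This is an illustrative sketch, not the thesis code: the data are synthetic (`make_classification`), the search spaces, `n_iter=5`, `cv=3`, and the stratified split are assumptions, and the helper names `smote`, `evaluate`, and `run_pipeline` are hypothetical. SMOTE is reimplemented in a few lines with `NearestNeighbors` so the sketch depends only on scikit-learn and NumPy; the `SMOTE` class of the imbalanced-learn package provides the same oversampling through `fit_resample`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

# Illustrative search spaces (ASSUMED; not taken from the thesis).
RF_SPACE = {"n_estimators": [50, 100], "max_depth": [None, 5, 10],
            "min_samples_leaf": [1, 3, 5]}
DT_SPACE = {"max_depth": [None, 3, 5, 10], "min_samples_leaf": [1, 3, 5, 10]}


def smote(X, y, k=5, seed=0):
    """Minimal SMOTE: add synthetic minority samples until classes are equal."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    X_min = X[y == minority]
    if n_new == 0 or len(X_min) < 2:
        return X, y
    nn = NearestNeighbors(n_neighbors=min(k, len(X_min) - 1) + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)  # column 0 is each sample itself
    base = rng.integers(0, len(X_min), size=n_new)
    neighbour = idx[base, rng.integers(1, idx.shape[1], size=n_new)]
    gap = rng.random((n_new, 1))  # random position on the connecting line
    synthetic = X_min[base] + gap * (X_min[neighbour] - X_min[base])
    return (np.vstack([X, synthetic]),
            np.concatenate([y, np.full(n_new, minority)]))


def evaluate(model, X_test, y_test):
    """The five performance measures named in the text."""
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "auc": roc_auc_score(y_test, y_score),
    }


def run_pipeline(X, y, seed=0):
    # 70% training set, 30% test set (stratification is an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    # Resample the training set only; the test set keeps its real class balance.
    X_bal, y_bal = smote(X_train, y_train, seed=seed)
    # Naive Bayes baseline that the tree-based models should outperform.
    results = {"naive_bayes": evaluate(GaussianNB().fit(X_bal, y_bal),
                                       X_test, y_test)}
    searches = {
        "random_forest": (RandomForestClassifier(random_state=seed), RF_SPACE),
        "decision_tree": (DecisionTreeClassifier(random_state=seed), DT_SPACE),
    }
    for name, (estimator, space) in searches.items():
        # Randomized hyperparameter search with 3-fold cross-validation.
        search = RandomizedSearchCV(estimator, space, n_iter=5, cv=3,
                                    random_state=seed)
        search.fit(X_bal, y_bal)
        results[name] = evaluate(search.best_estimator_, X_test, y_test)
    return results


if __name__ == "__main__":
    # Synthetic, imbalanced stand-in for the labelled tracking features.
    X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                               weights=[0.7], random_state=0)
    for model_name, scores in run_pipeline(X, y).items():
        print(model_name, {m: round(v, 3) for m, v in scores.items()})
```

One design point to keep in mind: when the training set is resampled before cross-validation, as in the order described above, synthetic samples can end up in the validation folds of the search. Placing SMOTE and the estimator together in an imbalanced-learn `Pipeline` inside the search would keep each validation fold free of synthetic points; the notebooks in the repository show which order was actually used.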
