Table 1. ACWR-based features

ACWR week-4     The ACWR, four weeks before either an injury or no injury
ACWR week-3     The ACWR, three weeks before either an injury or no injury
ACWR week-2     The ACWR, two weeks before either an injury or no injury
ACWR week-1     The ACWR, one week before either an injury or no injury
ACWR week-1-2   The average ACWR over the two weeks immediately before either an injury or no injury
ACWR week-3-4   The average ACWR over weeks three and four before either an injury or no injury

We constructed two datasets: one with the ACWR-based features one, two, and three weeks before either sustaining an injury or not, and another with the complete set of ACWR-based features. First, each ACWR dataset was split into an 80% training set and a 20% test set. Next, the training set was resampled with the SMOTE method [5] to obtain an equal division of injury and no-injury cases. Finally, machine learning models were trained on the SMOTE-resampled training sets, and the original test sets were used to assess model quality. Since the relative load has no linear relationship with injury occurrence, the tree-based Random Forest algorithm was applied. The machine learning models were constructed using parameter tuning with randomized search and cross-validation [6]. A Naïve Bayes classifier was used as the baseline model to support the validity of the Random Forest approach: as is customary when evaluating a machine learning approach, Random Forest should outperform the simple Naïve Bayes baseline classifier. However, Random Forest is known for its slightly unstable behaviour; therefore, model generation was repeated 50 times. Subsequently, the model with the highest precision was used to test performance, where precision is the proportion of predicted injuries that were actual injuries. The following overall performance measures were calculated for each model: accuracy, precision, recall, Area Under the Curve (AUC), and F1-score. Finally, precision, recall, and the F1-score were calculated separately for the injury and no-injury classes. The scikit-learn package 0.24.0 in Python 3.8.8 was used to construct and evaluate the machine learning models; a minimal code sketch of this procedure is given at the end of this section. The source code, access to the data, and the corresponding Jupyter notebook of the machine learning procedure are available as open-source software on GitHub (https://github.com/dijkhuist/Running-Injuries-Machine-Learning, accessed on 16-04-2022).

RESULTS

Of the four models (the two algorithms combined with the two datasets), the Random Forest model with all features performed best, with an F1-score of 0.92. The Random Forest model with all features also performed best on the prediction of no injury, with an F1-score of 0.95. However, for the prediction of injuries, the precision was 0.03 and the recall was 0.11. Random Forest outperformed
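The sketch below illustrates the modelling procedure described in the methods. It is not the authors' implementation (that is available in the linked GitHub repository), but a minimal example under stated assumptions: a feature matrix X holding the ACWR-based features from Table 1, a binary label y (1 = injury, 0 = no injury), an illustrative hyperparameter grid for the randomized search, and SMOTE taken from the imbalanced-learn package, which builds on scikit-learn.

# Minimal sketch of the modelling procedure described above; not the authors'
# implementation (see the GitHub repository linked in the text). Assumes a
# feature matrix X with the ACWR-based features from Table 1 and a binary
# label y (1 = injury, 0 = no injury); the hyperparameter grid is illustrative.
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report, roc_auc_score
from imblearn.over_sampling import SMOTE  # SMOTE comes from imbalanced-learn

def train_and_evaluate(X, y, seed=0):
    # 80%/20% train-test split; only the training set is resampled with SMOTE.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    X_res, y_res = SMOTE(random_state=seed).fit_resample(X_train, y_train)

    # Random Forest tuned with randomized search and cross-validation.
    param_dist = {"n_estimators": [100, 300, 500],
                  "max_depth": [None, 5, 10, 20],
                  "min_samples_leaf": [1, 2, 5]}
    search = RandomizedSearchCV(RandomForestClassifier(random_state=seed),
                                param_dist, n_iter=10, cv=5,
                                scoring="precision", random_state=seed)
    search.fit(X_res, y_res)
    rf = search.best_estimator_

    # Naïve Bayes baseline trained on the same resampled training set.
    nb = GaussianNB().fit(X_res, y_res)

    # Evaluate both models on the untouched test set: AUC plus per-class
    # precision, recall, and F1-score via the classification report.
    for name, model in [("Random Forest", rf), ("Naive Bayes", nb)]:
        y_pred = model.predict(X_test)
        auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
        print(name, "AUC:", round(auc, 2))
        print(classification_report(y_test, y_pred,
                                    target_names=["no injury", "injury"]))
    return rf

# The text describes repeating model generation 50 times and keeping the model
# with the highest precision, e.g.:
# models = [train_and_evaluate(X, y, seed) for seed in range(50)]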