
Data collected at too high a level of granularity introduces variability into the data, which reduces both the ability to gain meaningful insights and the accuracy of predictions. Transforming the data to a lower level of granularity helps filter out this variability and produce meaningful insights and more accurate predictions. For instance, in Chapter 2, the minute-level step data from the Fitbit was too granular to predict whether individuals would reach their daily number of steps. Therefore, the minute-level data had to be aggregated into steps per hour. Likewise, in Chapter 3, the 10 Hz individual position data was too granular and had to be transformed into 5-minute periods of performance measures to enable predicting individual soccer players’ performance throughout the match. By grouping the raw individual activity data from the monitoring systems into appropriate time intervals, as sketched in the first example below, we can achieve a level of data granularity suitable for making personalised predictions. Nevertheless, one needs to experiment to find the optimal level of granularity.

When there are insufficient occurrences of the outcome of interest in the data, it can be challenging to train machine learning models effectively. This is because machine learning algorithms rely on patterns and relationships within the data; when the outcome of interest is rare or underrepresented (i.e. an imbalanced dataset), the algorithm may not identify meaningful patterns [1]. For example, as we found in Chapter 2, when participants did not wear their Fitbit regularly, it was hard to determine a pattern or a daily average of steps. As a result, we had to remove these participants from the dataset. Alternatively, specialised machine learning algorithms have been developed to handle imbalanced datasets; these take the imbalanced nature of the data into account and adjust their predictions accordingly. Such algorithms can be combined with data balancing techniques, such as oversampling or undersampling, to improve predictions [2]. For instance, in Chapter 3, we effectively used Random Forest, a machine learning technique that is less biased toward the majority class, to address the imbalance in the dataset. Furthermore, we balanced the dataset with the SMOTE algorithm, which generates synthetic data points for the underrepresented class, making the dataset more suitable for training machine learning models; the second example below sketches this combination.

The use of more sensitive performance measures positively influences the quality of predictions. Chapter 3 demonstrated that the precision of machine learning algorithms in predicting physical performance increases with the sensitivity of the physical performance measures, where sensitivity is defined as the responsiveness of a physical performance measure to changes in physical activity [3]. For example, the most sensitive measure, ‘energy in power category’, led to a machine learning model that was 20% more accurate than one based on the least sensitive measure, ‘distance covered’. Consequently, our analysis indicates that the strength of the prediction models is related to the choice of performance measures.
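To make the aggregation step concrete, the first sketch below shows one way to reduce minute-level step counts to steps per hour with pandas. It is a minimal illustration only: the file name, column names, and participant identifier are hypothetical, and the chapters themselves do not prescribe a specific implementation.

```python
# Minimal sketch: aggregating minute-level step data to hourly totals.
# File name and column names ("fitbit_minute_steps.csv", "timestamp",
# "participant_id", "steps") are hypothetical placeholders.
import pandas as pd

# Load minute-level step counts (one row per minute per participant).
steps = pd.read_csv("fitbit_minute_steps.csv", parse_dates=["timestamp"])
steps = steps.set_index("timestamp")

# Reduce granularity: sum the minute-level counts into steps per hour,
# keeping each participant's series separate.
hourly = (
    steps.groupby("participant_id")["steps"]
    .resample("1h")
    .sum()
    .reset_index()
)
print(hourly.head())
```

The same pattern generalises to other window sizes (e.g. 5-minute periods for the 10 Hz position data in Chapter 3); in practice, several window sizes would be compared to find the granularity that yields the most accurate predictions.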
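The second sketch illustrates the combination of Random Forest and SMOTE described above, assuming scikit-learn and imbalanced-learn. The feature matrix and outcome are synthetic placeholders, not the match data analysed in Chapter 3, and the hyperparameters are illustrative rather than the settings used in the thesis.

```python
# Minimal sketch: Random Forest combined with SMOTE oversampling for an
# imbalanced binary outcome. The data below is a synthetic placeholder.
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))            # placeholder feature matrix
y = (rng.random(500) < 0.1).astype(int)  # rare positive class (~10%)

# SMOTE synthesises minority-class samples; putting it inside an
# imblearn Pipeline ensures resampling happens only on the training
# folds, so no synthetic points leak into the validation folds.
model = Pipeline([
    ("smote", SMOTE(random_state=42)),
    ("forest", RandomForestClassifier(n_estimators=200, random_state=42)),
])

scores = cross_val_score(model, X, y, cv=5, scoring="f1")
print(f"Mean F1 across folds: {scores.mean():.2f}")
```

Scoring with F1 rather than plain accuracy matters here: with a 10% positive class, a model that always predicts the majority class already reaches 90% accuracy while being useless for the outcome of interest.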
