Thesis

model performance and correctness. TMLE is a semi-parametric, doubly robust method that can withstand misspecification of the causal model, improving the estimation of effect sizes using machine learning methods. Double robustness implies that the estimate of the effect remains consistent if either the propensity score model¹ or the outcome model is misspecified, as long as the other is correctly specified [6]. Although TMLE is not new, it has not yet been used in the field of sports science. Generalized Linear Models (GLMs) are often used to study the physical performance of teams [7]–[9]. A disadvantage of a GLM is that it is not robust to misspecification and is an oversimplified representation of the real world [10]. However, its simplicity is also one of its strengths. Assuming the model is well specified, a GLM gives insight into the coefficients that matter for a measured outcome. Such statistical inference is generally impossible to achieve with complicated machine learning models [2], which focus on prediction and learn by minimizing a loss function rather than on statistical inference. TMLE aims to reconcile statistical inference and machine learning by introducing a two-step approach [2], [11], [12]. A machine learning algorithm is first trained on the dataset and then adapted to a particular question of interest in the so-called targeting step. With this step, non-parametric models, such as many machine learning models, can be used while statistical inference remains possible [2], [13].

The aim of this paper is twofold. Firstly, we aim to provide a roadmap for causal inference in sports science. Secondly, we aim to examine the applicability of the roadmap, combined with a study of the performance of TMLE in comparison with the traditional GLM in identifying the effect size of a substitute in soccer.
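The two-step idea and the double-robustness property described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the method used in this study: the data-generating process, the single binary confounder `w`, the value of `TRUE_EFFECT`, and all function names are hypothetical. It also uses a simple least-squares fluctuation in the targeting step, whereas TMLE implementations commonly use a logistic fluctuation together with flexible machine learning models. The initial outcome model is deliberately misspecified (it ignores the confounder), while the propensity score model is correct.

```python
import random
from statistics import mean

TRUE_EFFECT = 2.0  # hypothetical treatment effect used to generate the data


def simulate(n, seed=0):
    """Generate (w, a, y) rows where the binary confounder w drives both a and y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.8 if w else 0.2) else 0  # treatment depends on w
        y = 100.0 + TRUE_EFFECT * a + 5.0 * w + rng.gauss(0.0, 1.0)
        rows.append((w, a, y))
    return rows


def fit_outcome_model(rows):
    """Step 1 (deliberately misspecified): mean outcome per treatment arm, ignoring w."""
    return {a: mean(y for _, ai, y in rows if ai == a) for a in (0, 1)}


def fit_propensity_model(rows):
    """Correctly specified propensity score: P(a = 1 | w), estimated per stratum of w."""
    return {w: mean(a for wi, a, _ in rows if wi == w) for w in (0, 1)}


def tmle_ate(rows, q, g):
    """Step 2 (targeting): fluctuate the initial outcome model along the clever covariate."""
    def clever(w, a):
        return a / g[w] - (1 - a) / (1 - g[w])

    # Least-squares fluctuation parameter: regress the residual on the clever covariate.
    num = sum(clever(w, a) * (y - q[a]) for w, a, y in rows)
    den = sum(clever(w, a) ** 2 for w, a, _ in rows)
    eps = num / den

    # Updated predictions under treatment and under control, averaged over the sample.
    return mean((q[1] + eps * clever(w, 1)) - (q[0] + eps * clever(w, 0))
                for w, _, _ in rows)


rows = simulate(20_000)
q = fit_outcome_model(rows)
g = fit_propensity_model(rows)
naive_estimate = q[1] - q[0]          # plug-in estimate from the misspecified model
tmle_estimate = tmle_ate(rows, q, g)  # the same model after the targeting step
print(f"naive: {naive_estimate:.2f}, TMLE: {tmle_estimate:.2f}, truth: {TRUE_EFFECT:.2f}")
```

Under this hypothetical setup, the naive difference in means absorbs the confounder's influence and lands near 5 rather than 2, while the targeted estimate moves back close to the true effect because the propensity score model is correct.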
On the one hand, we conduct a simulation study, using simulated data on the influence of a substitute on the total team distance in soccer as a measure of physical performance. To study the performance of TMLE in comparison with the traditional GLM, the substitution effect sizes identified by TMLE and GLM are compared under correctly specified and misspecified causal models. On the other hand, we use observed match data to examine the effect size of a substitute on the total team performance in elite soccer, using the roadmap combined with TMLE and GLM. Thus, we provide the basis for bringing causal inference and TMLE into the toolbox of sports science research and for improving the quality of causal inference in sports science.

The paper is structured as follows. In Section 2 we present the work that is related to the current study. Here, we focus on scientific literature from the field of substitutes in

¹ A propensity score denotes the probability of receiving a treatment given the confounders. If a certain stratum has a higher chance of receiving a treatment (e.g., being female increases the chances of receiving a treatment), a propensity score can be used to control for this.
