TRANSFERRING TARGETED MAXIMUM LIKELIHOOD ESTIMATION INTO SPORT SCIENCE

1. INTRODUCTION

Empirical scientific research is intrinsically linked to statistical analysis and modelling. Statistical models are used to better understand phenomena and the causal processes that underlie them. Researchers rely on empirical data generated by the causal systems that underpin these processes. Ideally, these data are collected in a controlled environment using a Randomized Controlled Trial (RCT) design, a design that has been in use for several centuries [1]. In many cases, however, the world is messy. In sport science in particular, an RCT during a match is often not feasible, so researchers must rely on data obtained from observational studies. Controlling all relevant variables is difficult, if not impossible, or unethical. Although the lack of RCTs seems to make causal inference difficult, methods exist that allow causal reasoning on observational datasets. Furthermore, alternative approaches exist that generally work better than the current status quo [2]. An elite soccer match can only be measured by observing the results of a complex set of latent causal relations, which makes it difficult to determine the isolated effect of a single event on the outcome. Causal modelling of the influences in a match is therefore intrinsically incomplete, so the statistical method that is most robust to misspecified models provides the best understanding of such phenomena. One phenomenon of interest in soccer is the influence of substitutes, which are widely acknowledged to be important. In general, a substitution can be prompted by a player's injury, a necessary tactical change (e.g., when a team is behind in a match), or a player's underperformance [3]. Apart from forced substitutions (e.g., because of an injury), substitutions may be the most powerful tool coaches have to influence a match.
Substitutions can minimize or offset the effects of fatigue and provide new stimuli in the match: elite substitutes introduced during the second half can cover more distance and perform more physically intense actions than whole-match players over the same period [4]. However, the observation that a substitute can cover a greater distance captures only a fraction of reality [4]. Despite an extensive body of research on substitutes, to the best of our knowledge no study has applied causal inference to the influence of substitutes on the total physical performance of a soccer team. That is: does a team's total physical performance increase because of the use of substitutes? One field of causal inference that has gained traction in recent years is the Targeted Learning approach [5]. The Targeted Learning methodology aims to reconcile traditional statistical inference with state-of-the-art machine learning models. In this paper, we focus on Targeted Maximum Likelihood Estimation (TMLE), a method that enables causal reasoning and modelling and that can improve