Thesis

4 TRANSFERRING TARGETED MAXIMUM LIKELIHOOD ESTIMATION INTO SPORT SCIENCE

For the simulation, we used a data simulation system that conforms to the causal model. Because we know the exact configuration of this simulator, we can specify the data that our learning algorithms take into account either correctly or, on purpose, incorrectly. We performed a series of experiments using GLM as defined in Section 3.6.1, TMLE using super learning with the standard learners as defined in Section 3.6.2, and TMLE using super learning with handpicked learners (TMLEH): glm, glm.interaction, step, step.interaction, gam, randomForest, and rpart. We used the continuous Super Learner in all experiments. We first calculated the actual expected ATE of a substitution in the previous period (A) on the total distance of the soccer team (Y) and used it as the ground truth of our simulator. We then estimated this ATE using the three algorithms mentioned above. First, we used a correctly specified model as input to show the optimal performance of each algorithm. After that, we used a misspecified model that leaves the substitution in the current period (W3) out of the model, to show how each algorithm copes with this omission. The simulation code is written in R 4.0.2 and is available online.¹¹

Next to the simulation study, we studied how TMLE can be applied to the observed dataset. For the observed dataset, we calculated the ATE of a substitution in the previous period using GLM as defined in Section 3.6.1, and TMLE and TMLEH using (continuous) super learning as defined in Section 3.6.2. First, we used a correctly specified model as input to answer the question of how a substitution in the previous period (A) influences the outcome.
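The Python sketch below illustrates the simulation logic under clearly hypothetical assumptions: the toy structural causal model (covariate w1, confounder w3, treatment a, outcome y, and all coefficients) is invented for illustration and is not the thesis simulator, which is written in R and available in the linked repository. It computes a ground-truth ATE by intervening on the treatment inside the simulator, then estimates the ATE with a main-terms GLM (g-computation) and with a basic TMLE, once with the confounder w3 in the adjustment set and once without it. For brevity, the TMLE uses single parametric fits for the outcome regression and the propensity score instead of a Super Learner, plus a linear (least-squares) fluctuation; both are simplifications rather than the estimators of Section 3.6.

```python
import numpy as np

# Hypothetical toy SCM: w1 (covariate), w3 (confounder), a (treatment), y (outcome).
# All coefficients are illustrative assumptions, not the thesis simulator.
def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate(n, rng, a_fixed=None):
    """Draw n rows; a_fixed=0/1 sets the treatment by intervention (do-operator)."""
    w1 = rng.normal(size=n)
    w3 = rng.binomial(1, 0.3, size=n)
    if a_fixed is None:
        a = rng.binomial(1, expit(-0.5 + 1.0 * w3 + 0.3 * w1))
    else:
        a = np.full(n, a_fixed)
    y = 100.0 + 2.0 * a + 5.0 * w3 + 3.0 * w1 + rng.normal(size=n)
    return w1, w3, a, y

def ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, coefficients...]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict(beta, X):
    return beta[0] + X @ beta[1:]

def propensity(W, a, iters=25):
    """Logistic regression of a on W by Newton-Raphson; returns fitted P(A=1 | W)."""
    X1 = np.column_stack([np.ones(len(a)), W])
    beta = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = expit(X1 @ beta)
        grad = X1.T @ (a - p)
        hess = X1.T @ (X1 * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return expit(X1 @ beta)

def outcome_fit(W, a, y):
    """Main-terms linear outcome regression Q(A, W); returns Q(a,W), Q(1,W), Q(0,W)."""
    beta = ols(np.column_stack([a, W]), y)
    ones, zeros = np.ones(len(a)), np.zeros(len(a))
    return (predict(beta, np.column_stack([a, W])),
            predict(beta, np.column_stack([ones, W])),
            predict(beta, np.column_stack([zeros, W])))

def glm_ate(W, a, y):
    """G-computation: average of Q(1, W) - Q(0, W)."""
    _, q1, q0 = outcome_fit(W, a, y)
    return np.mean(q1 - q0)

def tmle_ate(W, a, y):
    """TMLE with a single linear (least-squares) fluctuation along the clever covariate."""
    qa, q1, q0 = outcome_fit(W, a, y)
    g = np.clip(propensity(W, a), 0.01, 0.99)   # bounded propensity scores
    h = a / g - (1 - a) / (1 - g)                 # clever covariate H(A, W)
    eps = np.sum(h * (y - qa)) / np.sum(h * h)    # fluctuation parameter
    q1_star = q1 + eps / g                        # H(1, W) = 1 / g(W)
    q0_star = q0 - eps / (1 - g)                  # H(0, W) = -1 / (1 - g(W))
    return np.mean(q1_star - q0_star)

# Ground truth: identical random draws under do(A=1) and do(A=0).
_, _, _, y1 = simulate(200_000, np.random.default_rng(7), a_fixed=1)
_, _, _, y0 = simulate(200_000, np.random.default_rng(7), a_fixed=0)
true_ate = np.mean(y1 - y0)

w1, w3, a, y = simulate(5_000, np.random.default_rng(2020))
correct = np.column_stack([w1, w3])   # confounder w3 included
missing = w1[:, None]                 # confounder w3 left out
print("true ATE", round(true_ate, 3))
for label, W in [("correct", correct), ("misspecified", missing)]:
    print(label, "GLM", round(glm_ate(W, a, y), 3), "TMLE", round(tmle_ate(W, a, y), 3))
```

Because w3 drives both the treatment and the outcome in this toy model, leaving it out biases both estimators: the double robustness of TMLE protects against misspecifying one of the two nuisance models, not against omitting a confounder from both.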
For the observed dataset, we then used a misspecified model that leaves the substitution in the current period (W3) out of the model, to show how the algorithms handle the absence of a confounder. The case study code is written in R 4.0.2 and is available online.¹²

3.7. Interpretation

The last step of the roadmap is the interpretation of the estimates. How far this interpretation can go depends on the strength of the assumptions made in Section 3.5: the stronger the assumptions, the stronger the link between the observed phenomenon and its interpretation. Ordered by increasing strength of the assumptions, the results of the data analysis can be given a statistical, counterfactual, feasible-intervention, or randomized-trial interpretation [22]. 'The use of a statistical model known to contain the true distribution of the observed data and of an estimator that minimizes bias and provides a valid measure of statistical uncertainty helps to ensure that analyses maintain a valid statistical interpretation. Under additional assumptions, this interpretation can be augmented.' [22]

Case study

In our case study, we made both knowledge-based and convenience-based assumptions on the simulation dataset and the observed dataset containing the true distribution and

¹¹ Available at https://github.com/dijkhuist/Entropy-TMLE-Substitutions
¹² Available at https://github.com/dijkhuist/Entropy-TMLE-Substitutions
