Thesis

TRANSFERRING TARGETED MAXIMUM LIKELIHOOD ESTIMATION INTO SPORT SCIENCE

Case study

For the causal target quantity to be identifiable, we need to be able to write our target parameter as a function of the actual distribution P0. That is, identifiability would give us Ψ(P0) ≡ Ψ(Pn). In order to make this claim, we need to impose assumptions on the system. In our case study, we need two assumptions: (i) a positivity assumption and (ii) a no unmeasured confounders assumption (randomization assumption)⁸.

The positivity assumption, stated as P(A = a | W) > 0 for all a ∈ A, requires enough observations with treatments and controls for all strata of W. That is, for each w ∈ W, we assume that the probability of each treatment is greater than zero. If this assumption does not hold, it is not possible to infer the outcomes for the missing strata. The assumption holds for both the simulation data and the observed data⁸.

The second assumption is the no unmeasured confounders assumption. It states that there is no unmeasured confounding between treatment A and outcome Y, that is, Y_a ⫫ A | W, where Y_a denotes the counterfactual outcome under treatment a. If this assumption fails, there could be an extraneous variable that influences both our treatment and our outcome variable, rendering the estimate of the causal effect of A on Y unreliable. In the simulation data there are no unmeasured confounders, as we control the causal model, the data, and the targeted quantity. This assumption is hard to validate for the observed data, as there are always unmeasured confounders in the real world. As can be seen in Figure 1, we know that an underlying confounding effect may exist, and we assume that in our case such effects either do not exist or do not significantly impact the outcome of our model. If the dimension of W, the set of measured confounders, is large enough, this assumption is likely to be valid. In this case study, for obvious reasons, this assumption is not satisfied.

3.5. Stating the statistical estimation problem

In the fifth step, we state the statistical estimation problem and determine whether all the goals are met to answer our causal question. To perform this estimation, we rely on several assumptions, some of which are knowledge-based and some of which are convenience-based [22]. Knowledge-based assumptions are based on actual knowledge that we have about the causal model and the data. Convenience-based assumptions are assumptions that provide identifiability, if true.

Case study

In our case study (and in many cases), the knowledge-based assumptions are not enough to reach identifiability and reason about causality, and as such, we introduced two convenience assumptions: a positivity assumption and an unmeasured confounding

⁸ This assumption will not hold when any w ∈ W is continuous. In that case, we need to discretize W until the assumption holds.
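The positivity check and the discretization of W mentioned in footnote 8 can be illustrated with a minimal sketch. It is not the implementation used in the case study: the function names `discretize` and `positivity_violations`, the cut points, and the toy records are all hypothetical. The sketch also only checks empirical positivity, that is, whether every observed stratum of W contains at least one observation for each treatment level, which is a necessary but not a sufficient indication that P(A = a | W) > 0.

```python
from collections import defaultdict


def discretize(value, cut_points):
    """Return the index of the bin that a continuous value falls into."""
    return sum(value >= c for c in cut_points)


def positivity_violations(records, treatments, cut_points):
    """Return the (stratum, treatment) pairs without any observation.

    records     -- iterable of (w, a) pairs: continuous covariate w, treatment a
    treatments  -- all treatment levels that should occur in every stratum
    cut_points  -- hypothetical bin boundaries used to discretize w
    """
    counts = defaultdict(int)
    strata = set()
    for w, a in records:
        stratum = discretize(w, cut_points)
        strata.add(stratum)
        counts[(stratum, a)] += 1
    return sorted(
        (s, a) for s in strata for a in treatments if counts[(s, a)] == 0
    )


# Hypothetical toy data: (w, a) with a = 1 for treatment and a = 0 for control.
records = [(0.2, 0), (0.4, 1), (1.3, 0), (1.7, 0), (2.5, 1), (2.9, 0)]

# Three bins: the middle bin contains no treated observation.
print(positivity_violations(records, [0, 1], [1.0, 2.0]))  # [(1, 1)]

# Coarser discretization with two bins: every stratum has both levels.
print(positivity_violations(records, [0, 1], [2.0]))  # []
```

Coarsening the bins until the list of violations is empty mirrors the idea of discretizing W until the assumption holds, at the cost of adjusting for W less precisely.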
