3.3. Specifying the target quantity

The third step in the roadmap is the definition of the target quantity: the causal quantity or, more specifically, the causal question of interest. The target quantity can be seen as the main question we would like to answer about the underlying system. Examples of target quantities are: ‘What is the average treatment effect of a medicine versus placebo?’ or ‘How much does gender influence the outcome of a drug?’. This approach differs markedly from general machine learning approaches, which typically optimize a prediction that can serve a multitude of questions. In contrast, the targeted learning approach picks one specific question, which drastically reduces the complexity of the problem [21]. To define the target quantity, we need to identify the target population, the intervention applied to this target population, and the outcome we are interested in.

Case study

In our case study, we are interested in determining the effect of a substitution (the intervention; A) on the total distance in meters (the outcome; Y) covered by the team (the target population). We can further specify our question using the notion of counterfactuals: alternative scenarios that have not occurred but that help us to answer our question. In our case study, we want to compare making a substitution (A = 1) with not making a substitution (A = 0). For any given observation, only one of these two scenarios actually occurred; the other one represents a ‘counterfactual world’. For example, if no substitution was made at the observed time, the scenario with a substitution is the counterfactual. Using these counterfactuals, we can precisely define our quantity of interest: the difference in team distance between a substitution and no substitution at the same point in time.

3.4. Assessing identifiability

In the fourth step, we determine identifiability.
That is, we assess whether sufficient knowledge and data are available to answer the causal question or whether additional assumptions need to be made. The defined causal question can be modelled as an average intervention effect, or Average Treatment Effect (ATE), in which a substitution is seen as the intervention / treatment. In social studies, the ATE is referred to as the Effect Size [30], [31]. Formally, an ATE can generally be formulated using the G-computation formula [32],

$$\psi_0 = \Psi(P_0) = \mathbb{E}_{W}\big[\mathbb{E}(Y \mid A = 1, W) - \mathbb{E}(Y \mid A = 0, W)\big]. \tag{1}$$

This G-computation formula determines the average effect of a treatment by taking, for each value of the covariates W, the difference between the expected outcome under treatment (A = 1) and the expected outcome without treatment (A = 0), and then averaging this difference over the distribution of W. Note that we use the notation P_0 here to denote the true probability distribution from which O originates.⁷

⁷ For the sake of clarity, we do not discuss unmeasured confounders and their distribution here. Please see the Targeted Learning book by van der Laan and Rose [2] for more details.
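To make the G-computation formula in Equation (1) more concrete, the following minimal sketch computes a plug-in estimate of the ATE on synthetic data. It is an illustration only and not the estimator used in this chapter: the data-generating process, the function names `simulate` and `g_computation_ate`, and the linear outcome model with a treatment-covariate interaction are all assumptions made for this example, and the data are not the case study data.

```python
import numpy as np


def simulate(n, seed=0):
    """Generate hypothetical data O = (W, A, Y) with confounding through W."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.5, 1.0, n)                    # covariate with mean 0.5
    p_treat = 1.0 / (1.0 + np.exp(-W))             # treatment probability depends on W
    A = rng.binomial(1, p_treat)                   # binary intervention
    Y = 1.0 + 2.0 * A + 3.0 * W + 1.5 * A * W + rng.normal(0.0, 1.0, n)
    return W, A, Y


def g_computation_ate(W, A, Y):
    """Plug-in G-computation estimate of the ATE.

    Assumes (for illustration) a linear outcome model with an A*W interaction.
    """
    A = np.asarray(A, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = len(A)
    W = np.asarray(W, dtype=float).reshape(n, -1)

    def design(a):
        # Columns: intercept, treatment, covariates, treatment-covariate interaction
        return np.column_stack([np.ones(n), a, W, a[:, None] * W])

    # Fit the outcome regression E(Y | A, W) by ordinary least squares
    beta, *_ = np.linalg.lstsq(design(A), Y, rcond=None)

    # Predict every unit's outcome under A = 1 and under A = 0
    y1 = design(np.ones(n)) @ beta
    y0 = design(np.zeros(n)) @ beta

    # Average the difference over the empirical distribution of W
    return float(np.mean(y1 - y0))


W, A, Y = simulate(20000, seed=0)
ate_hat = g_computation_ate(W, A, Y)
naive = float(Y[A == 1].mean() - Y[A == 0].mean())
print(f"G-computation estimate: {ate_hat:.3f}")
print(f"Naive difference in means: {naive:.3f}")
```

In this simulated setting the true ATE is 2 + 1.5 · E[W] = 2.75. The naive difference in means is biased because W influences both the treatment and the outcome, whereas the plug-in estimate adjusts for W. This adjustment only works here because the assumed outcome model matches the simulated data; with a misspecified model the plug-in estimate can be biased as well.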