TRANSFERRING TARGETED MAXIMUM LIKELIHOOD ESTIMATION INTO SPORT SCIENCE

substitute in the current period. Our treatment variable, $A \sim B$, is a binary intervention that indicates whether a substitution happened in the previous five-minute period. $U_{W,A,Y} \sim P_U$ are the unmeasured confounders that potentially influence the variables in the model,² where $P_U$ is the unknown distribution from which $U_{W,A,Y}$ is instantiated. Finally, the outcome of our model, $Y \sim N$ (in which $N$ denotes the normal distribution), is a proxy for performance measured by the total distance covered by the team. A greater distance covered by the team indicates higher performance.

The relationships between these variables are defined as follows. The period $W_1$ influences the total distance of the team $Y$, which is known to decline during the match [4]. As substitutions are highly dependent on the moment of the match, the period $W_1$ is related to the substitutes present $W_2$, the current-period substitutions $W_3$, and the substitutions of the previous period $A$. The total distance of the team $Y$ depends on the number of substitutes present, as given by $A$ and $W_2$, because substitutes cover more distance than players who play the entire match. When a substitution occurs within the current period ($W_3$), it leads to a dead-ball moment and reduces the overall distance $Y$. Substitutions in the current and previous periods are also influenced by unknown confounders, such as injuries or tactical decisions. The overall distance $Y$ of a team does not depend solely on the period and the substitutes: other possible unknown confounders $U$ are not accounted for in our model but potentially influence the total distance $Y$ [28].

After this first step, we have a clear definition of the knowledge and the relationships between the different variables under study, allowing us to move to the data we have about this system.

3.2. Specifying the simulation data, the observed data, and its link to the causal model

In the second step, we specify the observed and simulation data and their link to the causal model. The causal model we defined in the first step represents what we know about the system, whereas the data describe what we have observed from it. The causal model describes various possible processes that could have yielded the data. This description of possible processes is strongly connected to the underlying statistical model of the data, that is, the set of all possible distributions from which the data could originate. For this, we define the data as $O \sim P$, where $O$ is a subset of the space of all possible generated data and $P$ is the data-generating distribution.

Case study

3.2.1. Simulation data

We implemented a data simulator to generate datasets according to the causal model in

² Such as playing home or away, the rank of the teams, the positional system they play, the current score, etc. These variables are by definition unknown and unmeasured. We do not know whether such variables exist and actually influence the model. However, they could, which is why they are mentioned here.
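As an illustration of how a causal model of this kind could be turned into a data simulator, the following is a minimal Python sketch. It is not the simulator used in this study. The function name `simulate`, the `do_a` and `effect_a` arguments, the 18 five-minute periods, the range of $W_2$, and every coefficient and functional form are assumptions chosen only to mirror the direction of the relationships described in the text. The random draws stand in for the exogenous terms $U_W$, $U_A$, and $U_Y$.

```python
import numpy as np


def sigmoid(x):
    """Logistic function mapping real values to probabilities."""
    return 1.0 / (1.0 + np.exp(-x))


def simulate(n_obs, seed=0, effect_a=50.0, do_a=None):
    """Draw n_obs five-minute periods from a toy version of the causal model.

    All coefficients are arbitrary illustrative values (assumptions).
    If do_a is 0 or 1, the treatment A is set to that value for every
    observation, which mimics an intervention on A.
    """
    rng = np.random.default_rng(seed)

    # W1: index of the five-minute period (assumed 1..18 for a 90-minute match).
    w1 = rng.integers(1, 19, size=n_obs)
    late = (w1 - 1) / 17.0  # 0 in the first period, 1 in the last

    # W2: substitutes present (assumed count 0..3), more likely late in the match.
    w2 = rng.binomial(3, 0.1 + 0.8 * late)

    # W3: substitution in the current period; A: substitution in the
    # previous period. Both become more likely as the match progresses.
    w3 = rng.binomial(1, sigmoid(-3.0 + 2.5 * late))
    a_natural = rng.binomial(1, sigmoid(-3.0 + 2.5 * late))
    a = a_natural if do_a is None else np.full(n_obs, do_a)

    # Y: total team distance (arbitrary units) with normal noise. It declines
    # over the match, increases with substitutes, and drops when a
    # substitution causes a dead-ball moment in the current period.
    noise = rng.normal(0.0, 30.0, size=n_obs)
    y = 1500.0 - 200.0 * late + 20.0 * w2 - 40.0 * w3 + effect_a * a + noise

    return {"W1": w1, "W2": w2, "W3": w3, "A": a, "Y": y}
```

Because the sketch controls the data-generating process, the effect of the treatment is known by construction: calling `simulate(n, seed=s, do_a=1)` and `simulate(n, seed=s, do_a=0)` with the same seed yields outcomes that differ by `effect_a` for every observation. An estimator applied to the data drawn without intervention can then be judged against that known value.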