
assumption (see Section 3.4). These assumptions are needed because we have only limited knowledge about the system we are dealing with. In general, such assumptions should be kept to a minimum (as few as possible, but enough to allow for statistical inference). In our case, the simulation dataset meets both the knowledge-based and the convenience-based assumptions, for we control all aspects of the simulation dataset. In contrast, the tracking dataset meets all assumptions except for the unmeasured confounding assumption.

3.6. Estimation

In the sixth step, the actual estimation is done. So far, the roadmap has only helped define the problem we are solving and the knowledge we have about the problem. With estimation, we aim to find a parameter ψn as an estimate of the true parameter ψ0 of the true data-generating distribution P0. To provide some intuition: the observed data we collected, O ∼ P0, is an empirical realization of data drawn from the true data-generating distribution P0. Suppose P0 is characterized by a parameter ψ0 which controls the data P0 generates. Since we know neither P0 nor ψ0, we aim to find a parameter ψn that is as close as possible to ψ0. We define a mapping Ψ: ℳ → ℝ, in which ℳ is the statistical model containing all candidate distributions, so that P0 ∈ ℳ. From this mapping it follows that Ψ(P0) = ψ0; that is, the function Ψ yields the true parameter when provided the true distribution. Our goal is to find an estimator based on the empirical data, Ψ̂(Pn) = ψn, in which Ψ̂: ℳnon-parametric → ℝ. To illustrate the process of defining an estimator Ψ̂(Pn) of Ψ(P0), our explanation will follow two stages.
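The plug-in idea described above can be made concrete with a minimal sketch. The code below is illustrative only: the simulated data-generating process, the variable names, and the true effect size of 2.0 are invented for this example and do not come from the thesis. It fits a simple linear model for E[Y ∣ A, W] and then substitutes the fitted model, averaged over the empirical distribution of W, to obtain an estimate ψn of the average effect of A on Y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated observed data O = (W, A, Y); the true effect of A on Y is 2.0.
W = rng.normal(size=n)
A = rng.binomial(1, 0.5, size=n)
Y = 1.0 + 2.0 * A + 0.5 * W + rng.normal(scale=0.1, size=n)

# Stage 1: estimate E[Y | A, W] with a linear model via least squares.
X = np.column_stack([np.ones(n), A, W])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Stage 2: substitution (plug-in) step — evaluate the fitted model at
# A=1 and A=0 for every observed W, and average over the empirical
# distribution of W to obtain the parameter estimate psi_n.
Q1 = np.column_stack([np.ones(n), np.ones(n), W]) @ beta
Q0 = np.column_stack([np.ones(n), np.zeros(n), W]) @ beta
psi_n = float(np.mean(Q1 - Q0))
print(psi_n)  # close to the true effect of 2.0
```

Note that for this linear model the plug-in estimate coincides with the fitted coefficient on A; the substitution formulation matters because it stays valid when the model for E[Y ∣ A, W] is replaced by a more flexible learner, as in the Super Learning and TMLE stage discussed next.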
We first start with a basic estimation procedure illustrated using a traditional generalized linear model (GLM) approach. Secondly, we show how an estimator of Ψ(P0) can be defined using Super Learning and TMLE. We can take this approach as we are dealing with a so-called substitution estimator, or plug-in estimator, allowing us to treat the concrete implementation of the estimator as an implementation detail [2].

3.6.1. GLM-based estimation

The general estimation procedure relies on the definition of Q0, the relevant part of P0 needed for the target parameter. That is, Ψ(P0) ≡ Ψ(Q0). In our definition of Ψ in Equation (1), Ψ(Q0) only relies on Q̄0(A, W) ≡ E0[Y ∣ A, W] and on Q0,W, the distribution of W⁹. As such, Q0 is defined as the collection Q0 = (Q̄0, Q0,W). With these definitions, we now need to define algorithms that take in the empirical data, and for this we define the following steps:

⁹ We use the bar (‾) to differentiate between Q0 and the element Q̄0, which is consistent with other Targeted Learning literature.
