CHAPTER 4
Machine learning and cross-validation

Machine learning focuses on training an algorithm to perform an optimal prediction of an outcome Y given input covariates X, i.e., to estimate E[Y | X]. Training a machine learning model works by minimizing a so-called loss function over a series of cross-validation folds. Cross-validation aims to estimate how well a trained model performs on unseen data by sequentially leaving out part of the data from the training procedure.

Cross-validation splits the data Z = {Z_1, ..., Z_n} into training and validation sets. These splits can be modelled using a random variable B_n ∈ {0, 1}^n. With V different cross-validation folds, B_n can take V different values, b_1, ..., b_V ∈ {0, 1}^n. Each b_v then corresponds to two sets: a training set {Z_i : 1 ≤ i ≤ n, b_v(i) = 0} and a validation set {Z_i : 1 ≤ i ≤ n, b_v(i) = 1}, where b_v(i) denotes the i-th entry of the vector b_v. In our case, each observation is placed in exactly one validation set,

    \sum_{v=1}^{V} b_v(i) = 1 \quad \text{for all } i.

Thus, each observation falls once in the validation set and is used V − 1 times in the training set.

Super Learning

Cross-validation forms the basis of machine learning and is equally important for super learning. Super learning is a specific instance of machine learning that applies an ensemble methodology to automatically select the best machine learning algorithm, or a convex combination of machine learning algorithms. The Super Learner selects the best estimator among all candidate estimators based on their cross-validation scores [5]. The methodology comes in two implementations: the discrete super learner and the continuous super learner.

The discrete super learner starts with a library of learners \mathcal{L} = {L_1, ..., L_M}. These learners can be anything used to perform the prediction E[Y | X], from something as simple as the mean of the data to something as complex as a neural network or a random forest. The Super Learner trains each L_m ∈ \mathcal{L} on each cross-validation fold, resulting in a set of estimators {\hat{\psi}_{1,1}, ..., \hat{\psi}_{M,V}} and an accompanying cross-validation risk (loss) for each fold. Averaging these risks across the folds, the discrete super learner selects the algorithm with the lowest cross-validated risk:

    \hat{\psi} = \operatorname*{argmin}_{m \in \{1, \dots, M\}} \frac{1}{V} \sum_{v=1}^{V} R_v(\hat{\psi}_{m,v})        (5)

The continuous super learner applies a similar procedure, only instead of selecting the single best estimator, it aims to find weights α = {α_1, ..., α_M}, one for each learner, in the simplex

    A = \Big\{ \alpha \in \mathbb{R}^M_{\geq 0} : \sum_{m=1}^{M} \alpha_m = 1 \Big\}        (6)

The Super Learner is then defined as the dot product of the weight vector α and the candidate estimators.
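To make the fold notation concrete, the following is a minimal Python sketch of V-fold cross-validation expressed through the binary split vectors b_v described above. The function name make_fold_vectors and the random assignment scheme are illustrative choices, not part of the methodology itself.

```python
import numpy as np

def make_fold_vectors(n, V, seed=1):
    """Rows are the split vectors b_1, ..., b_V; b_v(i) = 1 puts Z_i in
    the validation set of fold v, b_v(i) = 0 in its training set."""
    rng = np.random.default_rng(seed)
    fold_of = rng.permutation(n) % V        # validation fold of each observation
    b = np.zeros((V, n), dtype=int)
    b[fold_of, np.arange(n)] = 1            # b_v(i) = 1 exactly once per column
    return b

b = make_fold_vectors(n=10, V=5)
# Each observation validates exactly once and trains V - 1 times,
# i.e. sum over v of b_v(i) = 1 for every i, as in the constraint above.
assert (b.sum(axis=0) == 1).all()
```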
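The selection rule in equation (5) can be sketched as follows. The candidate library (a mean predictor, a linear model, a random forest) and the squared-error loss are assumptions made for illustration; any loss function and any compatible estimator would work. The sketch reuses make_fold_vectors from the previous example.

```python
import numpy as np
from sklearn.dummy import DummyRegressor            # predicts the mean of Y
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

def discrete_super_learner(X, Y, b, library):
    """Return the candidate with the lowest fold-averaged CV risk (eq. (5))."""
    V, risks = b.shape[0], np.zeros(len(library))
    for v in range(V):
        train, valid = b[v] == 0, b[v] == 1
        for m, make in enumerate(library):
            psi = make().fit(X[train], Y[train])     # psi-hat_{m,v}
            risks[m] += np.mean((Y[valid] - psi.predict(X[valid])) ** 2) / V
    best = int(np.argmin(risks))                     # argmin over averaged risks
    return library[best]().fit(X, Y), risks

library = [DummyRegressor, LinearRegression,
           lambda: RandomForestRegressor(n_estimators=100, random_state=1)]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                        # synthetic data for illustration
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)
b = make_fold_vectors(n=200, V=5)
fit, risks = discrete_super_learner(X, Y, b, library)
```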
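For the continuous super learner, one way to fit the weights of equation (6) is to stack the out-of-fold predictions of every candidate and minimise the cross-validated squared-error risk over the simplex. The sketch below uses scipy's SLSQP solver to enforce the simplex constraint; this is only one of several possible optimisers (the canonical SuperLearner software uses different routines), and it reuses X, Y, b, and library from the previous sketch.

```python
import numpy as np
from scipy.optimize import minimize

def cv_predictions(X, Y, b, library):
    """(n, M) matrix of out-of-fold predictions: row i is predicted by
    models trained on the folds in which Z_i did not validate."""
    Z = np.zeros((X.shape[0], len(library)))
    for v in range(b.shape[0]):
        train, valid = b[v] == 0, b[v] == 1
        for m, make in enumerate(library):
            Z[valid, m] = make().fit(X[train], Y[train]).predict(X[valid])
    return Z

def fit_convex_weights(Z, Y):
    """Minimise the CV squared-error risk over the simplex of eq. (6)."""
    M = Z.shape[1]
    res = minimize(lambda a: np.mean((Y - Z @ a) ** 2),
                   x0=np.full(M, 1.0 / M), method="SLSQP",
                   bounds=[(0.0, 1.0)] * M,
                   constraints=[{"type": "eq", "fun": lambda a: a.sum() - 1.0}])
    return res.x

alpha = fit_convex_weights(cv_predictions(X, Y, b, library), Y)
# The ensemble prediction is the dot product of alpha with the candidates'
# predictions: psi_SL(x) = sum_m alpha_m * psi_m(x).
```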
