maximum score of 30 on the 15 items), a group with an intermediate total score level (from 11 to 15), and a group with a high total score level (above 15). For every item, the observed score and the posterior expected score in every score-level group were computed, and the average absolute differences between those two scores were examined. For details on the general procedure of using difference scores, refer to Glas (1999).

The second investigation of model fit was a test of the local independence of the items. The assumption of local independence posits that the relation between student responses is entirely due to the latent variable, and that the responses are therefore uncorrelated after accounting for the latent variable. Yen (1984), van den Wollenberg (1982) and Glas (1999) pointed out that this assumption can be tested by evaluating the observed and expected association between two item responses. Average absolute differences between the observed and expected scores were calculated and examined.

The third check on model fit relates to the fact that the questionnaire was administered in two different formats (paper-based and digital). To test for differences between the two formats, the absolute differences between the item-functioning values under the two methods of administration were examined.

In all three of the above-mentioned investigations of the fit of the combined IRT and GT model, the average absolute difference should be no more than 0.1 (Glas, 2016). If so, the model measures a unidimensional construct, which would support the construct validity of the Impact! questionnaire (research question one).
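All three checks share one computation: averaging absolute differences between observed and model-implied quantities and comparing the result with the 0.1 criterion. The sketch below illustrates that comparison logic only; it assumes the observed and posterior expected values have already been extracted from the fitted model, and all array names and numbers are hypothetical (it does not reproduce Glas's (1999) estimation procedure).

```python
import numpy as np

def avg_abs_diff(observed, expected):
    """Mean absolute difference between observed and model-expected values."""
    return float(np.mean(np.abs(np.asarray(observed) - np.asarray(expected))))

# Check 1: observed vs. posterior expected item scores per total-score group
# (low <= 10, intermediate 11-15, high > 15); rows are groups, columns items.
obs_by_group = np.array([[0.42, 0.55, 0.60],
                         [0.61, 0.70, 0.74],
                         [0.83, 0.88, 0.91]])
exp_by_group = np.array([[0.45, 0.52, 0.63],
                         [0.63, 0.72, 0.71],
                         [0.80, 0.90, 0.93]])
print("score-level groups:", avg_abs_diff(obs_by_group, exp_by_group))

# Check 2: local independence, observed vs. model-expected association
# for every item pair, here for the pairs (1,2), (1,3) and (2,3).
obs_assoc = np.array([0.08, 0.11, 0.05])
exp_assoc = np.array([0.10, 0.09, 0.06])
print("local independence:", avg_abs_diff(obs_assoc, exp_assoc))

# Check 3: paper-based vs. digital administration, comparing the
# item-functioning values estimated separately for each format.
paper_values   = np.array([0.12, -0.35, 0.80])
digital_values = np.array([0.10, -0.30, 0.84])
print("administration format:", avg_abs_diff(paper_values, digital_values))

# Each printed value should be no more than 0.1 (Glas, 2016).
```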
2.4.2 Reliability

To answer research question two, whether scores from students are reliable measurements of teaching quality, the reliability coefficient for a teacher rating averaged over students and time points was computed by dividing the explained variance by the total observed variance resulting from the IRT model (Formula 1). In other words, the reliability coefficient is the ratio between the variance attributable to teachers and the total of the relevant variance components listed in Table 2.1, that is:

$$\rho = \frac{\sigma_j^2}{\sigma_j^2 + \sigma_{i|j}^2 / \bar{R} + \sigma_{it|j}^2 / (\bar{T}\bar{R})}, \qquad (3)$$

where $\bar{T}$ stands for the weighted mean number of time points over students and $\bar{R}$ stands for the average number of students who rated a teacher. The variance over time points ($\sigma_t^2$) and the variance attributable to the interaction between teachers and time points ($\sigma_{jt}^2$) do not appear in the denominator, because averaging the assessment over time points does not influence the reliability. All teachers are in principle assessed at the same fixed time points (no observations
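As a concrete illustration of Formula 3, the sketch below computes the reliability coefficient from variance components of the kind listed in Table 2.1. The numeric values are invented for the example and are not the estimates reported in this chapter.

```python
def reliability(var_j, var_i_given_j, var_it_given_j, t_bar, r_bar):
    """Formula 3: teacher variance divided by teacher variance plus the
    student and student-by-time-point components, each reduced by the
    number of ratings averaged over."""
    return var_j / (var_j + var_i_given_j / r_bar
                    + var_it_given_j / (t_bar * r_bar))

# Hypothetical variance components:
rho = reliability(var_j=0.30,           # variance attributable to teachers
                  var_i_given_j=0.90,   # students within teachers
                  var_it_given_j=0.40,  # student-by-time-point interaction
                  t_bar=2.0,            # weighted mean number of time points
                  r_bar=20.0)           # average number of raters per teacher
print(f"reliability = {rho:.3f}")       # 0.30 / 0.355 = 0.845
```

As the sketch makes visible, increasing the number of raters or time points shrinks the error components in the denominator, which is why a rating averaged over more students and time points is more reliable.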