As can be seen in Figure 2.2b, the standard errors are larger towards the extremes of the θ-scale (the teaching quality scale), so average teaching quality (θ = 0) was measured with much more precision than lower or higher teaching quality. The standard errors are almost uniform in the middle of the scale. Because the standard error of the θ estimate is the reciprocal of the square root of the information function, this pattern mirrors Figure 2.2a; the information function itself, however, is not single-peaked. This is typical for tests with polytomously scored items (items with more than two answer options). Item information would indeed become a single-peaked function when the number of items approached infinity.

2.6 CONCLUSION, DISCUSSION, AND RECOMMENDATIONS FOR FUTURE RESEARCH

2.6.1 Conclusion

In this study, the construct validity and reliability of the Impact! tool, a feedback system through which students rate the quality of their teachers’ instruction at the end of a lesson, were investigated. To assess the construct validity of the Impact! questionnaire and thereby answer our first research question, we followed a thorough procedure, including a literature study, a review of existing student perception questionnaires, input from experts in the educational sciences, an extensive process of discussing and reformulating draft versions of the questionnaire items, and a pilot of the items among teachers and students. The construct validity of the Impact! questionnaire was then investigated by examining the model fit of a combined IRT and GT model based on Impact! scores. In all three investigations of the model fit of the combined model, the average absolute difference scores of the items were below 0.1, meaning that all items contributed well to a single unidimensional underlying construct (Glas, 2016).

The results regarding the second research question, which centred on the reliability of the measure, showed that students’ Impact! ratings may serve as reliable measurements of teaching quality, as the global reliability coefficient was 0.895. To obtain highly reliable scores with the Impact! tool (> 0.8), at least three measurements are needed (see Figure 2.1). Most of the variance in students’ scores for teaching quality is explained by differences between teachers (35.6%), followed by differences between students’ opinions of teaching quality (24.4%). It is worth noting explicitly that these sources of variance are potential confounds and represent a limitation of the scale.

[Figure 2.1. Reliability as a function of the number of measurements (1–7), plotted for N = 19, N = 10, and N = 5.]

The reliability of the questionnaire at the item level was also satisfactory, as the local reliability coefficient of the total test was very high (0.925). This coefficient was based on all discrimination (a_k) and location (d_k) parameters of the items, provided in Table 2.3. Given the
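To make the point about Figures 2.2a and 2.2b more concrete, the sketch below computes the item information function of a single polytomously scored item under a generic graded response model and the corresponding standard error, SE(θ) = 1/√I(θ). The model, the parameter values (a = 2.0; thresholds at −2, 0, and 2), and the function names are purely illustrative assumptions and are not taken from the Impact! analyses; with widely spaced thresholds the information curve shows several local peaks, which is the behaviour described above.

```python
import numpy as np

def grm_item_information(theta, a, b):
    """Item information for one graded-response-model (GRM) item.

    theta : array of latent-trait values
    a     : discrimination parameter
    b     : ordered threshold (location) parameters, length K - 1 for K categories
    """
    theta = np.asarray(theta, dtype=float)
    # Cumulative probabilities P*_0 = 1 >= P*_1 >= ... >= P*_K = 0.
    p_star = [np.ones_like(theta)]
    p_star += [1.0 / (1.0 + np.exp(-a * (theta - bk))) for bk in b]
    p_star.append(np.zeros_like(theta))

    info = np.zeros_like(theta)
    for k in range(len(b) + 1):
        p_k = p_star[k] - p_star[k + 1]                      # probability of category k
        dp_k = a * (p_star[k] * (1 - p_star[k])
                    - p_star[k + 1] * (1 - p_star[k + 1]))   # d P_k / d theta
        info += dp_k ** 2 / np.clip(p_k, 1e-12, None)        # Fisher information term
    return info

theta = np.linspace(-4.0, 4.0, 401)
# Hypothetical item: widely spaced thresholds make the information curve multi-peaked.
info = grm_item_information(theta, a=2.0, b=[-2.0, 0.0, 2.0])
se = 1.0 / np.sqrt(info)                                     # SE(theta) = 1 / sqrt(I(theta))

local_peaks = (info[1:-1] > info[:-2]) & (info[1:-1] > info[2:])
print("local maxima in the information curve:", int(local_peaks.sum()))
print("SE at theta =  0:", round(float(se[len(theta) // 2]), 3))
print("SE at theta = -4:", round(float(se[0]), 3))
```

Summing such item curves over many items smooths out the dips, which is consistent with the observation above that the information function would only become single-peaked as the number of items grows.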
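The shape of Figure 2.1, in which reliability rises with the number of measurements and then flattens, is the usual diminishing-returns pattern obtained when scores are averaged over repeated measurements. As a minimal sketch of that pattern only, the snippet below applies the Spearman–Brown formula to a purely hypothetical single-measurement reliability of 0.6; it does not reproduce the generalizability analysis behind Figure 2.1 or the variance components reported above.

```python
def spearman_brown(rel_single: float, k: int) -> float:
    """Projected reliability of the average of k parallel measurements."""
    return k * rel_single / (1.0 + (k - 1) * rel_single)

# Hypothetical single-measurement reliability, chosen for illustration only.
rel_1 = 0.6

for k in range(1, 8):
    print(f"{k} measurement(s): projected reliability = {spearman_brown(rel_1, k):.3f}")

# With this illustrative value the 0.8 threshold is crossed at k = 3,
# after which each extra measurement adds progressively less reliability.
```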
