The information value of an item depends on the location parameter, which indicates how difficult it is to receive a high score on that item, and the discrimination parameter, which indicates the contribution of that item to the scale and how well responses (in this study: responses of students) discriminate between items and students (Fauth et al., 2014; Kyriakides, 2005). In this study, both the global and the local reliability of the Impact! questionnaire were investigated by, respectively, evaluating the dependability of scores and calculating the measurement precision at a specific scale point. Further details are outlined in the Method section.

2.3 METHOD

2.3.1 Participants and research design

In total, 26 teachers (58.3% male) with an average age of 41.1 years (SD = 10.8) and an average of 12.7 years of teaching experience (SD = 8.9), and 717 students (48.5% male; all aged 14 or 15 years) participated in the study. Over a period of 4 months during the 2016-2017 school year, the teachers and their students used the Impact! tool at the end of a number of mathematics lessons chosen by the teachers. In this way, student perceptions of the quality of their mathematics teachers’ teaching were collected. The number of measurement moments differed between teachers, ranging from 3 to 17, with an average of 7.

2.3.2 The data

The items on the Impact! questionnaire were originally scored from 0 to 3. However, because the lowest category was chosen in too few responses (which negatively affected the stability of the analyses), the two lowest categories were combined in the analyses. The extra option “not applicable”, available for three of the 16 questions, was coded as missing. The data followed a multilevel structure pertaining to teachers, students, and time points: students’ responses were nested within teachers and collected at different times.
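The recoding described above can be sketched as follows. This is an illustrative example only, with made-up scores and a hypothetical code (9) for “not applicable”; the study's actual coding scheme and software are not specified here.

```python
import numpy as np
import pandas as pd

# Hypothetical raw item scores (original 0-3 scale).
# 9 stands in for the "not applicable" option on some items.
raw = pd.DataFrame({
    "item_1": [0, 1, 2, 3, 1],
    "item_2": [3, 9, 2, 0, 2],  # 9 = "not applicable"
})

# Recode: "not applicable" -> missing value; then merge the two lowest
# categories (0 and 1 both become 0), yielding a 0-2 scale.
recoded = raw.replace(9, np.nan).clip(lower=1) - 1
```

Clipping at 1 and subtracting 1 maps 0 and 1 to 0, 2 to 1, and 3 to 2, while missing values pass through unchanged.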
For every teacher, the first and last measurements were administered in a paper-based format as part of a pretest/posttest (the pretest and posttest also included questions regarding background characteristics of students and teachers), while the intermediate measurements, at time points 2 up to 16, were conducted digitally using the Impact! tool. The items in both measurement formats were similar. To answer the research questions, the data were analysed using a combination of an item response theory model and a generalizability theory model. These two models are described in the following sections.
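As an illustration of the item response theory quantities introduced above, a minimal sketch of how the location and discrimination parameters determine an item's information value, here under a simple two-parameter logistic model rather than the study's actual (polytomous, multilevel) model; parameter values are made up:

```python
import numpy as np

def item_information(theta, a, b):
    """Fisher information of a 2PL item at scale point theta.

    a: discrimination parameter, b: location (difficulty) parameter.
    """
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))  # probability of a high score
    return a ** 2 * p * (1.0 - p)

# Information peaks at theta = b and grows with discrimination a.
theta = np.linspace(-3, 3, 7)
info = item_information(theta, a=1.5, b=0.0)

# Local reliability: measurement precision at a specific scale point,
# expressed as the standard error of measurement 1 / sqrt(information).
se = 1.0 / np.sqrt(info)
```

In this sketch, global reliability would summarise precision over the whole scale, whereas `se` shows how precision varies across scale points.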