
The reliability and construct validity of student perceptions of teaching quality

2.2.3 Reliability

A measure is said to have high reliability when it produces similar results under comparable conditions (Baarda & De Goede, 2006; Fraenkel et al., 2012). To indicate the amount of error in a measure, various reliability coefficients can be calculated, with values ranging from 0.00 (much error) to 1.00 (no error). In general, there are two ways to approach reliability: global reliability and local reliability. Global reliability refers to the extent to which two parallel versions of the same test correlate with each other (Kenny et al., 1994). It can be conceptualized as the proportion of the observed variance that is attributable to true differences between respondents rather than to differences between measurements, or as the extent to which two randomly chosen respondents from a population can be distinguished from each other. The global reliability coefficient can be estimated by, for example, calculating Cohen's kappa (Cohen, 1960; Shrout, 1998). However, coefficient kappa can underestimate reliability, because it only credits exact matches between the measurements. Another way to determine global reliability is to calculate Cronbach's coefficient alpha (Cronbach, 1951; Santos, 1999). Although Cronbach's alpha has received considerable criticism (e.g., for not taking into account the nested structure of data; Dunn et al., 2013; Sijtsma, 2009), it is widely used as a measure of the reliability of (psychological) tests (COTAN, 2010). Global reliability can also be defined as the dependability of scores: the extent to which the variance in questionnaire scores can be attributed to several variance components, such as respondents, time points and tasks (Brennan, 2001).
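To illustrate the two global reliability coefficients named above, the following minimal Python sketch computes unweighted Cohen's kappa for two sets of categorical ratings of the same cases and Cronbach's alpha for a respondents-by-items score matrix. It is a textbook-style illustration under stated assumptions, not the procedure used in this study; the function names and the example data are hypothetical.

```python
from collections import Counter
from statistics import variance


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two categorical ratings of the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("expected two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: share of cases on which the two ratings match exactly.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the two marginal proportions, summed over categories.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


def cronbachs_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of k item scores per respondent."""
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents ...
    item_variances = [variance([row[i] for row in item_scores]) for i in range(k)]
    # ... and of the total (sum) score; the n - 1 denominators cancel in the ratio.
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical example data, not taken from the study.
rater_1 = [1, 1, 2, 2, 3, 3, 1, 2]
rater_2 = [1, 2, 2, 2, 3, 1, 1, 2]
print(round(cohens_kappa(rater_1, rater_2), 3))  # 0.61

responses = [[1, 2], [2, 1], [3, 3]]  # three respondents, two items
print(round(cronbachs_alpha(responses), 3))  # 0.667
```

Because the observed agreement in this sketch counts only exact matches, a rating of 2 versus 3 on an ordinal scale is treated the same as 1 versus 3. Note also that neither coefficient, as written here, models a nested structure such as students within classes.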
Generalizability theory, which is used to assess the dependability of scores, shows how much of the total observed variance in the measurements can be attributed to each of these components and how the reliability coefficient changes when the number of observations within a component is increased or decreased (Cronbach et al., 1972; Shavelson & Webb, 2005). Unlike the other approaches to global reliability, this approach takes into account the multilevel structure of the data. Because the dataset used in this study has a multilevel structure, this study evaluated the dependability of scores. The local reliability investigated in this study indicates the measurement precision at a specific scale point; in other words, it reflects how precisely a specific teaching quality score is estimated. This precision is expressed as the standard error of measurement at that score, which equals the inverse of the square root of the test information. Test information is the sum of the item information values, which indicate how much the individual items contribute to the measurement precision of the instrument. The contribution of each item can therefore be investigated by determining the information value of every item on the questionnaire.
