126 Chapter 7

Diagnostic performance to predict a (near-)complete response

The average performance of all study readers in predicting a (near-)complete response to CRT was similar for the three methods, with an AUC of 0.71 (95% CI 0.60-0.82) for the 5-point confidence score, 0.74 (95% CI 0.64-0.84) for the 4-point risk score, and 0.72 (95% CI 0.62-0.83) for the dichotomized 2-point risk score; differences in AUC between the three methods were not statistically significant (p=0.10-0.64). Further accuracy figures are provided in Table 3. The 5-point confidence score resulted in slightly lower sensitivity than the other two methods (49% versus 57-59%); the other metrics were similar for the three scoring methods. There was a tendency towards higher performance for the MRI-experts versus the non-expert readers, though these differences did not reach statistical significance (p=0.15-0.99), except for the PPV of the 5-point confidence score, where the MRI-experts scored significantly higher than the non-experts (p=0.03). The time interval between CRT and surgery/W&W had a significant confounding effect (with a tendency towards higher performance with longer intervals).

Interobserver agreement and reader preference

Table 4 shows the interobserver agreement for the three scoring methods, including specified results for the expert and non-expert readers; Table 5 shows the reader feedback (i.e. perceived difficulty per case and overall preferred scoring methods). Group IOA (Krippendorff’s alpha) for all readers combined was similar for the 5-point confidence score (α=0.55) and the 4-point risk score (α=0.57), and lower for the 2-point risk score (α=0.46). Agreement was higher for the MRI-experts than for the non-expert readers, especially for the 5-point confidence score (α=0.64 versus 0.53) and for the 4-point risk score (α=0.65 versus 0.55).
When looking at the individual variables included in the 4-point risk score, IOA for the assessment of EMVI and MRF involvement was higher than for the assessment of high-risk T-stage and nodal involvement. Most readers found the simplified 4-point and 2-point risk scores easier to apply than the 5-point confidence score; most readers (55%) selected the 4-point risk score as their preferred method of response prediction.

Table 4 Interobserver agreement (Krippendorff’s alpha)

                                      All readers   Expert readers   Non-expert readers
                                      (n=22)        (n=5)            (n=17)
5-point confidence score              0.55          0.64             0.53
4-point risk score (Total)            0.57          0.65             0.55
  MRF invasion                        0.47          0.60             0.45
  High risk (bulky, T3cd-4) T-stage   0.39          0.39             0.39
  Nodal involvement                   0.37          0.43             0.34
  EMVI                                0.46          0.54             0.44
2-point risk score                    0.46          0.44             0.47
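
For readers unfamiliar with the agreement statistic reported in Table 4, the following is a minimal sketch of how Krippendorff’s alpha can be computed from a reader-by-case rating matrix. It is illustrative only: the function names (`krippendorff_alpha`, `nominal`, `interval`) are hypothetical, the ratings are a small textbook-style toy data set and not study data, and the choice of difference metric (nominal or interval here) is an assumption, since the metric and software used in the study are not specified in this section.

```python
from collections import defaultdict
from itertools import permutations


def nominal(a, b):
    """Nominal difference: 0 if two ratings are equal, 1 otherwise."""
    return 0.0 if a == b else 1.0


def interval(a, b):
    """Interval difference: squared distance between two numeric ratings."""
    return float((a - b) ** 2)


def krippendorff_alpha(units, delta=nominal):
    """Krippendorff's alpha for a list of units (cases).

    Each unit is a list of ratings, one per reader; None marks a missing
    rating. Units with fewer than two ratings carry no agreement
    information and are skipped.
    """
    # Coincidence matrix: every ordered pair of ratings from different
    # readers within a unit contributes 1 / (m - 1), where m is the
    # number of ratings available for that unit.
    coincidence = defaultdict(float)
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue
        for a, b in permutations(values, 2):
            coincidence[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per rating value and the number of pairable ratings.
    totals = defaultdict(float)
    for (a, _b), weight in coincidence.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("not enough pairable ratings")

    observed = sum(w * delta(a, b) for (a, b), w in coincidence.items()) / n
    expected = sum(
        totals[a] * totals[b] * delta(a, b) for a in totals for b in totals
    ) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when all ratings are identical")
    return 1.0 - observed / expected


# Toy data (NOT study data): 3 readers rating 15 cases, None = not rated.
reader_a = [None, None, None, None, None, 3, 4, 1, 2, 1, 1, 3, 3, None, 3]
reader_b = [1, None, 2, 1, 3, 3, 4, 3, None, None, None, None, None, None, None]
reader_c = [None, None, 2, 1, 3, 4, 4, None, 2, 1, 1, 3, 3, None, 4]
example = [list(case) for case in zip(reader_a, reader_b, reader_c)]

alpha_nominal = krippendorff_alpha(example, delta=nominal)
alpha_interval = krippendorff_alpha(example, delta=interval)
print(round(alpha_nominal, 3), round(alpha_interval, 3))  # 0.691 0.811
```

Alpha equals 1 for perfect agreement and 0 when agreement is no better than expected by chance. Against that scale, the values of 0.37-0.65 in Table 4 sit between chance-level and perfect agreement.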