98 Chapter 6

Table 4 Interobserver agreement and reader preference

                                mrTRG             modTRG            DWI pattern       Split scar
IOA (κ; median with ranges in parentheses)
  All readers (n=22)            0.41 (0.15-0.66)  0.42 (0.09-0.68)  0.48 (0.1-0.77)   0.17 (-0.07-0.6)
  Expert readers (n=5)          0.55 (0.45-0.66)  0.54 (0.42-0.64)  0.60 (0.54-0.77)  0.18 (0.02-0.33)
  Non-expert readers (n=17)     0.41 (0.15-0.63)  0.40 (0.09-0.68)  0.47 (0.1-0.71)   0.17 (-0.07-0.6)
Difficulty to apply response method (%)
  Easy                          42%               49%               55%               43%
  Moderate                      45%               42%               36%               37%
  Difficult                     13%               9%                9%                20%
Preferred response method (%)   18%               68%               73%               5%

Discussion

This study aimed to validate and compare four previously published methods for rectal tumor response evaluation on MRI after chemoradiotherapy in terms of diagnostic performance to identify complete responders, inter-reader reproducibility, and reader preference. Overall, the most favorable results were found for the response methods incorporating DWI, given their good specificity of approximately 80%, the highest overall interobserver agreement, and the fact that the majority of readers preferred the DWI-based methods over the methods based solely on T2W-MRI. Diagnostic performance and interobserver agreement were lower for less expert readers and when MRI image quality was below current clinical standards. These findings emphasize the need for good-quality imaging using state-of-the-art MRI protocols, and the importance of dedicated radiologist training to evaluate restaging MRIs.

The two preferred methods incorporating DWI (the modified mrTRG score and the DWI patterns score) showed higher specificity than the two methods based solely on T2W-MRI (mrTRG and split scar). This implies that DWI-MRI performs better at detecting residual tumor within the fibrotic tumor bed, which is known to be one of the key strengths of DWI in the restaging setting and an important issue when aiming to safely select patients for W&W [16].
Specificity was particularly high (up to 90%) for the expert readers, with results comparable to the initial study publications [10, 13]. Sensitivity for both DWI-based scoring methods (approximately 40%) was, however, lower than in the initial reports. This indicates a risk that complete responders are wrongly classified as having residual tumor due to the presence of non-tumor (“false positive”) high signal on DWI, which is a