scar sign was considerably lower with a median kappa of 0.17, which is also much lower than the κ0.69 reported in the initial paper by Santiago et al. This is likely again related to difficulties in applying this method in a heterogeneous dataset, but perhaps also to the fact that, of all methods, readers may be least familiar with the split scar sign. Compared to previous publications, IOA for the other three methods was similar or somewhat lower. For example, Siddiqi et al. reported a median IOA of κ0.57 for 35 radiologists applying the mrTRG in a small group of 12 patient cases [6], compared to a median κ0.41 in our current report with a considerably larger number of patient cases. Previously reported IOAs for the modified TRG and DWI pattern scores ranged between κ0.58 and 0.75 [10, 13]. Results for the more experienced readers in our current study were in the same range, with kappa values varying between 0.42 and 0.77.

Since the MRIs in our dataset date back as far as 2010, several scans did not meet current state-of-the-art recommendations for image acquisition. These “below-average” quality scans had a negative impact on our study results, but also offered us valuable insights into the importance of standardized scan quality.

There are some other limitations to our study design. First, selection bias may have occurred as scans were semi-randomly selected from a larger dataset, as detailed in the methods section. For the sake of feasibility, the number of cases was kept below 100, which is low compared to the number of study readers. Second, the four methods addressed in this study focus specifically on luminal response assessment. From a clinical perspective, MRI mainly has a supporting role (in addition to endoscopy) for luminal response assessment when selecting patients for and monitoring them during organ preservation [11, 26]. Though we acknowledge that one of the main strengths of MRI is the assessment of extraluminal disease (e.g.
lymph nodes), assessing its value in this setting was outside the scope of our study, as was the assessment of MRI for follow-up during organ preservation. Third, the comparison of the four scoring methods may be somewhat biased in the sense that some (DWI patterns, split scar) are designed specifically for the differentiation between a complete response and residual tumor, while others are intended to grade the overall response and were dichotomized for the purpose of this study. Moreover, the number of response categories differs between the methods. The degree to which readers were already accustomed to using the respective methods prior to the study will likely also have varied, though this also reflects variations between countries and centers in daily reporting practice. Fourth, the readers had access to all available images while performing their scoring. Though readers were instructed to review only the T2W images when evaluating the mrTRG and split scar, we cannot rule out that readers were biased by the findings of DWI. Finally, all MRI exams included in this study originate from