Thesis

Chapter 6

(N.E.K.). The iScore platform incorporates the Open Health Imaging Foundation (OHIF) DICOM viewing platform [19]. An overview of the scoring setup in iScore, including the full eCRFs, is provided in Supplement 1.

The study readers were asked to review the restaging MRIs (T2W, DWI, and ADC map) of the 90 study cases by comparing them to the primary staging MRIs, and to assess the response to chemoradiotherapy using four previously published response methods: mrTRG [6, 8], modified mrTRG [10, 11], DWI patterns score [13], and the split scar sign [14]. Details of these four scoring methods and how they were dichotomized are provided in Table 1. Readers were asked to indicate for each case whether they found the respective scoring methods easy, moderately easy/difficult, or difficult to apply, and to give an overall indication of which scoring method(s) they would prefer to apply in their own daily clinical practice. Readers were blinded to each other’s scorings and to the final response outcomes.

Standard of reference

The main study outcome was the differentiation between a complete response and residual tumor, using the pathologic tumor regression grade (pTRG) by Mandard [5] or clinical follow-up during organ preservation as the standard of reference. A complete response was defined as ypT0/pTRG1 after surgery, or a sustained clinical complete response during W&W for at least 2 years. Residual tumor was defined as ypT1-4/pTRG2-5 after surgery.

Statistical analyses

Statistical analyses were performed by one of the authors, a dedicated statistician (R.T.), using R statistics version 4.1.0 (2021) and IBM SPSS version 27 (2020).
A linear mixed-model regression was used to assess the impact of reader experience (MRI expert versus abdominal/general radiologist) and MR image quality (good versus below average) on the average sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy of each method in predicting a complete response (= positive outcome) versus residual tumor. A patient-level random intercept was included to account for the repeated measurements of each patient. Results were additionally compared using receiver operating characteristic (ROC) curves. A significance threshold of 0.05 was used throughout the analyses. Interobserver agreement (IOA) between individual readers was calculated using kappa analysis (κ) [20] with quadratic weighting; group agreement was calculated using Krippendorff’s alpha [21, 22].
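As an illustration of two of the quantities named above, the following minimal Python sketch computes the per-reader diagnostic metrics (with a complete response as the positive outcome) and a quadratically weighted kappa between two readers. It is not the code used in this study, which was run in R and SPSS; the function names, the integer coding of the score categories, and the example data are assumptions made purely for illustration. The mixed-model regression, ROC analysis, and Krippendorff’s alpha are not covered.

```python
from collections import Counter


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(truth, pred):
    """Diagnostic metrics for dichotomized scores.

    truth, pred: equal-length sequences, truthy = complete response
    (positive outcome), falsy = residual tumor.
    """
    pairs = [(bool(t), bool(p)) for t, p in zip(truth, pred)]
    tp = sum(t and p for t, p in pairs)
    tn = sum((not t) and (not p) for t, p in pairs)
    fp = sum((not t) and p for t, p in pairs)
    fn = sum(t and (not p) for t, p in pairs)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, len(pairs)),
    }


def quadratic_weighted_kappa(scores_a, scores_b, n_categories):
    """Cohen's kappa with quadratic disagreement weights.

    scores_a, scores_b: ordinal scores of two readers on the same cases,
    coded as integers 0 .. n_categories - 1 (an assumed coding).
    """
    n = len(scores_a)
    observed = Counter(zip(scores_a, scores_b))
    marginal_a = Counter(scores_a)
    marginal_b = Counter(scores_b)
    max_sq_dist = (n_categories - 1) ** 2

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2 / max_sq_dist
            observed_disagreement += weight * observed[(i, j)] / n
            expected_disagreement += weight * marginal_a[i] * marginal_b[j] / n**2

    # Undefined (NaN) when both readers use one identical category throughout.
    return 1.0 - _ratio(observed_disagreement, expected_disagreement)


if __name__ == "__main__":
    # Made-up example data, not study results.
    reference = [True, True, False, False, False]
    reader = [True, False, False, False, True]
    print(diagnostic_metrics(reference, reader))
    print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 2, 2, 3, 3], 5))
```

Quadratic weights penalize a disagreement by the squared distance between the two ordinal categories, so two readers who differ by one category are treated as much closer to agreement than readers who differ by several.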
