
Figure 4 | Classification between patients with CD and healthy controls, patients with UC and healthy controls, and patients with CD and UC based on antibody epitope repertoires. Antibody epitope repertoires show superior discrimination between patients with CD and HC (A, B) in comparison to patients with UC vs. HC (C, D) or patients with CD vs. UC (E, F). (A, C, E) ROC curves demonstrating the discriminative capacity of antibody epitope repertoires for the respective comparison in the test set (20% of the data). (B, D, F) Confusion matrices showing the predicted class numbers (from left to right and top to bottom: true negatives [TN], false positives [FP], false negatives [FN] and true positives [TP]) and proportions in the test set while adopting a probability threshold of 0.5. Abbreviations: AUC, area under the curve; CD, Crohn’s disease; HC, (age- and sex-matched) healthy controls; UC, ulcerative colitis.

Patients with CD can be accurately identified based on only ten antibody-bound peptides

As a next step, we aimed to identify the top contributing antibody-bound peptides for the three classification tasks (CD vs. healthy controls, UC vs. healthy controls and CD vs. UC) and to evaluate the extent to which a small selection of antibody-bound peptides could discriminate between groups (i.e. without requiring a large number of antibodies). To do so, we applied recursive feature elimination (RFE), a feature selection procedure that repeatedly fits a model and removes the weakest features (here: individual antibody-bound peptides) until the pre-specified number of antibody-bound peptides is reached (Tables S11–12). When selecting the top five antibody-bound peptides, an accurate discrimination between patients with CD and HCs could already be achieved (AUC = 0.81, F1-score = 0.74) (Figure 5A–B).
When selecting the top ten antibody-bound peptides, this discrimination improved considerably further (AUC = 0.87, F1-score = 0.77), and the improvement was statistically significant (DeLong’s test, Z-statistic: -3.25; FDR = 0.01) (Table S13). Notably, this classification model achieved similar discriminative performance to the model that included all contributing antibody-bound peptides (Figure 4A): the two models did not differ significantly in discriminative accuracy (DeLong’s test, Z-statistic: 2.36; FDR = 1.00) (Table S13). The top-ranked antibody-bound peptides were the P30 adhesin protein of Mycoplasma pneumoniae (less frequent in CD), the human collagen type IV alpha chain protein, tegument protein of HSV-2, a translocator protein of Pseudomonadaceae, and flagellins from Clostridiales, Legionellaceae, Eubacterium, Roseburia and Borrelia (all more frequent in CD).

When discriminating between patients with UC and healthy controls, selecting the top five and top ten antibody-bound peptides already resulted in a reasonably accurate discrimination that approached that observed in the test set using the full set of antibody-bound peptides (top five: AUC = 0.75, F1-score = 0.66; top ten: AUC = 0.76, F1-score = 0.67) (Figure 5C–D). The top contributing antibody-bound peptides to this classification were the P30 adhesin protein of Mycoplasma pneumoniae and tegument protein of HSV-2 (both overlapping with the previous model and less frequent in both UC and CD); peptides belonging to surface proteins or zinc metalloprotease of Streptococcus pneumoniae and the EspB protein of E. coli O127:H6 (all less frequent in UC); and a WxL domain-containing protein of Enterococcus faecalis, a DUF4988 domain-containing protein of Bacteroides and two allergen peptides belonging to Danio rerio (zebrafish) and Bos (wild or domestic cattle) (all more frequent in UC).
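
The RFE procedure described above can be illustrated with a minimal sketch. The text does not state which model or ranking criterion was used, so everything here is an assumption made for illustration: a plain logistic regression fitted by gradient descent, features ranked by absolute coefficient, and a small synthetic dataset of binary "peptide bound / not bound" features in which feature 0 is enriched in cases and feature 3 is depleted in cases. The names `fit_logistic`, `rfe` and `draw` are hypothetical.

```python
import math
import random


def fit_logistic(X, y, epochs=200, lr=0.5):
    """Fit an unregularised logistic regression by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label  # predicted prob. minus label
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rfe(X, y, n_keep):
    """Refit on the remaining features and drop the weakest one until n_keep remain."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        sub = [[row[j] for j in kept] for row in X]
        w, _ = fit_logistic(sub, y)
        # Assumption: "weakest" means smallest absolute coefficient.
        weakest = min(range(len(kept)), key=lambda i: abs(w[i]))
        del kept[weakest]
    return kept


def draw(prob):
    """Return 1 with probability `prob`, else 0."""
    return 1 if random.random() < prob else 0


# Synthetic data (not from the study): 300 samples, 8 binary features.
random.seed(0)
y = [i % 2 for i in range(300)]  # 1 = case, 0 = control
X = []
for label in y:
    row = [draw(0.5) for _ in range(8)]      # uninformative features
    row[0] = draw(0.85 if label else 0.15)   # more frequent in cases
    row[3] = draw(0.15 if label else 0.85)   # less frequent in cases
    X.append(row)

selected = rfe(X, y, n_keep=2)
print("Selected feature indices:", selected)
```

In this toy setting the procedure retains the two informative features regardless of the direction of their association, which mirrors how the selected peptides in the text include both those more frequent and those less frequent in patients.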
Finally, we extracted the top five and top ten antibody-bound peptides contributing to the classification of patients with CD from patients
