
Lisandro Arturo Jimenez Roa
Reliability and Maintenance for Engineering Systems: Fault Trees, Degradation Modelling and Maintenance Optimisation

Reliability and Maintenance for Engineering Systems: Fault Trees, Degradation Modelling and Maintenance Optimisation

DISSERTATION

to obtain the degree of doctor at the University of Twente, on the authority of the rector magnificus, prof. dr. ir. A. Veldkamp, on account of the decision of the Doctorate Board, to be publicly defended on Friday 7 February 2025 at 14.45 hours

by

Lisandro Arturo Jimenez Roa

born on the 22nd of September, 1991 in Cali, Colombia

This dissertation has been approved by:
Promotors: prof. dr. M.I.A. Stoelinga, prof. dr. ir. T. Tinga
Co-promotor: prof. dr. T.M. Heskes

Cover design: Lisandro Arturo Jimenez Roa
Printed by: Ipskamp Printing
Lay-out: Typeset in LaTeX, based on the Overleaf template by Steve Gunn, Sunil Patel, and Joshua Botha; modifications by Lisandro Arturo Jimenez Roa
ISBN (print): 978-90-365-6406-9
ISBN (digital): 978-90-365-6407-6
URL: https://doi.org/10.3990/1.9789036564076

©2024 Lisandro Arturo Jimenez Roa, The Netherlands. All rights reserved. No parts of this thesis may be reproduced, stored in a retrieval system or transmitted in any form or by any means without permission of the author.

Graduation Committee:

Chair / Secretary:
prof. dr. ir. B.R.H.M. Haverkort

Promotors:
prof. dr. M.I.A. Stoelinga, University of Twente, EEMCS, Formal Methods and Tools
prof. dr. ir. T. Tinga, University of Twente, ET, Dynamics based Maintenance

Co-promotor:
prof. dr. T.M. Heskes, Radboud Universiteit Nijmegen

Committee Members:
prof. Dr. Ing. B. Rosic, University of Twente, ET, Applied Mechanics & Data Analysis
prof. dr. ir. M. van Keulen, University of Twente, EEMCS, Datamanagement & Biometrics
prof. dr. ir. G.J. van Houtum, Eindhoven University of Technology
prof. dr. ir. E. Zio, Mines Paris – PSL University, Politecnico di Milano
prof. dr. ir. P. Dersin, Luleå University of Technology

“I do not think there is any thrill that can go through the human heart like that felt by the inventor [who] sees a creation of the brain unfolding to success . . . ” Nikola Tesla

UNIVERSITY OF TWENTE

Abstract

Faculty of Electrical Engineering, Mathematics and Computer Science
Formal Methods and Tools
Doctor of Philosophy

Reliability and Maintenance for Engineering Systems: Fault Trees, Degradation Modelling and Maintenance Optimisation

by Lisandro Arturo Jimenez Roa

Modern infrastructures, machines, and manufacturing processes require effective management through sustainable policies under constrained resources, where determining when and how to intervene becomes crucial. The Prognostics and Health Management (PHM) paradigm provides a systematic framework for leveraging data collection and computational models, supporting the management of virtually any engineering component or system. This dissertation delves into three key aspects of PHM: Reliability Modelling, Markov Process-based Prognostics, and Maintenance Optimisation. Data-driven techniques are crucial in these areas, enhancing the automation of model development and deployment.

Part I centres on Reliability Modelling, specifically the automatic inference of Fault Tree (FT) models. Traditionally, graph-based models like FTs are manually constructed through iterative collaboration between system experts and FT modellers. However, this manual approach is prone to human error and may result in incomplete models. With increasing data availability, methodologies that attempt to automate this process, discover patterns, and reduce dependency on manual intervention have gained significant relevance. Thus, in Part I of this dissertation, we focus on how to obtain efficient and compact Fault Tree models from failure datasets in a robust and scalable manner. To this end, we propose, for the first time, using Multi-Objective Evolutionary Algorithms (MOEAs) to automatically infer FT models, framing the optimisation as a multi-objective task. This resulted in the FT-MOEA algorithm (Chapter 2), which focuses on three optimisation metrics, including FT size and accuracy-related metrics. FT-MOEA consistently produced compact FT structures, but faced scalability issues. To address this, we developed the SymLearn toolchain (Chapter 3), which uses a ‘divide-and-conquer’ approach by identifying modules and symmetries in the failure dataset, breaking the inference problem into smaller tasks. Additionally, to improve robustness and scalability, the FT-MOEA-CM extension (Chapter 4) includes additional metrics from the confusion matrix. Our approaches in Part I contribute to automating the construction of FT models, revealing compact structures that can help uncover relationships between basic and intermediate events and provide valuable patterns for asset managers to improve reliability modelling.

Part II focuses on Markov Process-based Prognostics, specifically on modelling the stochastic deterioration of sewer mains. Sewer systems are critical to social welfare but pose significant challenges due to their extensive scale and the limited capacity to monitor the entire network. Accurate modelling of the deterioration profile is crucial for optimising inspections and maintenance, thereby enhancing the reliability and availability of the system. Different types of deterioration models are discussed in the literature, ranging from physics-based to data-driven approaches, each with distinct advantages and limitations. In Part II of this dissertation, we address how and to what extent it is possible to accurately model Multi-State Deterioration with applications in sewer mains. For this, we focus on Markov chains, which are widely used to model stochastic sequences through states and transitions. Since the 1990s, they have been applied to represent damage severity levels in sewer mains using inspection data from Closed Circuit Television cameras. Nonetheless, further evaluation of their assumptions and properties is required. We present a case study of a Dutch sewer network (Chapter 5), starting with Discrete-Time Markov Chains for deterioration modelling and examining two Markov chain structures. Challenges in the data, such as interval-censoring, made a more advanced analysis necessary. For this reason, in Chapter 6 we implement the Turnbull estimator for non-parametric analysis to establish a ground truth.
Although both homogeneous and inhomogeneous-time Markov chains are employed for sewer mains deterioration, no prior studies have compared their performance on the same dataset. Chapter 6 addresses this by demonstrating that inhomogeneous-time Markov chains are more versatile at capturing non-linear stochastic behaviour, while also highlighting issues like overfitting that reduce predictive accuracy. Part II provides a real-world case study, emphasising the need to critically evaluate modelling assumptions to enhance deterioration modelling of sewer mains using Markov chains.

Finally, Part III focuses on Maintenance Optimisation of sewer mains, where obtaining optimal maintenance policies for such components is a complex task. This complexity arises from, among other factors, the system’s scale, the availability of adequate data, and simplifications in the deterioration model. Among the techniques available, Reinforcement Learning (RL) remains largely unexplored for devising strategic maintenance actions in sewer mains. Thus, in Part III of this dissertation, we focus on how to devise near-optimal maintenance strategies for components with Multi-State Deterioration, such as sewer mains, using Deep Reinforcement Learning.

In Chapter 7, we frame the sequential decision-making problem using Deep Reinforcement Learning (DRL) for component-level maintenance of sewer mains. This framework considers damage severity levels, testing different deterioration model assumptions and evaluating their impact on the maintenance policy. Our results show that agent-based policies outperformed heuristics by learning optimal sequences of maintenance actions. Part III provides evidence that DRL-based techniques offer a flexible framework with the potential to improve on heuristics and support maintenance decision-making for sewer mains. However, training these models to achieve the desired behaviour remains a challenging task.

UNIVERSITY OF TWENTE

Samenvatting (Summary)

Faculty of Electrical Engineering, Mathematics and Computer Science, Department of Computer Science
Formal Methods and Tools
Doctor of Philosophy

Reliability and Maintenance of Engineering Systems: Fault Trees, Degradation Modelling and Maintenance Optimisation

by Lisandro Arturo Jimenez Roa

Modern infrastructures, machines, and production processes require effective management through sustainable policies under limited resources, where determining when and how to intervene becomes crucial. The Prognostics and Health Management (PHM) paradigm offers a systematic framework for making use of data collection and computational models, in support of the management of virtually any technical component or system. This dissertation delves into three key aspects of PHM: Reliability Modelling, Markov Process-based Prognostics, and Maintenance Optimisation. Data-driven techniques are crucial in these areas and improve the automation of model development and deployment.

Part I focuses on Reliability Modelling, specifically the automatic inference of Fault Tree (FT) models. Traditionally, graph-based models such as FTs are constructed manually through iterative collaboration between system experts and FT modellers. However, this manual approach is prone to human error and can result in incomplete models. With the increasing availability of data, methods that automate this process, discover patterns, and reduce dependence on manual intervention have gained considerable relevance. In Part I of this dissertation, we focus on how to obtain efficient and compact Fault Tree models from failure datasets in a robust and scalable manner. To this end, we propose, for the first time, using Multi-Objective Evolutionary Algorithms (MOEAs) to automatically infer FT models and treating the optimisation as a multi-objective task. This resulted in the FT-MOEA algorithm (Chapter 2), which focuses on three optimisation metrics, including FT size and accuracy-related metrics. FT-MOEA consistently produced compact FT structures but suffered from scalability problems. To address this, we developed the SymLearn toolchain (Chapter 3), which uses a ‘divide-and-conquer’ approach by identifying modules and symmetries in the failure dataset and splitting the inference problem into smaller tasks. In addition, to improve robustness and scalability, the FT-MOEA-CM extension (Chapter 4) includes additional metrics from the confusion matrix. Our approaches in Part I of this dissertation contribute to automating the construction of FT models by revealing compact structures. These consistent structures can help reveal relationships between basic and intermediate events, and offer valuable insights for asset managers to improve reliability modelling.

Part II focuses on Markov Process-based Prognostics, specifically the stochastic deterioration modelling of sewer mains. Sewers are essential to social welfare but pose significant challenges because of their extensive scale, slow deterioration, and the limited capacity to monitor the entire network. Accurately modelling the deterioration profile is crucial for optimising inspections and maintenance, thereby improving the reliability and availability of the networks. Various deterioration models are discussed in the literature, ranging from physics-based to data-driven approaches, each with its own advantages and disadvantages. In Part II of this dissertation, we address how and to what extent it is possible to accurately model Multi-State Deterioration with applications in sewer mains. For this purpose, we focus on Markov chains, which are widely used to model stochastic sequences by means of states and transitions. Since the 1990s, they have been applied to represent damage levels in sewer mains using inspection data from Closed Circuit Television cameras. Nevertheless, further evaluation of their assumptions and properties is required. We present a case study of a Dutch sewer network (Chapter 5), starting with Discrete-Time Markov Chains for deterioration modelling and examining two Markov chain structures. Given challenges such as interval-censored data, a more advanced analysis was needed. In Chapter 6, we use the Turnbull estimator for non-parametric analysis to establish a ground truth. Although both homogeneous and inhomogeneous-time Markov chains are used for the deterioration modelling of sewer mains, no earlier studies have compared their performance on the same dataset. Chapter 6 addresses this by showing that inhomogeneous-time Markov chains are more versatile at capturing non-linear stochastic behaviour, while also highlighting problems such as overfitting that reduce predictive accuracy. Part II offers a real-world case study and emphasises the importance of critically evaluating modelling assumptions for improved deterioration modelling of sewer mains with Markov chains.

Finally, Part III focuses on Maintenance Optimisation of sewer mains, where obtaining optimal maintenance policies for such large-scale systems is a complex task. This complexity arises, among other things, from the scale of the system and simplifications in the deterioration model. Among the various techniques available, Reinforcement Learning (RL) approaches remain largely unexplored for devising strategic maintenance actions in sewer mains. Therefore, in Part III of this dissertation, we focus on how to develop near-optimal maintenance strategies for components with Multi-State Deterioration, such as sewer mains, using Deep Reinforcement Learning. In Chapter 7, we frame the sequential decision-making problem using Deep Reinforcement Learning (DRL) for component-level maintenance of sewer mains. This framework considers damage levels, tests different assumptions of the deterioration model, and evaluates their impact on the maintenance policy. Our results show that agent-based policies outperformed heuristics by learning optimal sequences of maintenance actions. Part III provides evidence that DRL-based techniques offer a flexible framework with the potential to improve on heuristics and to support maintenance decision-making for sewer mains. However, training these models to achieve the desired behaviour remains a challenging task.

Acknowledgements

This doctorate is the result of a long journey during which I had the opportunity to meet incredible people and visit amazing places. Here, I would like to express my gratitude to those who have been part of this beautiful journey, starting with my close-knit family.

I want to give very special thanks to my mother, Luz Hatty Roa (mi luz), who has encouraged me since I was little and given me her unconditional support. To my father, Lisardo A. Jimenez, who has taught me tenacity and strength, and that in life one does not give up, no matter what; also, special thanks to my siblings, Yennifer Jimenez Roa and Roger A. Jimenez Roa; to my little nephew, Nicola Pasqualin Jimenez; and to my sister-in-law and brother-in-law, Lorena M. Montoya and Edwin Plaza.

I also want to express my gratitude to my extended family, in particular to my partner Niels van Huizen for his unconditional support and for sharing this wonderful phase of our lives together; and to Patrick, Helma, and Zanne van Huizen for all the beautiful moments.

To my alma mater, Universidad del Valle in my home city of Cali, Colombia, of which I am immensely proud and to which I hope to one day give back as much as it has given me. I am particularly thankful to Professors Ricardo Ramírez, Patricia Guerrero, and Doris Hinestroza, who played pivotal roles in my career as a Civil Engineer. I also want to thank João S. Almeida, Angelica Rosso, and Katrin Beyer for the enriching experience at École Polytechnique Fédérale de Lausanne in Lausanne, Switzerland.

My gratitude also extends to my friends Laura Villegas (mi flaquita) and Raffael Tschui (meme) for their friendship despite the long distance. To the “Pelicanoes” Andrea Vargas, Leopoldo Ríos, Lara Wöhler, Norwing Thunhorst, Daria Nemashkalo, Oleksandr Mialyk, Kees Lieverse, Daniel Foley, and Sara Dionisio Antonio for the amazing and fun times we shared, and the ones to come.
To the University of Twente, my other alma mater, for opening its doors and supporting my career. I particularly want to thank the Construction Management and Engineering group for hosting my Engineering Doctorate and allowing me to meet remarkable people, especially Monik Peña Acosta (Colombiana), Tom Coenen, Angie Ruiz, Ramon ter Huurne, Tim van Ee, Ruth Sloot, Roland Kromanis, and Franziska Baack.

I want to thank the Formal Methods and Tools group for making me feel at home during my PhD project and my adventure in Computer Science; special thanks to Yeray Barrios, Stefano Nicoletti, Reza Soltani, Milan Lopuhaä, Edoardo Putti, (not mathematician) Yanni Dong, Lukas Armborst, Arnd Hartmanns, Arend Rensink, Moritz Hahn, Djurre van der Wal, Tannaz Zameni, Marcus Gerhold, Nhat bui, Marijn Peppelman, Bram Kohlen, Bob Rubbens, Rom Langerak, Petra Bos, Vadim Zaytsev, Georgiana Caltais, Peter Lammich, Faizan Ahmed, Tom van Dijk, Marion Steenbergen-Boeringa and Marieke Huisman.

I want to thank the Laboratory of Analysis of Systems for the Assessment of Reliability (LASAR) for hosting my research visit at the Politecnico di Milano in Milan, Italy, where, despite the short time I spent there, I got to meet amazing people. Particular thanks to Stefano Marchetti, Valentina Clavijo, Giovanni Floreale, Giovanni Roma, Thomas Coscia, Nicolas Cárdenas, Giselle Tachoires, Sara Castellani, Federico Bassi, Piero Baraldi, and Enrico Zio.

To PrimaVera for allowing me to be part of such an ambitious and challenging project, which allowed me to learn and connect with like-minded, passionate researchers. Special thanks to Matthias Völk, Thiago D. Simão, Luc Keizers, Nubia Alves de Silveira, Natália Marinho, Thom Badings, Ragnar Eggertsson, Hajo Molegraaf, Zaharah Bukhsh, Nils Jansen, Johan van den Bogaard, and Juseong Lee.

Finally, special thanks to my PhD supervisors, Mariëlle Stoelinga, Tiedo Tinga, and Tom Heskes, for their support and guidance throughout my four-year research journey. I deeply appreciate their lessons, advice, commitment, and patience.

Lisandro Arturo Jimenez Roa
Enschede, 19 December, 2024

Contents

Abstract
Samenvatting
Acknowledgements

1 Introduction
  1.1 Research context and motivation
  1.2 Main concepts
    1.2.1 Prognostics and Health Management
    1.2.2 Reliability Modelling
    1.2.3 Markov Process-based Prognostics
    1.2.4 Maintenance Optimisation
  1.3 Research gaps
  1.4 Research questions
  1.5 Research methodology
  1.6 Thesis outline
  1.7 Main contributions
  1.8 List of publications
  1.9 References

Part I: Data-driven Inference of Fault Tree models
  I.1 Introduction
  I.2 Nomenclature
  I.3 Related work
  I.4 Preliminaries
    I.4.1 Fault Trees
    I.4.2 Failure dataset
    I.4.3 Inference of Fault Tree models
  I.5 References

2 Automatic Inference of Fault Tree Models via Multi-Objective Evolutionary Algorithms
  Abstract
  2.1 Introduction
  2.2 Fault Tree Analysis
  2.3 Multi-Objective Evolutionary Algorithms
    2.3.1 Elitist Non-dominated Sorting Genetic Algorithms
    2.3.2 Crowding-Distance
  2.4 Methodology
  2.5 The failure dataset
  2.6 Inferring Fault Trees via Multi-Objective Evolutionary Algorithms (FT-MOEA)
    2.6.1 Step 1 - Initialisation
    2.6.2 Step 1.2 - Extraction of MCSs from the failure dataset (optional step)
    2.6.3 Step 2 - Parent Fault Trees
    2.6.4 Step 3 - Genetic operators
    2.6.5 Step 4 - Multi-objective function
    2.6.6 Step 5 - Convergence criteria
  2.7 Experimental evaluation
    2.7.1 The Monte Carlo method
    2.7.2 Case studies
    2.7.3 Key findings of the FT-MOEA algorithm
    2.7.4 Parametric analysis
  2.8 Discussion and conclusions
  2.9 References

3 Data-Driven Inference of Fault Tree Models Exploiting Symmetry and Modularisation
  Abstract
  3.1 Introduction
  3.2 Modules and symmetries
    3.2.1 Modules
    3.2.2 Symmetries
  3.3 Exploiting modules and symmetries in Fault Tree inference
  3.4 Experimental evaluation
  3.5 Conclusions
  3.6 References

4 Fault Tree inference using Multi-Objective Evolutionary Algorithms and Confusion Matrix-based metrics
  Abstract
  4.1 Introduction
  4.2 Confusion Matrix-based metrics
  4.3 FT-MOEA-CM’s methodology
  4.4 Experimental evaluation
  4.5 Results
    4.5.1 Feature assessment
    4.5.2 Comparing FT-MOEA and FT-MOEA-CM
    4.5.3 FT-MOEA-CM’s Features: parallelisation and caching
  4.6 General discussion
  4.7 Conclusions
  4.8 References

Part II: Multi-state deterioration modelling
  II.1 Introduction
  II.2 Nomenclature
  II.3 Related work
  II.4 Preliminaries
    II.4.1 Markov chains
    II.4.2 Case studies in sewer networks: overview
    II.4.3 Breda’s case study
    II.4.4 Deterioration modelling in sewer mains using Markov Chains
  II.5 References

5 Deterioration Modelling of Sewer Pipes via Discrete-Time Markov Chains: A Large-Scale Case Study in the Netherlands
  Abstract
  5.1 Introduction
  5.2 Discrete-Time Markov Chains
  5.3 Methodology
    5.3.1 Step 1: Data pre-processing
    5.3.2 Step 2: Definition of cohorts
    5.3.3 Step 3: Discretised table
    5.3.4 Step 4: Calibration of the Discrete-Time Markov Chain
  5.4 Experimental evaluation
    5.4.1 Case study
    5.4.2 Results
      Comparing Cohorts CMW and CR
      Comparing Cohorts PMW and PR
      Comparing Cohorts CdL and CdG
      Comparing Chains “Multi” and “Single”
  5.5 Discussion, conclusions and future work
  5.6 References

6 Comparing Homogeneous and Inhomogeneous Time Markov Chains for Modelling Deterioration in Sewer Pipe Networks
  Abstract
  6.1 Introduction
  6.2 Methods and materials
    6.2.1 Multi-state deterioration modelling for sewer networks using parametrised Markov chains
    6.2.2 Model calibration
    6.2.3 Non-parametric modelling
    6.2.4 Goodness-of-fit metrics
  6.3 Experimental setup and evaluation
    6.3.1 Case study
    6.3.2 Experimental setup
    6.3.3 Results
  6.4 Findings
    6.4.1 Comparison between cohorts
    6.4.2 Transition probabilities over time
    6.4.3 Overfitting
    6.4.4 Comparing inhomogeneous Markov chains
  6.5 Conclusions and future research
  6.6 References

Part III: Maintenance optimisation of multi-state components
  III.1 Introduction
  III.2 Nomenclature
  III.3 Related work
  III.4 Preliminaries
    III.4.1 Markov Decision Process
    III.4.2 Deep Reinforcement Learning
    III.4.3 Contextual Markov Decision Process
    III.4.4 Proximal Policy Optimisation

7 Maintenance Strategies for Sewer Pipes with Multi-State Deterioration and Deep Reinforcement Learning
  Abstract
  7.1 Introduction
  7.2 Technical background
  7.3 Methodology
  7.4 Multi-state deterioration models
    7.4.1 Case study
    7.4.2 Parametrisation
    7.4.3 Solving the Multi-State Deterioration Model
    7.4.4 Parametric Multi-State Deterioration Models
  7.5 Definition of Markov Decision Process for Maintenance Policy Optimisation of a sewer main considering deterioration over the pipe length
    7.5.1 State space S
      Health vector (h)
      Stochastic prediction of severity levels
    7.5.2 Action space A
    7.5.3 Transition probability function P
    7.5.4 Reward function R
  7.6 Experimental setup
    7.6.1 Setup
    7.6.2 Comparison of maintenance strategies
  7.7 Results
    7.7.1 Implementation and hyper-parameter tuning
    7.7.2 Policy analysis: overview
    7.7.3 Policy analysis over episode
  7.8 Discussion and conclusions
  7.9 References

Discussion, Conclusions & Recommendations

8 Discussion
  8.1 Reliability Modelling: Data-driven Inference of Fault Tree models
  8.2 Markov Process-based Prognostics: Multi-state deterioration modelling
  8.3 Maintenance Optimisation: Maintenance optimisation of multi-state components
  8.4 Moving towards comprehensive Prognostics and Health Management: Closing Thoughts
  8.5 References

9 Conclusions & Recommendations
  9.1 Conclusions
  9.2 Recommendations
    9.2.1 Automatic Inference of Fault Tree Models
    9.2.2 Multi-State Deterioration Modelling of Sewer Mains
    9.2.3 Strategic Maintenance Planning for Sewer Mains using Reinforcement Learning
  9.3 References

Appendices

A Appendix: Introduction
  A.1 Example of a Multi-State Deterioration model with two states
  A.2 Example of a Multi-State Deterioration model with three states

B Appendixes: FT-MOEA
  B.1 Data-driven methods to infer FTs from data
  B.2 Applying NSGA-II and Crowding-Distance to infer FTs
    B.2.1 Applying NSGA-II to infer FTs
    B.2.2 Crowding-Distance
  B.3 Example of inferred Fault Trees
  B.4 Details of convergence of metrics over generations
  B.5 Varying parent Fault Trees (details)
  B.6 Comparing the performance of m.o.f.’s for the case studies CSD, PT, and SMS’
  B.7 Noise Effects in the inference of Fault Trees with FT-MOEA
  B.8 References

C Appendix: Multi-State Deterioration
  C.1 Relations in reliability analysis

D Parameters of Multi-State Degradation models

List of Abbreviations

AI  Artificial Intelligence
BE  Basic Event
CBM  Condition-Based Maintenance
CCTV  Closed Circuit Television
CM  Confusion Matrix
CMDP  Contextual Markov Decision Process
CTMC  Continuous-Time Markov Chain
DL  Deep Learning
DNN  Deep Neural Network
DRL  Deep Reinforcement Learning
DTMC  Discrete-Time Markov Chain
EA  Evolutionary Algorithm
FT  Fault Tree
FTA  Fault Tree Analysis
HR  Hazard Rate
HTMC  Homogeneous Time Markov Chain
IHTMC  Inhomogeneous Time Markov Chain
MC  Markov Chain
MCS  Minimal Cut Set
MDP  Markov Decision Process
ML  Machine Learning
MO  Maintenance Optimisation
MOEA  Multi-Objective Evolutionary Algorithm
MPO  Maintenance Policy Optimisation
MSD  Multi-State Deterioration
MSDM  Multi-State Deterioration Model
MSM  Multi-State Modelling
NN  Neural Network
PC  Principal Component
PCA  Principal Component Analysis
PDF  Probability Density Function
PdM  Predictive Maintenance
PHM  Prognostics and Health Management
PoF  Physics of Failure
PPO  Proximal Policy Optimisation
RL  Reinforcement Learning
RUL  Remaining Useful Life
SF  Survival Function

Chapter 1 Introduction

1.1 Research context and motivation

Modern societies rely on a variety of engineering systems that help maintain order, ensure safety, provide comfort, and promote growth and well-being. Engineering systems are specifically designed, built, and managed to address challenges by combining different technological parts. They cover many areas such as mechanical, electrical, civil, and computer engineering, and are crucial in building and maintaining our infrastructure, producing goods, managing transportation, and handling information technology. These systems are usually complex and have many interconnected components that depend on each other. Careful management and continuous improvement are key for them to work as intended.

According to fundamental principles of physics, everything naturally progresses towards a state of wear and breakdown through deterioration (McPherson, 2019), encompassing processes or events that affect the functionality of system components or the system as a whole. Keeping engineering systems operational and available under limited resources highlights the importance of approaches such as Prognostics and Health Management (PHM), which is explained in Section 1.2.1. PHM is gaining popularity in various industries because it improves sustainability by considering environmental, social, and economic factors, aiming to use resources efficiently and positively impact the environment. The main goal of PHM is to use models and data to spot unusual behaviours and problems, diagnose issues, and anticipate future performance. This aids in developing optimal strategies for effective system management.

Within the context of the PrimaVera project (https://primavera-project.com/), this work investigates key aspects of PHM through the lenses of ‘Reliability Modelling’, ‘Markov Process-based Prognostics’, and ‘Maintenance Optimisation’.
Before addressing the research gaps detailed in Section 1.3, the next section provides an overview of the relevant literature and the background of the primary methods, models, and algorithms employed in this dissertation.

1.2 Main concepts

This section provides a concise overview of the main concepts and relevant literature for this dissertation, acknowledging that some details are beyond its scope. Prognostics and Health Management (PHM), as described in Section 1.2.1, serves as the overarching concept linking the research directions explored in this dissertation. Figure 1.1 illustrates the main concepts discussed and their relationships.

[Figure 1.1: Main theoretical concepts used in this dissertation. PHM (Section 1.2.1) spans Reliability Modelling (Section 1.2.2: Fault Tree Analysis, Multi-Objective Evolutionary Algorithms), Markov process-based Prognostics (Section 1.2.3: Markov Chains for Multi-State Deterioration, Sewer Deterioration Modelling), and Maintenance Optimisation (Section 1.2.4: Markov Decision Process, Deep Reinforcement Learning).]

Section 1.2.2 introduces Reliability Modelling, focusing on Fault Tree Analysis and the use of Multi-Objective Evolutionary Algorithms to automatically infer the structure of Fault Tree models. Section 1.2.3 introduces Markov Process-based Prognostics, with emphasis on Markov chains for Multi-State Deterioration with applications to Sewer Deterioration Modelling. Finally, Section 1.2.4 introduces Maintenance Optimisation with applications to sewer mains, where we use a Markov Decision Process to cast the optimisation problem and approximate near-optimal maintenance policies via Deep Reinforcement Learning.

1.2.1 Prognostics and Health Management

Prognostics and Health Management (PHM) is a holistic multi-disciplinary computation-based engineering paradigm that employs measurements, models, and software for the efficient and comprehensive management of system health (Sheppard, Kaufman, and Wilmering, 2008; Lee, F. Wu, Zhao, et al., 2014; W. Zhang, D. Yang, and H. Wang, 2019; Fink, Q. Wang, Svensén, et al., 2020; Hu, Miao, Y. Si, et al., 2022; L. Zhang, Lin, Liu, et al., 2019; Zio, 2022).
Note that in the literature, the terms Predictive Maintenance (PdM) and Condition-Based Maintenance (CBM) overlap greatly with PHM. However, the use of PdM and CBM may be ambiguous and, in some cases, interchangeable (Esteban, Zafra, and Ventura, 2022). For consistency, this dissertation adopts PHM as the more general term.

[Figure 1.2: The main stages of the PHM paradigm: (1) System/Component/Process; (2) Data & Information collection; (3) Diagnostics; (4) Prognostics; (5) Maintenance Optimisation & logistics; (6) Decision-making and policy deployment. The figure also marks Reliability Modelling (Section 1.2.2), Markov process-based Prognostics (Section 1.2.3), and Maintenance Optimisation (Section 1.2.4).]

The main stages in PHM are depicted in Figure 1.2. Stage 1 identifies the system, component, or process of interest. Stage 2 involves the design and implementation of infrastructure that enables the collection of data and relevant information, such as monitoring systems. Stage 3, known as diagnostics, addresses the question “what is wrong?” and focuses on the current system condition. Here, models detect, isolate, locate, quantify, and classify anomalies and failure modes. Stage 4, known as prognostics, answers the question “how long until an event or state is reached?” and focuses on the future condition of the system. Here, models attempt to characterise the system’s future performance. Stage 5, referred to as maintenance optimisation & logistics, focuses on the set of algorithms that seek the optimal set of actions to control the system and meet functionality requirements. Lastly, Stage 6 focuses on the decision-making and deployment of the controlling policy. Once Stage 6 is reached, the cycle restarts, possibly with new goals informed by the earlier stages, e.g., evaluating the effectiveness of a maintenance action.

The arrows in Figure 1.2 refer to the logical progression between stages rather than a strict sequence of steps. For instance, a non-nominal behaviour identified in Stage 3 may prompt a maintenance action (Stage 5), such as performing an inspection, thus skipping Stage 4 (Prognostics). Additionally, PHM can serve as a design tool. For instance, the requirements for employing a specific type of prognostic model in Stage 4 help determine part of the data to be collected in Stage 2.

As mentioned earlier, this dissertation focuses on relevant aspects within PHM. Figure 1.2 highlights that Reliability Modelling is mainly associated with Stages 1 and 2, Markov Process-based Prognostics with Stage 4, and Maintenance Optimisation with Stage 5.

1.2.2 Reliability Modelling

Reliability is the “ability to perform as required, without failure, for a given time interval, under given conditions” (IEC 60050:192-01-24), where “failure is the loss of the ability to perform as required” (IEC 60050:192-03-01). Reliability modelling is the process of developing mathematical models that encapsulate reliability functions and dependencies within a system. The former quantify the probability that a system or component will perform without failure over a specified period under defined conditions. The latter detail the interactions between components that influence the overall system reliability (Assaf, 2018). Reliability modelling provides stochastic-based outcomes (e.g., probabilities) that support reliability assessment aimed at extending a system’s lifespan, scheduling maintenance appropriately, and reducing the risk of failures (O’Connor and Kleyner, 2012). The application of reliability modelling spans various industries such as nuclear, aerospace, automotive, electronics, and manufacturing, where reliability is a critical factor (Modarres, Kaminskiy, and Krivtsov, 2016).

[Figure 1.3: Reliability function of a bearing (example). Horizontal axis: bearing age (t); vertical axis: reliability (t). The curve runs from nominal behaviour towards failure.]

Figure 1.3 exemplifies the reliability function of a bearing as a function of its age. Initially, a new bearing has high reliability, representing nominal behaviour. As the bearing ages and wear occurs, reliability decreases due to the effects of deterioration. This means that the older (or more used) the bearing is, the less likely it is to perform nominally. Equivalently, the older the bearing is, the more likely it is to fail.
Several models and methods are used in reliability modelling. Statistical-based reliability models, which can be based on Exponential, Weibull, and Log-normal distributions, utilise historical failure data to estimate the probability of failure. Nelson, 2005 provides a comprehensive guide on how these models are used to understand different failure rates and patterns.
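As an illustration of a statistical reliability model, the following minimal sketch evaluates a two-parameter Weibull reliability function, $R(t) = \exp\left(-(t/\eta)^{\beta}\right)$, for a bearing such as the one in Figure 1.3. The shape and scale values are hypothetical placeholders chosen for illustration; they are not taken from this dissertation or from any data set.

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability of surviving beyond age t: R(t) = exp(-(t / scale) ** shape)."""
    if t < 0:
        raise ValueError("age t must be non-negative")
    if shape <= 0 or scale <= 0:
        raise ValueError("shape and scale must be positive")
    return math.exp(-((t / scale) ** shape))


# Hypothetical bearing parameters (assumptions, for illustration only).
SHAPE = 2.0        # beta > 1 mimics wear-out: failure becomes more likely with age
SCALE = 10_000.0   # eta, the characteristic life in operating hours

for age in (0.0, 5_000.0, 10_000.0, 20_000.0):
    reliability = weibull_reliability(age, SHAPE, SCALE)
    # The probability of failure by age t is the complement of the reliability.
    print(f"age = {age:8.0f} h   R(t) = {reliability:.3f}   F(t) = {1 - reliability:.3f}")
```

With a shape parameter above one, the printed reliability starts at one for a new bearing and decreases monotonically with age, which matches the qualitative behaviour described for Figure 1.3.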

The Physics of Failure (PoF) approach examines the root causes of component failures by assessing how materials, defects, and stresses affect reliability. It identifies and models individual failure mechanisms in components based on environmental and usage stresses (H. Wang, Liserre, Blaabjerg, et al., 2014). For example, Zhu, Huang, W. Peng, et al., 2016 propose a PoF-based framework for fatigue reliability analysis of an aircraft turbine engine disc.

System reliability models, on the other hand, evaluate the reliability of the entire system by considering the interaction and configuration of system components. Graph-based techniques, such as Reliability Block Diagrams (Signoret and Leroy, 2021), and systematic approaches, such as Failure Modes and Effects Analysis (Stamatis, 2003), are employed to identify potential failures and their consequences within a system. One of the most prominent system reliability methods in the literature is Fault Tree Analysis, which we discuss further in the next section.

Fault Tree Analysis

Fault Tree Analysis (FTA) (Ruijters and Stoelinga, 2015) is a key technique in reliability engineering and risk analysis, used since the 1960s across various sectors such as automotive, aerospace, and nuclear industries (Kabir, 2017). FTA helps in modelling complex systems by illustrating logical relationships, which are crucial for understanding potential system failures, tracing root causes, pinpointing critical components, and computing probabilities of failure at both system and sub-system levels. Fault Trees (FTs) are graphical models composed of logic gates and basic events. See Section I.4.1 for formal definitions.

As an example, Figure 1.4(a) illustrates a bike system. Figure 1.4(b) presents the bike system components, such as wheels, handle, chain, disc brake, and cassette. Figure 1.4(c) models the bike’s inability to ride safely using FTs.
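A fault tree such as the one in Figure 1.4(c) can be read as a Boolean function of its basic events. The sketch below is one possible encoding, under an assumed gate layout: the top event is an OR of "chain breaks" and "wheels fail", "wheels fail" is an AND of the front and back wheel failures, and the back wheel failure is an OR of "cassette fails" and "disc brake rotor fails". This layout is an interpretation of the figure, and the basic-event probabilities are hypothetical values chosen only for illustration. The probability calculation additionally assumes statistically independent basic events.

```python
def bike_cannot_ride_safely(chain, front, cassette, rotor):
    """Evaluate the top event from the True/False states of the basic events."""
    back_wheel_fails = cassette or rotor        # OR gate (assumed layout)
    wheels_fail = front and back_wheel_fails    # AND gate: both wheels must fail
    return chain or wheels_fail                 # top-level OR gate


def top_event_probability(p_chain, p_front, p_cassette, p_rotor):
    """Top-event probability, assuming independent basic events.

    Each basic event appears exactly once in the tree, so gate-by-gate
    propagation is exact under the independence assumption.
    """
    p_back = 1 - (1 - p_cassette) * (1 - p_rotor)   # OR: at least one input occurs
    p_wheels = p_front * p_back                     # AND: all inputs occur
    return 1 - (1 - p_chain) * (1 - p_wheels)       # OR at the top


# Hypothetical failure probabilities over some mission time (assumptions).
probability = top_event_probability(
    p_chain=0.01, p_front=0.02, p_cassette=0.05, p_rotor=0.03
)
print(f"P(bike cannot ride safely) = {probability:.5f}")

# A front wheel failure alone does not trigger the top event under this layout.
print(bike_cannot_ride_safely(chain=False, front=True, cassette=False, rotor=False))
```

Under this assumed layout, the smallest combinations of basic events that trigger the top event (the minimal cut sets) are {chain}, {front wheel, cassette}, and {front wheel, disc brake rotor}.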
The system is divided into sub-systems and components until the desired resolution is reached. The top event, bike cannot ride safely, represents the event of interest. The logic gates in Figure 1.4(c) determine how a failure propagates based on Boolean logic. For example, if the chain breaks, the bike cannot ride, triggering the top event through the “OR” gate. The “AND” gate models failure when the wheels fail, indicating that all events under the gate must occur for the failure to propagate. If only the front wheel fails, the bike can still ride with effort, but if the disc brake rotor also fails, it becomes unsafe. This simple example highlights the value of FT models in illustrating relationships between components and failure propagation, facilitating strategic actions to prevent system-level failures. When basic events in the FT are represented using probability density functions, the model can generate quantitative metrics that support decision-making for effective system management.

[Figure 1.4: Example of a Fault Tree model of a bike. (a) Bike system; (b) Bike components; (c) (Partial) Fault Tree of a bike, with top event "Bike cannot ride safely", intermediate event "Wheels fail", and basic events "Chain breaks", "Front wheel fails", "Back wheel fails", "Cassette fails", and "Disc brake rotor fails", connected by OR and AND gates.]

One of the main challenges associated with FTs is building the model itself. To address this, in this dissertation, we used Multi-Objective Evolutionary Algorithms, which we discuss in the next section.

Multi-Objective Evolutionary Algorithms

Multi-Objective Evolutionary Algorithms (MOEAs) are population-based search strategies that simultaneously optimise conflicting objectives in a multidimensional space (Deb, 2011). To explain their concept, let us consider the following example: assume that older computers have lower performance due to outdated technology and lower market costs, while modern computers have higher performance and higher market costs. Plotting market cost versus performance for a set of computers creates a visualisation like Figure 1.5.

[Figure 1.5: Computer market cost vs performance (example). Horizontal axis: performance; vertical axis: market cost. The plot shows the set of solutions within the feasible objective space, from old to modern computers, with the non-dominated solutions forming the Pareto front.]

This plot shows the feasible objective space, representing trade-offs between market cost and performance for the set of solutions. By setting objectives to minimise market cost and maximise performance, we identify non-dominated solutions, forming a Pareto front. Solutions on a Pareto front are optimal as they cannot improve in one objective without sacrificing the other (Deb, 2005).

MOEAs mimic natural selection, where the best individuals are more likely to reproduce and pass their traits on to the next generations (Ojha, Singh, Chakraborty, et al., 2019). The following example illustrates how they work. For the functions $f_1(x) = (x-2)^2$ and $f_2(x) = -(x+2)^2$, the minimum of $f_1$ and the maximum of $f_2$ are at $x = 2$ and $x = -2$, respectively. However, when optimising both objectives simultaneously, we obtain a set of non-dominated solutions instead. For this, we use an MOEA called NSGA-II (Deb, 2005), and the results are shown in Figure 1.6(a)-(b). We observe that the first Pareto front ranges from $x = -2$ to $x = +2$ for both functions. This means that all solutions in this set are optimal trade-offs between minimising $f_1(x)$ and maximising $f_2(x)$. Figure 1.6(c) displays the convergence of the algorithm over generations, and Figure 1.6(d) shows the $x$ values in the first Pareto front across generations, showing convergence of $x$ between $-2$ and $+2$.

[Figure 1.6: Solving $f_1(x) = (x-2)^2$ and $f_2(x) = -(x+2)^2$ using MOEAs (example). In (a) Pareto front for $f_1(x)$; (b) Pareto front for $f_2(x)$; (c) convergence (mean function value over generations); and (d) convergence of solutions in the 1st Pareto front ($x$ over generations).]
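The notion of non-dominated solutions in this example can be checked with a few lines of code. The sketch below is not NSGA-II: it replaces the evolutionary search with a brute-force dominance filter over a hypothetical grid of candidate values of $x$, purely to illustrate which candidates belong to the first Pareto front when $f_1$ is minimised and $f_2$ is maximised. The grid range and step are arbitrary choices.

```python
def f1(x):
    return (x - 2) ** 2      # objective to minimise


def f2(x):
    return -((x + 2) ** 2)   # objective to maximise


def dominates(a, b):
    """True if score a = (f1, f2) dominates score b: no worse in both, better in one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates whose scores are not dominated by any other candidate."""
    scores = [(f1(x), f2(x)) for x in candidates]
    return [
        x
        for x, score in zip(candidates, scores)
        if not any(dominates(other, score) for other in scores)
    ]


# Hypothetical candidate grid from -5 to 5 in steps of 0.1 (an arbitrary choice).
candidates = [i / 10 for i in range(-50, 51)]
front = pareto_front(candidates)
print(f"{len(front)} non-dominated candidates, x from {min(front)} to {max(front)}")
```

Every candidate outside the interval between $-2$ and $+2$ is dominated by one of the interval's end points, so the filter retains only the grid points inside it, in line with the range reported for Figure 1.6. An MOEA reaches a comparable set through selection and variation rather than exhaustive comparison, which matters when the search space is too large to enumerate.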

1.2.3 Markov Process-based Prognostics

Prognostics: An overview

Prognostics (from the Greek prognostikos) is the cornerstone concept enabling predictability in PHM (Stage 4 in Figure 1.2). Within an engineering context, the science of prognostics attempts to answer: how long will it be until a particular future event or state is reached? (Goebel, Celaya, Sankararaman, et al., 2017). Therefore, the main aim of prognostics is to predict an event or state before its occurrence, making time a critical variable (Lee, F. Wu, Zhao, et al., 2014). Among the most popular outcomes of prognostics is the estimation of the Remaining Useful Life (RUL), which measures the time until failure. However, estimating the time to reach alternative states to failure may be relevant for some applications, as discussed later in this section. This capability makes prognostics key within the PHM paradigm, enabling actions to be taken before failures occur, thus allowing better planning while minimising reactive costs and downtime (Elattar, Elminir, and Riad, 2016).

Engineering applications of prognostics are vast, including rotating machinery (Heng, S. Zhang, Tan, et al., 2009); Li-ion batteries (J. Zhang and Lee, 2011); gas turbines (Tahan, Tsoutsanis, Muhammad, et al., 2017); manufacturing (T. Xia, Dong, Xiao, et al., 2018); aircraft (Che, H. Wang, Fu, et al., 2019); and wind turbines (Rezamand, Kordestani, Carriveau, et al., 2020).

Prognostic models operate at both system and component levels (S. Kim, Choi, and N. H. Kim, 2021) and come in various types; however, the literature lacks consensus regarding their classifications (Mrugalska, 2019). Therefore, in this dissertation, we discuss the following categories: physics-based, data-driven, and hybrid. For completeness, the literature also mentions knowledge-based prognostics, though they are significantly less prevalent. Thus, we do not discuss them here. For more information, see Sikorska, Hodkiewicz, and Ma, 2011; J. Peng, G. Xia, Y. Li, et al., 2022; Xue, J. Yang, M. Yang, et al., 2023.

Physics-based prognostics, also referred to as model-based (Zio, 2022; Xue, J. Yang, M. Yang, et al., 2023), use explicit mathematical representations to formalise physical failure modes and degradation phenomena. This requires a deep understanding of the system’s physics, operating conditions, and life cycle loads (Elattar, Elminir, and Riad, 2016; Javed, Gouriveau, and Zerhouni, 2017; T. Xia, Dong, Xiao, et al., 2018). The process generally involves model identification, simulations under loads, tracking degradation measures, and predicting RUL (Cubillo, Perinpanayagam, and Esperon-Miguez, 2016). These models are tailored to specific applications, such as crack growth, spall progression, and wear, relying on accurate parameterisation using laboratory or real-time data (Rezamand, Kordestani, Carriveau, et al., 2020; D. An, N. H. Kim, and Choi, 2015). Challenges in physics-based prognostics stem from the complexity
