
156 Chapter 7. Maintenance Strategies for Sewer Pipes with Multi-State Deterioration and Deep Reinforcement Learning

is beyond the scope of this chapter; for details, see Jimenez-Roa, Heskes, Tinga, et al., 2022; Jimenez-Roa, Tinga, Heskes, et al., 2024. The results of this step are given in Section 7.4.

Step 2. After calibrating the MSDMs, integrate them into an environment suitable for RL applications. Section 7.5 presents the details of our environment integrating the MSDMs. In addition, we define environments for training the RL agents in order to test different MSDM hypotheses; details are given in Section 7.6.

Step 3. Train DRL agents with Proximal Policy Optimization (PPO), using Optuna for hyperparameter tuning and Stable Baselines3 for the RL implementation. Details are in Section 7.7.1.

Step 4. Train and select the RL agents with the optimal hyperparameters on the training environments. In essence, these agents learn the dynamics described by the MSDM encoded in each environment.

Step 5. Using the test environment, compare the maintenance policies advised by the RL agents against three heuristics: Condition-Based Maintenance (CBM), Scheduled Maintenance (SchM), and Reactive Maintenance (RM). These heuristics are defined in Section 7.6.2.

Step 6. Analyse and compare the behaviour of the maintenance strategies obtained with the different RL models and heuristics, and reflect on the advantages and disadvantages of each policy. Section 7.7.2 gives an overview of this comparison, and Section 7.7.3 details the behaviour along episodes.

7.4 Multi-state deterioration models

7.4.1 Case study

The case study is detailed in Section II.4.3 on page 100. In this chapter, we focus on the damage code BAF, which denotes surface damage and was observed in 35.3% of the inspections.

7.4.2 Parametrisation

We consider three hazard rate distributions: Exponential, Gompertz, and Weibull. Their hazard rates λ(t|·) are given in Eqs. (7.1a) to (7.1c). The Exponential distribution (Eq. (7.1a)) has a constant hazard rate. This implies a time-homogeneous process with the memoryless property. In contrast, the Gompertz (Eq. (7.1b)) and Weibull (Eq. (7.1c)) distributions have time-varying hazard rates, which imply a time-inhomogeneous process.

Exponential hazard function:
$$\lambda_E(t \mid \theta) = \theta \tag{7.1a}$$

Gompertz hazard function:
$$\lambda_G(t \mid \alpha, \beta) = \alpha \beta e^{\beta t} \tag{7.1b}$$

Weibull hazard function:
$$\lambda_W(t \mid \alpha, \beta) = \frac{\alpha}{\beta} \left( \frac{t}{\beta} \right)^{\alpha - 1} \tag{7.1c}$$
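The contrast between constant and time-varying hazard rates can be made concrete in a few lines of code. The sketch below is minimal and rests on an assumed reading of Eq. (7.1): λ_E(t|θ) = θ, λ_G(t|α, β) = αβe^{βt}, and λ_W(t|α, β) = (α/β)(t/β)^{α−1}. The symbols θ, α, β and all function names are hypothetical labels chosen for illustration.

The block also applies two standard survival-analysis relations: the cumulative hazard H(t) = ∫₀ᵗ λ(s) ds and the survival function S(t) = exp(−H(t)). From these it computes the probability of leaving a state within one time step. How the chapter's MSDM actually converts hazard rates into state transitions is not described here, so `transition_prob` is an illustrative assumption, not the authors' method.

```python
import math

# Hazard rates, assuming the parametrisation (theta, alpha, beta) read from Eq. (7.1).
# These symbols are illustrative labels, not confirmed by the source.

def hazard_exponential(t: float, theta: float) -> float:
    """Constant hazard lambda_E(t | theta) = theta, independent of t (memoryless)."""
    return theta

def hazard_gompertz(t: float, alpha: float, beta: float) -> float:
    """lambda_G(t | alpha, beta) = alpha * beta * exp(beta * t); increases with t for beta > 0."""
    return alpha * beta * math.exp(beta * t)

def hazard_weibull(t: float, alpha: float, beta: float) -> float:
    """lambda_W(t | alpha, beta) = (alpha / beta) * (t / beta) ** (alpha - 1).
    Note: undefined at t = 0 when alpha < 1 (division by zero)."""
    return (alpha / beta) * (t / beta) ** (alpha - 1)

# Closed-form cumulative hazards H(t) = integral of the hazard from 0 to t.

def cum_hazard_exponential(t: float, theta: float) -> float:
    return theta * t

def cum_hazard_gompertz(t: float, alpha: float, beta: float) -> float:
    return alpha * (math.exp(beta * t) - 1.0)

def cum_hazard_weibull(t: float, alpha: float, beta: float) -> float:
    return (t / beta) ** alpha

def survival(cum_hazard: float) -> float:
    """Survival function S(t) = exp(-H(t))."""
    return math.exp(-cum_hazard)

def transition_prob(cum_hazard_fn, t: float, dt: float, *params: float) -> float:
    """Hypothetical helper: probability of leaving the current state during (t, t + dt],
    given the pipe is still in that state at age t: 1 - exp(-(H(t + dt) - H(t)))."""
    increment = cum_hazard_fn(t + dt, *params) - cum_hazard_fn(t, *params)
    return 1.0 - math.exp(-increment)

if __name__ == "__main__":
    # Exponential: the one-step transition probability is the same at age 1 and age 40.
    print(transition_prob(cum_hazard_exponential, 1.0, 1.0, 0.1))
    print(transition_prob(cum_hazard_exponential, 40.0, 1.0, 0.1))
    # Weibull with alpha > 1: older pipes are more likely to deteriorate in the next step.
    print(transition_prob(cum_hazard_weibull, 1.0, 1.0, 2.0, 50.0))
    print(transition_prob(cum_hazard_weibull, 40.0, 1.0, 2.0, 50.0))
```

With α = 1, the assumed Weibull form reduces to a constant hazard 1/β, that is, an Exponential model with θ = 1/β. This is one way to see the Exponential case as the time-homogeneous special case.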
