For testing, both agents are evaluated in the same environment, with the state space defined as follows:

$$S_{\text{Testing}}^{\text{Agent-E}} = \left\langle \text{Pipe Age},\ \mathbf{h}_k^W,\ p_k^E(t) \right\rangle \tag{7.10a}$$

$$S_{\text{Testing}}^{\text{Agent-G}} = \left\langle \text{Pipe Age},\ \mathbf{h}_k^W,\ p_k^G(t) \right\rangle \tag{7.10b}$$

In both cases, $p_k^E(t)$ and $p_k^G(t)$ remain consistent with the training phase, reflecting the MSDM predictions. However, the health vector $\mathbf{h}_k$ follows the deterioration behaviour described by the Weibull probability density function, indicated by the superscript $W$ (a brief sketch of this state construction is given below).

7.6.2 Comparison of maintenance strategies

We compare the RL agent's performance against maintenance policies based on heuristics. For this, we define the following baselines (a code sketch of these rules follows below):

• Condition-Based Maintenance (CBM): Maintenance actions are based on the sewer main's condition. Specifically, replacement ($a_t = 2$) is performed if pipe_age $\geq 70$ or $h_{k=F} > 0.0$; maintenance ($a_t = 1$) is conducted if $h_{k=4} \geq 0.1$ or $h_{k=5} \geq 0.05$; otherwise, no action ($a_t = 0$) is taken.

• Scheduled Maintenance (SchM): Actions are time-based. Replacement ($a_t = 2$) is executed if $h_{k=F} > 0.0$; maintenance ($a_t = 1$) occurs every 10 years; otherwise, no action ($a_t = 0$) is taken.

• Reactive Maintenance (RM): Replacement is undertaken only upon pipe failure, i.e., replacement ($a_t = 2$) is performed if $h_{k=F} > 0.0$; otherwise, no action ($a_t = 0$) is taken.

Note that CBM and SchM are defined using plausible threshold values. These heuristics could be calibrated further for better performance, which is beyond the scope of this chapter.

7.7 Results

7.7.1 Implementation and hyper-parameter tuning

Our framework uses Stable Baselines3 (Raffin, Hill, Gleave, et al., 2021), which provides robust implementations of RL algorithms in PyTorch (Ansel, Yang, He, et al., 2024). Specifically, we use the PPO algorithm. Hyper-parameter optimisation is performed with Optuna (Akiba, Sano, Yanase, et al., 2019), a framework dedicated to automating hyper-parameter optimisation (a minimal sketch of this set-up is given below). The search space encompasses: an exponentially decaying learning rate with a decay rate of 0.05 and an initial learning rate ranging from $10^{-5}$ to $10^{-2}$, the discount factor
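The following is a minimal sketch of how the testing states in Eq. (7.10) could be assembled. The function and variable names (testing_state, pipe_age, h_weibull, p_failure) and the example numbers are assumptions for illustration, not the chapter's implementation; both agents share the same pipe age and Weibull-driven health vector $\mathbf{h}_k^W$, and only the failure-probability feature ($p_k^E(t)$ versus $p_k^G(t)$) differs.

```python
import numpy as np

def testing_state(pipe_age: float, h_weibull: np.ndarray, p_failure: float) -> np.ndarray:
    """Concatenate pipe age, health vector and failure-probability feature into one state."""
    return np.concatenate(([pipe_age], h_weibull, [p_failure]))

# Illustrative values only: both agents observe the same age and health vector,
# and differ only in the last feature (p_k^E(t) for Agent-E, p_k^G(t) for Agent-G).
h_w = np.array([0.60, 0.20, 0.10, 0.05, 0.03, 0.02])  # example health distribution
s_agent_e = testing_state(35.0, h_w, p_failure=0.04)
s_agent_g = testing_state(35.0, h_w, p_failure=0.06)
```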
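The three heuristic baselines reduce to simple decision rules. The sketch below mirrors the thresholds listed above, assuming the health vector is available as a mapping from health states (including the failure state F) to probability mass and that the environment tracks pipe_age and the simulation year t; the names and the dictionary representation are illustrative assumptions.

```python
A_NONE, A_MAINT, A_REPLACE = 0, 1, 2  # action encoding a_t used above

def cbm_policy(pipe_age: float, h: dict) -> int:
    """Condition-Based Maintenance: act on the sewer main's condition."""
    if pipe_age >= 70 or h["F"] > 0.0:      # replace old or failed pipes
        return A_REPLACE
    if h[4] >= 0.1 or h[5] >= 0.05:         # maintain when poor-condition states accumulate
        return A_MAINT
    return A_NONE

def schm_policy(t: int, h: dict, interval: int = 10) -> int:
    """Scheduled Maintenance: periodic, time-based maintenance."""
    if h["F"] > 0.0:                        # replace upon failure
        return A_REPLACE
    if t > 0 and t % interval == 0:         # maintain every `interval` years
        return A_MAINT
    return A_NONE

def rm_policy(h: dict) -> int:
    """Reactive Maintenance: replace only after the pipe has failed."""
    return A_REPLACE if h["F"] > 0.0 else A_NONE
```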
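Below is a minimal sketch of the PPO-plus-Optuna tuning loop described above. It assumes a Gymnasium-compatible environment named SewerMaintenanceEnv (a hypothetical stand-in for the chapter's environment), an illustrative discount-factor range and training budget, and one possible parameterisation of the exponential learning-rate decay; Stable Baselines3 accepts a callable learning rate that receives the remaining training progress.

```python
import math
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

def exp_decay_schedule(initial_lr: float, decay_rate: float = 0.05):
    """Exponentially decaying learning rate; Stable Baselines3 calls the
    schedule with progress_remaining, which goes from 1.0 down to 0.0."""
    def schedule(progress_remaining: float) -> float:
        return initial_lr * math.exp(-decay_rate * (1.0 - progress_remaining))
    return schedule

def objective(trial: optuna.Trial) -> float:
    # Search range from the text: initial learning rate in [1e-5, 1e-2].
    initial_lr = trial.suggest_float("initial_lr", 1e-5, 1e-2, log=True)
    # Discount-factor range is assumed here (the text is truncated at this point).
    gamma = trial.suggest_float("gamma", 0.9, 0.9999)

    env = SewerMaintenanceEnv()  # hypothetical stand-in for the chapter's environment
    model = PPO(
        "MlpPolicy",
        env,
        learning_rate=exp_decay_schedule(initial_lr, decay_rate=0.05),
        gamma=gamma,
        verbose=0,
    )
    model.learn(total_timesteps=100_000)  # training budget chosen for illustration only

    mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=10)
    return mean_reward

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)  # number of trials chosen for illustration
```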
RkJQdWJsaXNoZXIy MjY0ODMw