
7.5 Definition of Markov Decision Process for Maintenance Policy Optimisation of a sewer main considering deterioration over the pipe length

In the following sections, we provide the details of the environment, namely the state and action spaces, as well as the transition probability and reward functions.

[Figure 7.3: Environment for maintenance policy optimisation of a sewer main via Deep Reinforcement Learning, considering deterioration along the pipe length. The diagram shows the agent–environment loop (action aₜ, reward rₜ₊₁, state sₜ₊₁), the action space (0: do nothing, 1: maintenance, 2: replace), the state space, and a pipe of length L discretised into segments of length ΔL, each labelled with a severity level k, together with the health vector and the MSDM.]

7.5.1 State space S

Our approach focuses on developing age-based maintenance policies, incorporating the sewer main's age into the state representation. Our state space is continuous and structured around three key components: (i) the age of the pipe, (ii) the health vector, and (iii) the stochastic prediction of severity levels. We next describe the last two components.

Health vector (h)

In modelling the deterioration of linear structures such as sewer mains, it is important to represent changes accurately along their length. For this purpose, we define a health vector h, which quantitatively measures the deterioration at various points along the pipe. This vector is central to our framework, in particular because it influences the reward function described in Section 7.5.4.

Construction of h: We discretise the pipe into segments of equal length ΔL, with ΔL < L, where L is the total length of the pipe. The number of segments, ω (Eq. 7.2), is calculated using the ceiling function to ensure it remains an integer even if L is not perfectly divisible by ΔL:

ω = ⌈L / ΔL⌉    (7.2)
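To make the construction concrete, below is a minimal sketch of Eq. 7.2 and the initialisation of h. It is not the thesis's implementation: the function name build_health_vector and the assumption that every segment of a new pipe starts at the lowest severity level k = 1 (following the k labels in Figure 7.3) are ours.

import math
import numpy as np

def build_health_vector(pipe_length: float, segment_length: float,
                        initial_severity: int = 1) -> np.ndarray:
    """Discretise a pipe of length L into segments of length ΔL (Eq. 7.2)
    and return a health vector h with one severity entry per segment."""
    if not 0 < segment_length < pipe_length:
        raise ValueError("Require 0 < ΔL < L")
    # Ceiling keeps ω an integer even when L is not divisible by ΔL.
    num_segments = math.ceil(pipe_length / segment_length)
    # Assumption: a new pipe starts at the lowest severity level everywhere.
    return np.full(num_segments, initial_severity, dtype=int)

h = build_health_vector(pipe_length=100.0, segment_length=15.0)
print(len(h))  # ω = ⌈100 / 15⌉ = 7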
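For illustration only, the following sketch shows one plausible way to pack the three state components into a single continuous vector for the agent. The normalisation constants and the severity_probs argument (standing in for the stochastic severity-level predictions, which are described later in this section) are assumptions, not the exact encoding used in the thesis.

import numpy as np

def build_state(age: float, health_vector: np.ndarray,
                severity_probs: np.ndarray,
                max_age: float = 100.0, max_severity: int = 5) -> np.ndarray:
    """Concatenate (i) pipe age, (ii) the health vector h, and (iii) the
    stochastic severity-level predictions into one continuous state."""
    # Scale each block to roughly [0, 1] so no feature dominates (assumption).
    age_part = np.array([age / max_age])
    health_part = health_vector.astype(float) / max_severity
    return np.concatenate([age_part, health_part, severity_probs.ravel()])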

RkJQdWJsaXNoZXIy MjY0ODMw