
Here, L and D denote the pipe's length in metres and diameter in millimetres, respectively. C_R is in Euros (€). The cost of failure, denoted by C_F, entails assigning a substantial penalty when the agent allows a segment of the pipe to reach the failure state (k = F). This penalty cost is set at €−100,000. Our reward function is then:

\[
r_t = \frac{C_M + C_R + C_F}{100{,}000 + 900 \times 40} = \frac{C_M + C_R + C_F}{136{,}000}, \qquad (7.8)
\]

where r_t represents the reward obtained at time t, and the normalisation constant 136,000 corresponds to the most expensive penalty possible at time t. Thus, r_t is defined within the interval [−1, 0]. This reward function aims for the agent to balance maintenance actions against the prevention of undesirable pipe conditions.

7.6 Experimental setup

7.6.1 Setup

We evaluate our framework on a single pipe of constant length (40 metres) and diameter (200 mm) from the cohort CMW, which carries mixed and waste content. Given the constant dimensions, the replacement cost C_R, as defined in Eq. 7.7, is €24,560. The pipe age at episode initialisation is randomly sampled from the uniform distribution U[0, 50], allowing the agent to learn the behaviour of pipes within this age range. Additionally, we evaluate the policy in steps of half a year with ∆L = 1 metre.

In the methodology section, we describe the training of two agents: Agent-E and Agent-G. Agent-E is trained in an environment where sewer main deterioration follows the MSDM parametrised with an Exponential probability density function, while Agent-G is trained in an environment where deterioration follows the MSDM parametrised with a Gompertz probability density function. Both agents are tested in an environment where sewer main deterioration follows the MSDM parametrised with the Weibull probability density function.
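The normalised reward in Eq. 7.8 can be sketched as follows. This is a minimal illustration, not the chapter's implementation: it assumes the three cost terms are passed in as negative contributions (e.g. C_F = −100,000 on failure), so that r_t stays within [−1, 0]; the function name and signature are hypothetical.

```python
FAILURE_PENALTY = -100_000.0   # C_F when a segment reaches the failure state k = F
NORMALISATION = 136_000.0      # 100,000 + 900 * 40: worst-case cost at one time step

def reward(c_m: float, c_r: float, c_f: float) -> float:
    """Normalised reward r_t of Eq. 7.8, defined on [-1, 0].

    c_m, c_r, c_f are the maintenance, replacement, and failure costs
    incurred at time t, taken as negative (or zero) contributions.
    """
    r = (c_m + c_r + c_f) / NORMALISATION
    assert -1.0 <= r <= 0.0, "costs exceed the worst-case normalisation"
    return r
```

For example, a step with no maintenance or replacement but a failure yields `reward(0.0, 0.0, FAILURE_PENALTY)`, i.e. −100,000 / 136,000 ≈ −0.735.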
During training, each agent observes a specific state space, defined as follows:

\[
S^{\text{Agent-E}}_{\text{Training}} = \langle \text{Pipe Age},\; h^{E}_{k},\; p^{E}_{k}(t) \rangle \qquad (7.9a)
\]
\[
S^{\text{Agent-G}}_{\text{Training}} = \langle \text{Pipe Age},\; h^{G}_{k},\; p^{G}_{k}(t) \rangle \qquad (7.9b)
\]

Here, S represents the state space for each agent during training. The labels E and G denote the Exponential and Gompertz probability density functions, respectively. Each agent's objective is to learn an optimal maintenance strategy based on its environment's dynamics.
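Assembling the observation vectors of Eqs. 7.9a-b can be sketched as below. This is an illustrative assumption about how the tuple ⟨Pipe Age, h_k, p_k(t)⟩ is flattened for the agent; the hazard values h_k and condition-state probabilities p_k(t) come from the MSDM, which is defined elsewhere in the chapter, so stand-in values are used here.

```python
from typing import Sequence

def make_state(pipe_age: float,
               hazard: Sequence[float],
               state_probs: Sequence[float]) -> list[float]:
    """Flatten <Pipe Age, h_k, p_k(t)> into one observation vector."""
    return [pipe_age, *hazard, *state_probs]

# e.g. Agent-E observes the Exponential-parametrised quantities
# (hazard and probability values below are stand-ins, not MSDM output):
s_e = make_state(12.5, hazard=[0.01, 0.03], state_probs=[0.90, 0.08, 0.02])
```

The same constructor serves Agent-G with the Gompertz-parametrised quantities, so the two agents differ only in the environment dynamics that generate h_k and p_k(t), not in the shape of the observation.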
