(γ) from 0.8 to 0.9999, entropy coefficient from 0.0001 to 0.01, steps per update (n_steps) from 250 to 3,000, batch sizes from 16 to 256, activation functions ('tanh', 'relu', 'sigmoid'), policy network architectures ([16, 16], [32, 32], [64, 64], [32, 32, 32]), and training epochs (n_epochs) from 5 to 100. We set up optuna to conduct 500 trials, aiming to maximise cumulative reward over 100 episodes; a sketch of this search setup is shown after Table 7.4. Table 7.3 details the optimal hyper-parameters identified. These parameters are used to obtain the results discussed in Sections 7.7.2 and 7.7.3, where our agents are trained over a total of 5 million time steps.

Table 7.3: Optimal hyper-parameters found using optuna (Akiba, Sano, Yanase, et al., 2019).

Hyper-parameter                    Value
Learning rate                      0.0003
Discount factor (γ)                0.995
Entropy coefficient                0.008
Steps per update (n_steps)         2,080
Batch size                         104
Activation function                Sigmoid
Policy network architecture        [32, 32, 32]
Training epochs (n_epochs)         50

7.7.2 Policy analysis: overview

This section offers a broad evaluation of the policies, with a detailed analysis over episodes presented in Section 7.7.3. We compare the agents' performances with the heuristics detailed in Section 7.6.2 across 100 simulations in the test environment (Eq. 7.10), considering pipe ages of 0, 25, and 50 years, in order to evaluate policy efficacy with respect to deterioration at varying pipe ages.

Table 7.4: Policy cost comparison: mean and standard deviation (Std.) of costs for Agent-E, Agent-G, CBM, SchM, and RM, evaluated over 100 episodes in the test environment. Costs, in thousands of Euros (€), for pipe ages of 0, 25, and 50 years.

            Pipe age: 0        Pipe age: 25       Pipe age: 50
Policy      Mean     Std.      Mean     Std.      Mean     Std.
Agent-E     51.3     80.8      116.5    97.7      156.8    121.2
Agent-G     39.7     66.2      78.7     96.6      127.1    128.3
CBM         51.3     107.2     112.3    88.5      110.7    86.6
SchM        42.5     70.9      78.9     96.4      159.8    95.9
RM          48.6     76.6      135.8    86.5      165.7    80.8
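The tuning script itself is not reproduced in this chapter. As a minimal illustration of how such a study could be set up, the sketch below assumes a Stable-Baselines3 PPO agent (suggested by the n_steps/n_epochs/ent_coef parameter names, but not confirmed in this excerpt) and substitutes the standard Gymnasium environment CartPole-v1 for the pipe-maintenance environment; ENV_ID, TRAIN_STEPS, and the fixed learning rate are illustrative placeholders rather than values from this work.

```python
import optuna
import torch.nn as nn
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Stand-ins so the sketch is self-contained; replace with the pipe-maintenance
# environment and the full training budget used in the thesis.
ENV_ID = "CartPole-v1"
TRAIN_STEPS = 50_000

ACTIVATIONS = {"tanh": nn.Tanh, "relu": nn.ReLU, "sigmoid": nn.Sigmoid}
ARCHITECTURES = {"16-16": [16, 16], "32-32": [32, 32],
                 "64-64": [64, 64], "32-32-32": [32, 32, 32]}


def objective(trial: optuna.Trial) -> float:
    # Search space mirroring the ranges reported in the text.
    gamma = trial.suggest_float("gamma", 0.8, 0.9999)
    ent_coef = trial.suggest_float("ent_coef", 0.0001, 0.01)
    n_steps = trial.suggest_int("n_steps", 250, 3000)
    batch_size = trial.suggest_int("batch_size", 16, 256)
    n_epochs = trial.suggest_int("n_epochs", 5, 100)
    activation = trial.suggest_categorical("activation", list(ACTIVATIONS))
    arch = trial.suggest_categorical("net_arch", list(ARCHITECTURES))

    model = PPO(
        "MlpPolicy",
        gym.make(ENV_ID),
        learning_rate=3e-4,  # also tuned in the thesis; range not given in this excerpt
        gamma=gamma,
        ent_coef=ent_coef,
        n_steps=n_steps,
        batch_size=batch_size,
        n_epochs=n_epochs,
        policy_kwargs={"net_arch": ARCHITECTURES[arch],
                       "activation_fn": ACTIVATIONS[activation]},
        verbose=0,
    )
    model.learn(total_timesteps=TRAIN_STEPS)

    # Score the trial by mean cumulative reward over 100 evaluation episodes.
    mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=100)
    return mean_reward


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=500)
print(study.best_params)
```

Each trial trains one agent with the sampled configuration and scores it by its mean cumulative reward over 100 evaluation episodes, matching the objective described above.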
RkJQdWJsaXNoZXIy MjY0ODMw