
performance with total maintenance costs over infrastructure life-cycles (Marugán, 2023).
- DRL’s growth for maintenance planning is driven by increased access to Internet of Things data and higher computing power, enabling seamless integration of predictive and optimisation models in maintenance (Ogunfowora and Najjaran, 2023).
- DRL holds significant potential for the future of smart manufacturing, promoting a cognitive, personalised approach (C. Li, Zheng, Y. Yin, et al., 2023).

III.4 Preliminaries

III.4.1 Markov Decision Process

A Markov Decision Process (MDP) is a well-known mathematical framework for formulating sequential decision-making problems (Puterman, 1990). Below, we provide a formal definition.

Definition 12 (Markov Decision Process). Let $T = \mathbb{N}_0$ represent the set of all non-negative integers. Let $t$ be a discrete-time index such that $t \in T$, indexing the time steps in a stochastic sequential decision-making process. A Markov Decision Process (MDP) is formally defined by the tuple $M = \langle S, A, P, R, \pi_0, \gamma \rangle$, where:
- $S$ is a set of states.
- $A$ is a set of actions.
- $P : S \times A \times S \to [0,1]$ is the transition probability function, $P(s_{t+1} \mid s_t, a_t)$, giving the probability of transitioning from state $s_t$ to state $s_{t+1}$ under action $a_t$.
- $R : S \times A \times S \to \mathbb{R}$ is the reward function, $R(s_t, a_t, s_{t+1})$, specifying the reward received after the transition.
- $\pi_0 : S \times A \to [0,1]$ is the initial policy distribution at $t = 0$.
- $\gamma \in [0,1]$ is the discount factor, quantifying the importance of future rewards relative to immediate rewards.

III.4.2 Deep Reinforcement Learning

Reinforcement Learning (RL) seeks to develop agents that learn optimal behaviours in virtual environments through trial and error guided by a reward signal (Arulkumaran, Deisenroth, Brundage, et al., 2017). Combining RL with Deep Neural Networks (DNNs) results in Deep Reinforcement Learning (DRL), offering greater scalability and the ability to tackle complex problems.
Below we provide formal definitions of RL and DNNs.

Definition 13 (Reinforcement Learning). Reinforcement Learning (RL) is a learning paradigm in which an agent interacts with an environment to maximise cumulative reward. Formally, it can be modelled as an MDP (see Definition 12).
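To make the MDP tuple and the agent-environment interaction concrete, the following minimal Python sketch encodes a tuple of the form described in Definition 12 and accumulates the discounted reward of one simulated episode. Everything specific in it is an assumption made for illustration only: the names `MDP`, `sample`, `rollout` and `rows_are_distributions`, the two-state maintenance example `toy` with its probabilities and rewards, the finite `horizon`, and the explicit start state `s0` (the tuple above does not specify an initial state). No learning takes place here; the policy is held fixed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MDP:
    """Container mirroring the tuple <S, A, P, R, pi_0, gamma> (illustrative)."""
    states: tuple
    actions: tuple
    P: dict       # P[(s, a)] -> {s_next: probability}
    R: dict       # R[(s, a, s_next)] -> reward
    pi0: dict     # pi0[s] -> {a: probability}
    gamma: float  # discount factor in [0, 1]


def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def rows_are_distributions(mdp, tol=1e-9):
    """Check that every P(. | s, a) sums to one."""
    return all(abs(sum(row.values()) - 1.0) < tol for row in mdp.P.values())


def rollout(mdp, policy, s0, horizon, rng):
    """Simulate `horizon` steps and return the discounted cumulative reward."""
    s, ret = s0, 0.0
    for t in range(horizon):
        a = sample(policy[s], rng)            # agent picks an action
        s_next = sample(mdp.P[(s, a)], rng)   # environment transitions
        ret += mdp.gamma ** t * mdp.R[(s, a, s_next)]
        s = s_next
    return ret


# Hypothetical two-state maintenance problem (all numbers are made up).
toy = MDP(
    states=("good", "worn"),
    actions=("wait", "repair"),
    P={
        ("good", "wait"): {"good": 0.8, "worn": 0.2},
        ("good", "repair"): {"good": 1.0},
        ("worn", "wait"): {"worn": 1.0},
        ("worn", "repair"): {"good": 1.0},
    },
    R={
        ("good", "wait", "good"): 1.0,
        ("good", "wait", "worn"): 0.0,
        ("good", "repair", "good"): 0.5,
        ("worn", "wait", "worn"): 0.0,
        ("worn", "repair", "good"): 0.5,
    },
    pi0={
        "good": {"wait": 0.5, "repair": 0.5},
        "worn": {"wait": 0.5, "repair": 0.5},
    },
    gamma=0.9,
)

# One episode under the initial (uniform) policy, with a fixed seed.
episode_return = rollout(toy, toy.pi0, "good", horizon=20, rng=random.Random(0))
```

An RL algorithm would replace the fixed `policy` argument with one that is updated from the observed rewards so that the expected value of this return increases; in DRL, that policy (or a value function) would be represented by a neural network rather than a lookup table.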
