
Part III: Maintenance optimisation of multi-state components

III.2 Nomenclature

Refer to the nomenclature on Markov chains for the Multi-State Deterioration Model in Section II.2 on page 94.

(Contextual) Markov Decision Processes:
T                  Time horizon
t ∈ T              Time (e.g., component age)
S                  State space
A                  Action space
C                  Context space
R(·)               Reward function
P_ij(·)            Transition probability function
s_t, s_{t+1} ∈ S   Current (t) and next (t + 1) state instances
a_t ∈ A            Action instance at time t
r_t                Reward at time t
γ                  Discount factor
π_0                Initial policy distribution
M                  Markov Decision Process
c ∈ C              Context instance
K(·)               Mapping function
M_c                Contextual Markov Decision Process

Reinforcement Learning:
E                  Environment
A                  Agent
π_t                Policy at t
V(·)               Value function
Q(s, a)            State–action value function

Deep Neural Networks:
N                  Deep neural network
L̂                  Number of layers
n                  Size of the input layer
m                  Size of the output layer
l̂                  Layer in the network, with l̂ = 1, …, L̂
W                  Weight matrix
b                  Bias vector
σ(·)               Activation function
L(·)               Loss function

Proximal Policy Optimisation:
O                  Surrogate objective function
Â_t                Advantage function at t
θ                  Policy parameters
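To make the (contextual) MDP notation above concrete, the following is a minimal, hypothetical sketch: an MDP container whose fields mirror the symbols (S, A, R(·), P_ij(·), γ), together with a toy mapping K(·) that produces a contextual MDP M_c from a context c. The two-state deterioration example and all numeric values are illustrative assumptions, not part of the thesis model.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative container only; field names mirror the nomenclature symbols.
@dataclass
class MDP:
    S: list                              # state space
    A: list                              # action space
    R: Callable[[int, str], float]       # reward function R(s, a)
    P: Callable[[int, str, int], float]  # transition probability P_ij under action a
    gamma: float                         # discount factor

def K(c: float) -> MDP:
    """Toy mapping K(.): the context c scales the repair cost in a
    hypothetical two-state deterioration model (0 = good, 1 = degraded)."""
    R = lambda s, a: -c if a == "repair" else -0.1 * s

    def P(s: int, a: str, s2: int) -> float:
        if a == "repair":                # repair resets the component
            return 1.0 if s2 == 0 else 0.0
        if s == 0:                       # "keep": degrade with prob. 0.3
            return 0.3 if s2 == 1 else 0.7
        return 1.0 if s2 == 1 else 0.0   # degraded state is absorbing

    return MDP(S=[0, 1], A=["keep", "repair"], R=R, P=P, gamma=0.95)

M_c = K(2.0)                 # contextual MDP instance for context c = 2.0
print(M_c.R(1, "repair"))    # -> -2.0
```

The same pattern extends to the reinforcement-learning symbols: an agent A interacting with an environment E would query M_c.R and M_c.P while improving a policy π_t.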
