The key components of RL are the Environment, Agent, and Objective, detailed below:

- Environment ($\mathcal{E}$): This comprises the state space $S$, the action space $A$, the state transition probabilities $P(s_{t+1} \mid s_t, a_t)$, and the reward function $R(s_t, a_t, s_{t+1})$. Here, $s_t, s_{t+1} \in S$ are the current and next states, respectively, and $a_t \in A$ is the action taken in $s_t$ to reach $s_{t+1}$. The environment defines how actions affect the next state and the rewards.

- Agent ($\mathcal{A}$): The agent is the decision-maker in RL and interacts with $\mathcal{E}$ by following a policy. It is characterised by the following:

  - Policy ($\pi$): A function $\pi : S \times A \to [0, 1]$ that maps states to a probability distribution over actions. The policy governs the agent's behaviour, determining the action to be taken in each state at time $t$.

  - Value Function ($V(s)$): A function that estimates the expected return (cumulative discounted reward) from each state $s \in S$ under policy $\pi$, defined as
    \[
      V(s) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, s_t = s\right], \quad \text{for all } s \in S, \tag{6.7}
    \]
    where $\mathbb{E}_{\pi}[\cdot]$ denotes the expected value of a random variable when the agent $\mathcal{A}$ follows policy $\pi$, and $t$ represents any time step. The recursive form of $V(s)$ computes the return over trajectories $\tau$ and is expressed as
    \[
      V(s) = \mathbb{E}_{\pi}\left[R_{t+1} + \gamma V(s_{t+1}) \mid s_t = s\right]. \tag{6.8}
    \]

  - State-Action Value ($Q(s, a)$): The expected return of performing action $a \in A$ in state $s \in S$, defined by the state-action value function
    \[
      Q(s, a) = \mathbb{E}_{\pi}\left[r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) \mid s_t = s,\ a_t = a\right]. \tag{6.9}
    \]

- Agent's Goal: The agent $\mathcal{A}$ in RL aims to optimise its policy $\pi$ by maximising the optimal Q-function:
  \[
    \pi^{*}(s) = \arg\max_{a \in A} Q^{*}(s, a), \tag{6.10}
  \]
  where $\pi^{*}(\cdot)$ denotes the optimal policy and $Q^{*}(\cdot)$ the optimal state-action value function. This is achieved through iterative policy evaluation and improvement; a minimal tabular sketch of this procedure is given below, after Definition 14.

Definition 14 (Deep Neural Network). A Deep Neural Network (DNN) is defined as a function $f : \mathbb{R}^{n} \to \mathbb{R}^{m}$ with $n, m \in \mathbb{N}$, and can be formally represented as a tuple $\mathcal{N} = \langle \hat{L}, \{d_{\hat{l}}\}_{\hat{l}=0}^{\hat{L}}, \sigma(\cdot), \{W_{\hat{l}}, b_{\hat{l}}\}_{\hat{l}=1}^{\hat{L}}, \mathcal{L}(\cdot) \rangle$, where:

- $\hat{L} \in \mathbb{N}$ denotes the number of layers in the network, where $\hat{L} \geq 2$ indicates a multi-layered structure.
- $\{d_{\hat{l}}\}_{\hat{l}=0}^{\hat{L}}$ specifies the dimensions of each layer $\hat{l}$, with $d_0 = n$ for the input layer and $d_{\hat{L}} = m$ for the output layer.
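The tuple in Definition 14 maps directly onto a feed-forward pass. The NumPy sketch below is illustrative rather than the implementation used in this work: it assumes $\sigma(\cdot)$ is a ReLU activation on the hidden layers and $\mathcal{L}(\cdot)$ a mean-squared-error criterion, neither of which is fixed by the definition above; the layer dimensions $d_0 = n, \dots, d_{\hat{L}} = m$ and the parameters $\{W_{\hat{l}}, b_{\hat{l}}\}$ follow the notation of the tuple.

\begin{lstlisting}[language=Python]
import numpy as np

def init_dnn(dims, rng=None):
    """Parameters {W_l, b_l} for layer sizes dims = [d_0, ..., d_L] (Definition 14)."""
    rng = np.random.default_rng(0) if rng is None else rng
    weights = [rng.normal(scale=0.1, size=(dims[l], dims[l - 1]))
               for l in range(1, len(dims))]
    biases = [np.zeros(dims[l]) for l in range(1, len(dims))]
    return weights, biases

def relu(z):
    # Assumed hidden-layer activation sigma(.); not fixed by the definition.
    return np.maximum(z, 0.0)

def forward(x, weights, biases):
    """Evaluate f : R^n -> R^m by alternating affine maps and sigma(.)."""
    h = x
    for l, (W, b) in enumerate(zip(weights, biases), start=1):
        z = W @ h + b
        h = relu(z) if l < len(weights) else z   # linear output layer
    return h

def mse_loss(y_pred, y_true):
    # Assumed training criterion L(.); also not fixed by the definition.
    return float(np.mean((y_pred - y_true) ** 2))

# Example: L_hat = 3 layers, n = 4 inputs, m = 2 outputs.
weights, biases = init_dnn([4, 16, 16, 2])
print(forward(np.ones(4), weights, biases))
\end{lstlisting}

Any differentiable $\sigma(\cdot)$ and task-appropriate $\mathcal{L}(\cdot)$ can be substituted without changing the structure of the tuple.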
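To make the Agent's Goal in Eq. (6.10) concrete, the sketch below runs tabular Q-learning on a made-up three-state degrading component with "do nothing" and "repair" actions. The environment dynamics, costs, learning rate $\alpha$, discount $\gamma$ and exploration rate are assumptions for illustration only, not values taken from this work. Note that the sample update targets $r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a')$, i.e.\ it estimates the optimal $Q^{*}$ of Eq. (6.10) rather than the on-policy expectation in Eq. (6.9).

\begin{lstlisting}[language=Python]
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical multi-state component: 0 = new, 1 = degraded, 2 = failed.
# Actions: 0 = do nothing, 1 = repair. All numbers are illustrative only.
N_STATES, N_ACTIONS = 3, 2

def step(s, a):
    """Sampled transition (s_{t+1}, r_{t+1}) of the toy environment E."""
    if a == 1:                                   # repair: back to 'new' at a cost
        return 0, -5.0
    s_next = min(s + rng.binomial(1, 0.3), 2)    # random degradation
    reward = -50.0 if s_next == 2 else 1.0       # failure penalty vs. operating profit
    return s_next, reward

gamma, alpha, eps = 0.95, 0.1, 0.1               # discount, learning rate, exploration
Q = np.zeros((N_STATES, N_ACTIONS))

s = 0
for _ in range(50_000):
    # epsilon-greedy behaviour policy pi
    a = rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(Q[s]))
    s_next, r = step(s, a)
    # sample update towards r + gamma * max_a' Q(s', a')
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    s = s_next

# Greedy policy extraction, Eq. (6.10): pi*(s) = argmax_a Q*(s, a)
print("Q(s, a) =\n", Q.round(2))
print("pi*(s)  =", np.argmax(Q, axis=1))
\end{lstlisting}

Replacing the table $Q$ with the network of Definition 14 and the tabular update with a gradient step on $\mathcal{L}(\cdot)$ yields the standard deep Q-learning variant of this procedure.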