
Pourgholamali, Labi, and K. C. Sinha, 2023 identifies the objective function, constraints, and decision variables as key components of MO and distinguishes between single- and multi-objective MO, as well as between exact solutions and heuristics/metaheuristics. de Jonge and Scarf, 2020 explores MO for single- and multi-unit systems with continuous and discrete condition states, considering economic and stochastic dependencies. Ogunfowora and Najjaran, 2023 further discusses these aspects with a focus on the Reinforcement Learning paradigm. J. Xia and Zou, 2023 proposes a framework for maintenance using digital twins. Dui, X. Wu, S. Wu, et al., 2024 explores how to perform MO based on importance measures (e.g., based on Minimal Cut Sets in a Fault Tree), including ML- and Deep Learning-based approaches, highlighting prolonged training and computational times as major drawbacks. Arts, Boute, Loeys, et al., 2024 identifies two main paradigms for MO: renewal reward theory and Markov Decision Processes (MDPs). The first identifies repeating cycles in a stochastic system for which a decision rule has been established, while the second accounts for possible system states and decision options. In this dissertation, we use MDPs, detailed in the next section.

Markov Decision Processes

A Markov Decision Process (MDP) is a stochastic process in which changes of state occur according to a Markov chain (Ding and Kamaruddin, 2015), and it can be used to prove that a certain type of decision rule is optimal (Arts, Boute, Loeys, et al., 2024). In an MDP, the available actions, rewards, and transition probabilities depend solely on the current state and action, not on past states or actions, making MDPs broad enough to model most realistic sequential decision-making problems (Puterman, 2014). Section III.4.1 provides formal definitions of MDPs; here we offer an example to illustrate the concept.

Figure 1.10(a) depicts a 3-by-3 grid-world problem in which a mouse navigates the grid, potentially encountering traps or cheese. To model this problem as an MDP, we must define its four main components: states, actions, transition probabilities, and rewards. States represent a specific condition or status of the system being modelled; for example, Figure 1.10(b) shows the 9 possible positions the mouse can occupy at a given time, which can be represented by a vector of positions A1, A2, . . . , C3. Actions are choices made from a set of allowable options in a given state; for instance, Figure 1.10(c) shows the possible actions the mouse can take, such as moving up, down, right, or left. Transition probabilities describe the likelihood of moving from one state to another given a particular action. For example, Figure 1.10(d) illustrates that the mouse has an 80% probability of moving from grid C3 to grid B3 by going
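To make the grid-world example concrete, the following sketch encodes its four MDP components in Python. It is illustrative only: the placement of the cheese and trap, the reward values, and the way the remaining 20% of probability mass is split across the non-intended moves are assumptions not stated in the text, and the action "up" in the final example is inferred from the grid layout (C3 lies directly below B3). All identifiers are ours, introduced for illustration.

    import itertools

    # States: the 9 grid positions A1, A2, ..., C3 (rows A-C, columns 1-3).
    ROWS, COLS = "ABC", "123"
    STATES = [r + c for r, c in itertools.product(ROWS, COLS)]

    # Actions: the four possible moves.
    ACTIONS = ["up", "down", "left", "right"]
    MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def step(state, action):
        """Deterministic destination of a move; the mouse stays put at grid edges."""
        r, c = ROWS.index(state[0]), COLS.index(state[1])
        dr, dc = MOVES[action]
        nr, nc = min(max(r + dr, 0), 2), min(max(c + dc, 0), 2)
        return ROWS[nr] + COLS[nc]

    def transition_probabilities(state, action, p_intended=0.8):
        """P(s' | s, a): the intended move succeeds with probability 0.8;
        the remaining mass is spread evenly over the other moves (assumption)."""
        probs = {s: 0.0 for s in STATES}
        probs[step(state, action)] += p_intended
        others = [a for a in ACTIONS if a != action]
        for other in others:
            probs[step(state, other)] += (1.0 - p_intended) / len(others)
        return probs

    # Rewards: cheese in A1, trap in B2 (positions and values assumed).
    REWARDS = {s: 0.0 for s in STATES}
    REWARDS["A1"], REWARDS["B2"] = 1.0, -1.0

    # From C3, going "up" reaches B3 with probability 0.8, as in Figure 1.10(d).
    print(transition_probabilities("C3", "up")["B3"])  # 0.8

Running the last line prints 0.8, matching the transition probability quoted above; the other 20% of the probability mass is, under the stated assumption, split among the outcomes of the three remaining moves.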
