
Up. Rewards are scalar feedback signals assigned to state transitions, quantifying the immediate outcome of an action in a given state. For instance, in Figure 1.10(e), if the mouse moves from C3 to B3, it receives a +1 reward, indicating a favourable action, while moving from C3 to C2 results in a -10 reward, indicating an undesired action that leads to a trap.

[Figure 1.10: Modelling a grid-world problem using an MDP (example). Main elements in a Markov Decision Process: the grid-world problem (a), with rows A to C and columns 1 to 3; states [A1, A2, A3, B1, B2, B3, C1, C2, C3]; actions Up, Down, Left, Right; transition probabilities, e.g. P(B3|C3,Up)=0.8; rewards (e), e.g. +1 for moving Up from C3 to B3 and -10 for moving Left from C3 to C2.]

As mentioned, the MDP provides a framework for formulating the optimisation problem. The next step is to ‘solve’ the MDP to derive the desired optimal policy, which, for the example in Figure 1.10(a), is [Up, Left, Left, Up]: if the mouse follows these actions, it will reach the cheese. But how can such a problem be solved mathematically? Several algorithms can be applied, including value and policy iteration, linear programming (Puterman, 2014), dynamic programming (Bertsekas, 2012), and Reinforcement Learning (Sutton and Barto, 2018). In this dissertation, we employ Deep Reinforcement Learning, which is discussed in the next section.

Deep Reinforcement Learning

Before delving into Deep Reinforcement Learning (DRL), it is important to first consider Reinforcement Learning (RL). RL is an ML paradigm, distinct from supervised and unsupervised learning, in which an agent learns behaviour through trial and error by interacting with a virtual environment (Kaelbling, Littman, and A. W. Moore, 1996). Sutton and Barto (2018) define RL as “learning what to do—how to map situations to actions—so as to maximise a numerical reward signal”. For further details on RL, we recommend the latter reference.
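To make the trial-and-error idea concrete, the following is a minimal Python sketch of tabular Q-learning, a classical RL algorithm and not the DRL method used in this dissertation, applied to a grid world in the spirit of Figure 1.10. Only the 3x3 grid, the four actions, the trap at C2, the probability P(B3|C3,Up)=0.8 and the two reward values (+1 and -10) come from the text. Everything else is an assumption made for illustration: the mouse starts at C3 and the cheese is at A1 (inferred from the policy [Up, Left, Left, Up]), a failed move leaves the mouse where it is, the reward is +1 for moving closer to the cheese, -1 for moving away and 0 for staying put, and all function names and hyperparameters are hypothetical.

```python
import random

ROWS, COLS = "ABC", "123"                    # grid rows A-C, columns 1-3
ACTIONS = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}
START, CHEESE, TRAP = "C3", "A1", "C2"       # START and CHEESE are assumptions
P_INTENDED = 0.8                             # P(B3 | C3, Up) = 0.8 in Figure 1.10


def move(state, action):
    """Intended (slip-free) successor; moving into a wall leaves the state unchanged."""
    d_row, d_col = ACTIONS[action]
    row = min(max(ROWS.index(state[0]) + d_row, 0), len(ROWS) - 1)
    col = min(max(COLS.index(state[1]) + d_col, 0), len(COLS) - 1)
    return ROWS[row] + COLS[col]


def distance(state):
    """Manhattan distance to the cheese (used only by the assumed reward rule)."""
    return (abs(ROWS.index(state[0]) - ROWS.index(CHEESE[0]))
            + abs(COLS.index(state[1]) - COLS.index(CHEESE[1])))


def step(state, action, rng):
    """Sample (next_state, reward, done) from the assumed environment."""
    # Assumption: with probability 0.2 the move fails and the mouse stays put.
    nxt = move(state, action) if rng.random() < P_INTENDED else state
    if nxt == TRAP:
        return nxt, -10.0, True              # -10 for entering the trap
    # Assumed rule: +1 if closer to the cheese, -1 if farther, 0 if unchanged.
    reward = float(distance(state) - distance(nxt))
    return nxt, reward, nxt == CHEESE


def train(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    states = [r + c for r in ROWS for c in COLS]
    q = {(s, a): 0.0 for s in states for a in ACTIONS}
    for _ in range(episodes):
        state, done, steps = START, False, 0
        while not done and steps < 50:       # cap the episode length
            if rng.random() < epsilon:       # explore
                action = rng.choice(list(ACTIONS))
            else:                            # exploit the current estimate
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action, rng)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state, steps = nxt, steps + 1
    return q


def greedy_path(q, max_steps=20):
    """Follow the learned greedy policy from START, ignoring slips."""
    state, path = START, []
    while state not in (CHEESE, TRAP) and len(path) < max_steps:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        path.append(action)
        state = move(state, action)
    return path, state


if __name__ == "__main__":
    learned_q = train()
    print(greedy_path(learned_q))
```

Under these assumptions, several four-move routes from C3 to A1 are equally good, so the route printed by `greedy_path` may differ from [Up, Left, Left, Up] while still avoiding the trap. The Q-table here is a plain dictionary with one entry per state-action pair, which is only workable because the grid has nine states.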
The concept of RL solidified in the 1980s and has gained significant momentum since 2017 with the advent of Deep Reinforcement Learning (DRL). DRL emerged with the aim of revolutionising the field of AI by equipping RL with the capabilities of Deep Neural Networks (DNNs), making RL more scalable and capable of handling complex problems. This has led to popular applications in robotics and
