An Approach Using Hierarchical Reinforcement Learning and Parallel Computing for the K-Servos problem.
Metrical Task Systems, The K-Server Problem, Curse of Dimensionality, Hierarchical Reinforcement Learning, Q-Learning Algorithm, Parallel Computing.
A metrical task system is an abstract model for a class of online optimization problems, including the paging, list acessing, and the k-server problem. The use of reinforcement learning to solving these problems, although proved to be efficient, is restricted to a simple class of problems due to the curse of dimensionality inherent to the method. This work presents a solution that uses reinforcement learning, based on hierarchical decomposition techniques and parallel computing to solve optimization problems in metric spaces, in order to extend the applicability of the method to more complex problems, bypassing the restriction of its use to minor problems. As the size of the storage structure used by reinforcement learning to obtain the optimal policy grows as a function of the number of states and actions, which in turn is proportional to the number n of nodes and k of server, it is noticed that their growth is given exponentially (. To circumvent it, the problem was modeled with a multi-step decision process where we initially used the k-means algorithm as a grouping method to decompose the problem into smaller subproblems. Then the Q-learning algorithm was applied in the subgroups, aiming at achieving the best servo displacement policy. In this step, parallel computing techniques were used so that the learning and storage processes in the subgroups were executed in parallel. In this way, it was tried to reduce the dimension of the problem, as well as the total execution time of the algorithm, making possible the application of the proposed method to large instances. We will analyze aspects related to the quality of the hierarchical solution obtained when compared to the classical reinforcement learning, and its possible limitations. In addition to parallel performance analysis.