Vector Quantization for Adaptive State Aggregation in Reinforcement Learning
Mavridis, Christos N.
Conference: 2021 American Control Conference, pp. 2187-2192
Date: May 26 - May 28, 2021
We propose an adaptive state aggregation scheme to be used along with temporal-difference reinforcement learning and value function approximation algorithms. The resulting algorithm constitutes a two-timescale stochastic approximation algorithm with: (a) a fast component that executes a temporal-difference reinforcement learning algorithm, and (b) a slow component, based on online vector quantization, that adaptively partitions the state space of a Markov Decision Process according to an appropriately defined dissimilarity measure. We study the convergence of the proposed methodology using Bregman divergences as dissimilarity measures, which can increase the efficiency and reduce the computational complexity of vector quantization algorithms. Finally, we quantify its performance on the Cart-pole (inverted pendulum) optimal control problem using Q-learning with adaptive state aggregation based on the Self-Organizing Map (SOM) algorithm.
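
The two-timescale structure described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes squared Euclidean distance (one instance of a Bregman divergence) as the dissimilarity measure, a plain winner-take-all codebook update in place of the SOM neighborhood update, and a hypothetical one-dimensional toy task in place of Cart-pole. The function names, step-size schedules, and toy environment are all illustrative choices.

```python
import numpy as np


def nearest(codebook, x):
    """Index of the codevector closest to x under squared Euclidean
    distance (assumed here as the dissimilarity measure)."""
    return int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))


def q_learning_with_vq(step_fn, reset_fn, n_actions, codebook,
                       episodes=300, max_steps=100, gamma=0.95,
                       eps=0.1, seed=0):
    """Tabular Q-learning over cells defined by a slowly adapting codebook."""
    rng = np.random.default_rng(seed)
    codebook = np.array(codebook, dtype=float)      # one row per aggregate state
    q = np.zeros((len(codebook), n_actions))
    visits = np.zeros((len(codebook), n_actions))   # per-(cell, action) counts
    n = 0                                           # global step counter
    for _ in range(episodes):
        x = np.asarray(reset_fn(rng), dtype=float)
        for _ in range(max_steps):
            i = nearest(codebook, x)                # aggregate state of x
            if rng.random() < eps:
                a = int(rng.integers(n_actions))    # explore
            else:                                   # greedy, random tie-break
                best = np.flatnonzero(q[i] == q[i].max())
                a = int(rng.choice(best))
            x_next, r, done = step_fn(x, a, rng)
            x_next = np.asarray(x_next, dtype=float)
            j = nearest(codebook, x_next)

            # Fast timescale: Q-learning update on the aggregate states.
            visits[i, a] += 1
            alpha = 1.0 / visits[i, a] ** 0.6
            target = r + (0.0 if done else gamma * np.max(q[j]))
            q[i, a] += alpha * (target - q[i, a])

            # Slow timescale: online vector quantization step that moves
            # the winning codevector toward the observed state. The step
            # size beta decays faster than alpha, so beta / alpha -> 0.
            n += 1
            beta = 1.0 / (10.0 + n)
            codebook[i] += beta * (x - codebook[i])

            if done:
                break
            x = x_next
    return codebook, q


# Hypothetical toy task (an assumption, not from the paper): a point on
# [0, 1) moves left (action 0) or right (action 1) and receives reward 1
# on reaching x >= 1, which ends the episode.
def toy_reset(rng):
    return np.array([rng.uniform(0.0, 1.0)])


def toy_step(x, a, rng):
    move = 0.1 if a == 1 else -0.1
    x_next = np.maximum(x + move + 0.01 * rng.normal(), 0.0)
    done = bool(x_next[0] >= 1.0)
    return x_next, (1.0 if done else 0.0), done


init_codebook = np.linspace(0.1, 0.9, 5).reshape(-1, 1)
codebook, q = q_learning_with_vq(toy_step, toy_reset, 2, init_codebook)
print(codebook.ravel())
print(q)
```

The step sizes are chosen so that the codebook changes slowly relative to the Q-values, which is the separation the two-timescale analysis relies on. Any schedule with the same ratio property could be substituted.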