Maximum-Entropy Progressive State Aggregation for Reinforcement Learning
Authors: Mavridis, Christos, Suriyarachchi, Chethana Nilesh, and Baras, John S.
Conference: 2021 60th IEEE Conference on Decision and Control (CDC 2021), pp. 5144-5149, Austin, TX
Date: December 13-15, 2021
We propose a reinforcement learning algorithm based on an adaptive state aggregation scheme. The aggregation is defined by a progressively growing set of codevectors placed in the joint state-action space according to a maximum-entropy vector quantization scheme. The proposed algorithm is a two-timescale stochastic approximation algorithm with: (a) a fast component that executes a temporal-difference learning algorithm, and (b) a slow component, based on an online deterministic annealing algorithm, that adaptively partitions the state-action space according to a dissimilarity measure from the family of Bregman divergences. The proposed online deterministic annealing algorithm is a competitive-learning neural network that is robust to the initial conditions, requires minimal hyper-parameter tuning, and provides online control over the performance-complexity trade-off. We study the convergence properties of the proposed methodology and quantify its performance in simulated experiments. Finally, we show that the generated codevectors can be used as training samples for sparse and progressively more accurate Gaussian process regression.
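The slow, codevector-placing component can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the squared Euclidean distance (one member of the Bregman family), Gibbs associations p(mu_i | x) proportional to rho_i exp(-d(x, mu_i) / T), stochastic-approximation updates of a per-codevector mass rho_i and first moment sigma_i with mu_i = sigma_i / rho_i, a geometric temperature schedule, and a split-then-merge rule for growing the codebook. All function names, parameters, and the step-size schedule are hypothetical, and the fast temporal-difference component is omitted.

```python
import numpy as np

# Minimal sketch of an online deterministic annealing (ODA) codebook, NOT the
# authors' code. Assumptions: squared Euclidean distance (a Bregman divergence),
# Gibbs associations, and hypothetical split/merge and step-size rules.


def gibbs_associations(x, mu, rho, T):
    """p(mu_i | x) proportional to rho_i * exp(-||x - mu_i||^2 / T)."""
    d = np.sum((mu - x) ** 2, axis=1)
    logits = np.log(rho) - d / T
    w = np.exp(logits - logits.max())  # shift by max for numerical stability
    return w / w.sum()


def oda_step(x, rho, sigma, T, alpha):
    """One stochastic-approximation update of masses rho and moments sigma."""
    mu = sigma / rho[:, None]
    p = gibbs_associations(x, mu, rho, T)
    rho = rho + alpha * (p - rho)                     # running mean of p_i
    sigma = sigma + alpha * (p[:, None] * x - sigma)  # running mean of p_i * x
    return rho, sigma


def split(rho, sigma, eps, rng):
    """Duplicate every codevector with a small symmetric perturbation."""
    mu = sigma / rho[:, None]
    delta = eps * rng.standard_normal(mu.shape)
    mu = np.vstack([mu + delta, mu - delta])
    rho = np.concatenate([rho, rho]) / 2.0
    return rho, rho[:, None] * mu


def merge(rho, sigma, tol):
    """Fuse codevectors whose squared distance is below tol, pooling mass."""
    mu = sigma / rho[:, None]
    kept_rho, kept_mu = [], []
    for r, m in zip(rho, mu):
        for j, km in enumerate(kept_mu):
            if np.sum((m - km) ** 2) < tol:
                total = kept_rho[j] + r
                kept_mu[j] = (kept_rho[j] * km + r * m) / total
                kept_rho[j] = total
                break
        else:
            kept_rho.append(r)
            kept_mu.append(m)
    rho = np.array(kept_rho)
    return rho, rho[:, None] * np.array(kept_mu)


def anneal(data, T0=1.0, T_min=0.05, gamma=0.5, K_max=8, epochs=5,
           eps=1e-2, tol=1e-4, seed=0):
    """Progressively grow a codebook while lowering the temperature T."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    rho = np.array([1.0])                      # start with one codevector
    sigma = data.mean(axis=0, keepdims=True)   # sigma = rho * mu with rho = 1
    T = T0
    while T > T_min:
        if 2 * len(rho) <= K_max:              # cap on codebook growth
            rho, sigma = split(rho, sigma, eps, rng)
        for _ in range(epochs):
            for n, x in enumerate(rng.permutation(data)):
                alpha = 1.0 / (n + 10)         # hypothetical step sizes
                rho, sigma = oda_step(x, rho, sigma, T, alpha)
        rho, sigma = merge(rho, sigma, tol)
        T *= gamma                             # geometric cooling
    return sigma / rho[:, None], rho
```

Lowering T is what lets the codebook grow. While T stays above a data-dependent critical value, duplicated codevectors drift back together and are merged; below it, they separate. In the setting described in the abstract, the samples x would be points of the joint state-action space, and the resulting codevectors would define the aggregate states used by the temporal-difference learner.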