A learning algorithm for Markov decision processes with adaptive state aggregation
Conference: Proceedings of the IEEE Conference on Decision and Control, pp. 3351-3356
Date: December 01 - December 01, 2000
We propose a simulation-based algorithm for learning good policies for a Markov decision process with an unknown transition law, operating on aggregated states. The state aggregation itself can be adapted on a slower time scale by an auxiliary learning algorithm. Rigorous justifications are provided for both algorithms.
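
To make the idea of learning over aggregated states concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes standard Q-learning with a value table indexed by cluster rather than by raw state, and a fixed, hand-chosen partition `agg`. The slower-time-scale adaptation of the aggregation mentioned in the abstract is not shown. All names (`q_learning_aggregated`, `greedy_policy`, `agg`, the toy chain MDP) are hypothetical and introduced only for illustration.

```python
import random


def q_learning_aggregated(P, R, agg, n_clusters, n_actions,
                          gamma=0.9, steps=20000, eps=0.3, seed=0):
    """Q-learning with one table row per cluster (assumed scheme, not the paper's).

    P[s][a] is a list of (next_state, probability) pairs. It is used only to
    simulate transitions; the learner never reads the probabilities.
    R[s][a] is the one-step reward, and agg[s] is the cluster of state s.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_clusters)]
    visits = [[0] * n_actions for _ in range(n_clusters)]
    s = 0
    for _ in range(steps):
        c = agg[s]
        # Epsilon-greedy action choice based on the aggregated state only.
        if rng.random() < eps:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda b: Q[c][b])
        # Simulate one transition of the underlying (unaggregated) chain.
        next_states, probs = zip(*P[s][a])
        s_next = rng.choices(next_states, weights=probs)[0]
        # Decreasing step size 1/n for the visited (cluster, action) pair.
        visits[c][a] += 1
        step = 1.0 / visits[c][a]
        target = R[s][a] + gamma * max(Q[agg[s_next]])
        Q[c][a] += step * (target - Q[c][a])
        s = s_next
    return Q


def greedy_policy(Q, agg):
    """Map every raw state to the greedy action of its cluster."""
    return [max(range(len(Q[agg[s]])), key=lambda b: Q[agg[s]][b])
            for s in range(len(agg))]


# Toy 4-state chain (hypothetical): action 0 moves left, action 1 moves right,
# and reward 1 is earned for choosing action 1 in the rightmost state.
n_states, n_actions = 4, 2
P = [[[(max(s - 1, 0), 1.0)], [(min(s + 1, n_states - 1), 1.0)]]
     for s in range(n_states)]
R = [[0.0, 1.0 if s == n_states - 1 else 0.0] for s in range(n_states)]
agg = [0, 0, 1, 1]  # states {0, 1} and {2, 3} share one table row each

Q = q_learning_aggregated(P, R, agg, n_clusters=2, n_actions=n_actions)
policy = greedy_policy(Q, agg)
```

Because the table has one row per cluster, all states in a cluster necessarily receive the same action. That is the price of aggregation, and it is the reason the choice of partition matters.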