A learning algorithm for Markov decision processes with adaptive state aggregation
Baras, John S.
Date: December 1, 2000
We propose a simulation-based algorithm for learning good policies for a Markov decision process with an unknown transition law, operating on aggregated states rather than on the full state space. The state aggregation itself can be adapted on a slower time scale by an auxiliary learning algorithm. Rigorous justifications are provided for both algorithms.
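
The abstract does not spell out the update rules, so the following is only a rough sketch of the general idea of simulation-based learning over aggregated states, not the paper's algorithm. It uses ordinary Q-learning with a constant step size, a fixed (non-adaptive) aggregation map, and a uniformly random exploration policy. The toy chain environment, the function names `chain_step` and `aggregated_q_learning`, and all parameter values are assumptions made for illustration.

```python
import random


def chain_step(state, action, rng):
    """Hypothetical 6-state chain: action 0 moves left, action 1 moves right.

    With probability 0.1 the action is flipped. Landing on (or staying in)
    state 5 yields reward 1; every other transition yields reward 0.
    The learner never sees these transition probabilities, only samples.
    """
    if rng.random() < 0.1:
        action = 1 - action
    next_state = min(max(state + (1 if action == 1 else -1), 0), 5)
    reward = 1.0 if next_state == 5 else 0.0
    return next_state, reward


def aggregated_q_learning(step, aggregate, n_clusters, n_actions, start_state,
                          n_steps=60000, gamma=0.9, alpha=0.02, seed=0):
    """Q-learning whose table is indexed by cluster instead of raw state.

    `aggregate` maps a raw state to a cluster index and is held fixed here;
    adapting it on a slower time scale is outside the scope of this sketch.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_clusters)]
    state = start_state
    for _ in range(n_steps):
        cluster = aggregate(state)
        # Uniformly random behavior policy: Q-learning is off-policy,
        # so exploration can be decoupled from the greedy policy learned.
        action = rng.randrange(n_actions)
        next_state, reward = step(state, action, rng)
        # Bootstrapped target uses the cluster of the sampled next state.
        target = reward + gamma * max(q[aggregate(next_state)])
        q[cluster][action] += alpha * (target - q[cluster][action])
        state = next_state
    return q


if __name__ == "__main__":
    # Assumed aggregation: states {0,1}, {2,3}, {4,5} form three clusters.
    q_table = aggregated_q_learning(chain_step, lambda s: s // 2,
                                    n_clusters=3, n_actions=2, start_state=0)
    for cluster, row in enumerate(q_table):
        greedy = max(range(len(row)), key=lambda a: row[a])
        print(f"cluster {cluster}: Q={row}, greedy action={greedy}")
```

In this toy setting the learned table should favor moving right in every cluster, since the only reward sits at the right end of the chain. Sharing one table row across several raw states is what makes the aggregated problem smaller, at the cost of a policy that cannot distinguish states within a cluster.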