A Probabilistic Perspective on Risk-sensitive Reinforcement Learning

Title: A Probabilistic Perspective on Risk-sensitive Reinforcement Learning
Authors: Erfaun Noorani and John S. Baras
Conference: 2022 American Control Conference (ACC2022), pp. 2697-2702, Atlanta, GA
Date: June 08 - June 10, 2022

Robustness is a key enabler of real-world applications of Reinforcement Learning (RL). The robustness properties of risk-sensitive controllers have long been established. We investigate risk-sensitive Reinforcement Learning (as a generalization of risk-sensitive stochastic control) by theoretically analyzing the risk-sensitive exponential criterion (the exponential of the total reward) and the benefits and improvements that the introduction of risk-sensitivity brings to conventional RL. We provide a probabilistic interpretation of (I) the risk-sensitive exponential, (II) the risk-neutral expected cumulative reward, and (III) the maximum entropy Reinforcement Learning objectives, and explore their connections from a probabilistic perspective. Using Probabilistic Graphical Models (PGM), we establish that in the RL setting, maximization of the risk-sensitive exponential criterion is equivalent to maximizing the probability of taking an optimal action at all time-steps during an episode. We show that maximization of the standard risk-neutral expected cumulative return is equivalent to maximizing a lower bound, specifically the Evidence Lower Bound, on the probability of taking an optimal action at all time-steps during an episode. Furthermore, we show that maximization of the maximum entropy Reinforcement Learning objective is equivalent to maximizing a lower bound on the probability of taking an optimal action at all time-steps during an episode, where the lower bound corresponding to the maximum entropy objective is tighter and smoother than the lower bound corresponding to the expected cumulative return objective. These equivalences establish the benefits of the risk-sensitive exponential objective and shed light on previously postulated regularized objectives, such as maximum entropy. The use of a PGM, coupled with the exponential criterion, offers a number of advantages (e.g., facilitating theoretical analysis and the derivation of bounds).
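
The relationship between the exponential criterion and the expected return can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the log-form of the exponential criterion, (1/beta) * log E[exp(beta * R)] with beta > 0, and uses Jensen's inequality to show that the expected return E[R] never exceeds it. The function names, the value of beta, and the sampled returns are all hypothetical and chosen only for illustration. The sketch does not cover the maximum entropy bound.

```python
import math
import random


def risk_sensitive_value(returns, beta=1.0):
    """Estimate (1/beta) * log E[exp(beta * R)] from sampled returns.

    Uses a log-sum-exp shift for numerical stability.
    """
    scaled = [beta * r for r in returns]
    m = max(scaled)
    s = sum(math.exp(x - m) for x in scaled)
    return (m + math.log(s / len(scaled))) / beta


def risk_neutral_value(returns):
    """Estimate E[R], the standard expected cumulative return."""
    return sum(returns) / len(returns)


# Hypothetical episode returns (non-positive, purely illustrative data).
rng = random.Random(0)
returns = [-abs(rng.gauss(1.0, 0.5)) for _ in range(10_000)]

rs = risk_sensitive_value(returns, beta=1.0)
rn = risk_neutral_value(returns)

# By Jensen's inequality the gap is non-negative for beta > 0;
# it is zero only when every sampled return is identical.
gap = rs - rn
print(f"risk-sensitive: {rs:.4f}  risk-neutral: {rn:.4f}  gap: {gap:.4f}")
```

The gap grows with the spread of the returns, which is one way to see why a policy that maximizes only the expected return optimizes a looser surrogate of the exponential criterion.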
