Exponential TD Learning: A Risk-Sensitive Actor-Critic Reinforcement Learning Algorithm
Authors: Erfaun Noorani, Christos Mavridis, and John S. Baras
Conference: 2023 American Control Conference (ACC2023), Invited Session "Risk-Aware Design and Control", pp. 4104-4109, San Diego, CA
Date: May 31 - June 02, 2023
Incorporating risk in the decision-making process has been shown to lead to significant performance improvement in optimal control and reinforcement learning algorithms. We construct a temporal-difference risk-sensitive reinforcement learning algorithm using the exponential criteria commonly used in risk-sensitive control. The proposed method resembles an actor-critic architecture with the ‘actor’ implementing a policy gradient algorithm based on the exponential of the reward-to-go, which is estimated by the ‘critic’. The novelty of the update rule of the ‘critic’ lies in the use of a modified objective function that corresponds to the underlying multiplicative Bellman’s equation. Our results suggest that the use of the exponential criteria accelerates the learning process and reduces its variance, i.e., risk-sensitiveness can be utilized by actor-critic methods and can lead to improved performance.
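
To illustrate what a multiplicative Bellman equation looks like, the following is a minimal sketch for a small, undiscounted, episodic Markov reward process. It is not the authors' critic update, whose modified objective is not given in the abstract. It assumes the textbook exponential criterion W(s) = E[exp(beta * return) | s], which satisfies W(s) = exp(beta * r(s)) * sum over s' of P(s, s') * W(s'), with W = 1 at the terminal state. The function names, the three-state example, and the value beta = -0.5 are all hypothetical choices made for illustration.

```python
import math


def exp_values(P, r, beta, terminal, sweeps=100):
    """Fixed-point iteration on the multiplicative Bellman equation.

    W(s) = exp(beta * r(s)) * sum_t P[s][t] * W(t), with W(terminal) = 1.
    Assumes an episodic process in which the terminal state is always reached.
    """
    n = len(P)
    W = [1.0] * n
    for _ in range(sweeps):
        for s in range(n):
            if s == terminal:
                continue
            W[s] = math.exp(beta * r[s]) * sum(P[s][t] * W[t] for t in range(n))
    return W


def exp_td_update(W, s, reward, s_next, beta, alpha):
    """One sampled TD-style step toward the multiplicative target (illustrative).

    The TD error is multiplicative in the reward: exp(beta * reward) * W(s') - W(s).
    """
    delta = math.exp(beta * reward) * W[s_next] - W[s]
    W[s] += alpha * delta
    return delta


def certainty_equivalent(w, beta):
    """Map an exponential value back to reward units: (1 / beta) * log(w)."""
    return math.log(w) / beta


# Hypothetical process: state 0 moves to state 1 or terminates (prob. 0.5 each);
# state 1 always terminates; state 2 is terminal. Rewards are earned on leaving a state.
P = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
r = [1.0, 2.0, 0.0]
BETA = -0.5  # with this sign convention, beta < 0 penalizes variability in the return

W = exp_values(P, r, BETA, terminal=2)
print("exponential values:", W)
print("certainty equivalent of state 0:", certainty_equivalent(W[0], BETA))
```

The expected return from state 0 in this example is 1 + 0.5 * 2 = 2, and the certainty equivalent under a negative beta comes out below that, which is the sense in which the exponential criterion accounts for risk rather than only the mean.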