
Risk-Averse Reinforcement Learning with Itakura-Saito Loss

May 22, 2025
Authors: Igor Udovichenko, Olivier Croissant, Anita Toleutaeva, Evgeny Burnaev, Alexander Korotin
cs.AI

Abstract

Risk-averse reinforcement learning finds application in various high-stakes fields. Unlike classical reinforcement learning, which aims to maximize expected returns, risk-averse agents choose policies that minimize risk, occasionally sacrificing expected value. These preferences can be framed through utility theory. We focus on the specific case of the exponential utility function, where we can derive the Bellman equations and employ various reinforcement learning algorithms with few modifications. However, these methods suffer from numerical instability due to the need for exponent computation throughout the process. To address this, we introduce a numerically stable and mathematically sound loss function based on the Itakura-Saito divergence for learning state-value and action-value functions. We evaluate our proposed loss function against established alternatives, both theoretically and empirically. In the experimental section, we explore multiple financial scenarios, some with known analytical solutions, and show that our loss function outperforms the alternatives.
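The abstract does not spell out the exact form of the proposed loss, but the sketch below illustrates how an Itakura-Saito-divergence loss for value learning under an exponential utility might look. The function name `itakura_saito_value_loss`, the `risk_param` coefficient, and the choice to compare exponential-utility transforms of a bootstrapped target against the predicted value are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def itakura_saito_value_loss(v_pred, v_target, risk_param=1.0):
    """Illustrative sketch (not the paper's exact loss): Itakura-Saito
    divergence D_IS(x || y) = x / y - log(x / y) - 1 applied to
    exponential-utility transforms x = exp(-risk_param * v_target),
    y = exp(-risk_param * v_pred). The ratio x / y collapses to
    exp(-risk_param * (v_target - v_pred)), so only differences of
    values are exponentiated.
    """
    diff = risk_param * (v_target.detach() - v_pred)
    # x / y = exp(-diff); log(x / y) = -diff, hence D_IS = exp(-diff) + diff - 1
    return (torch.exp(-diff) + diff - 1.0).mean()

# Hypothetical usage in a TD-style update for a state-value network:
# v_pred   = value_net(states)
# v_target = rewards + gamma * value_net(next_states)
# loss = itakura_saito_value_loss(v_pred, v_target, risk_param=0.5)
```

Because this divergence depends only on the difference `risk_param * (v_target - v_pred)`, no exponential of an absolute return or value is ever evaluated, which is one plausible way the numerical instability mentioned in the abstract could be avoided; the loss is non-negative and vanishes exactly when the prediction matches the target.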