

Risk-Averse Reinforcement Learning with Itakura-Saito Loss

May 22, 2025
Authors: Igor Udovichenko, Olivier Croissant, Anita Toleutaeva, Evgeny Burnaev, Alexander Korotin
cs.AI

Abstract

Risk-averse reinforcement learning finds application in various high-stakes fields. Unlike classical reinforcement learning, which aims to maximize expected returns, risk-averse agents choose policies that minimize risk, occasionally sacrificing expected value. These preferences can be framed through utility theory. We focus on the specific case of the exponential utility function, where we can derive the Bellman equations and employ various reinforcement learning algorithms with few modifications. However, these methods suffer from numerical instability due to the need for exponent computation throughout the process. To address this, we introduce a numerically stable and mathematically sound loss function based on the Itakura-Saito divergence for learning state-value and action-value functions. We evaluate our proposed loss function against established alternatives, both theoretically and empirically. In the experimental section, we explore multiple financial scenarios, some with known analytical solutions, and show that our loss function outperforms the alternatives.
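The abstract does not spell out the exact form of the proposed loss. As a rough illustration of the general idea only, the sketch below (in PyTorch, with hypothetical names) applies the Itakura-Saito divergence D_IS(x, y) = x/y - log(x/y) - 1 to exponentiated value estimates, which reduces to exponentiating the *difference* between target and prediction rather than raw returns; this is an assumption about the construction, not the paper's exact formulation.

```python
import torch

def itakura_saito_value_loss(v_pred: torch.Tensor,
                             v_target: torch.Tensor) -> torch.Tensor:
    """Illustrative Itakura-Saito-style loss for value learning (hypothetical).

    Using D_IS(x, y) = x/y - log(x/y) - 1 with x = exp(v_target) and
    y = exp(v_pred) gives

        exp(v_target - v_pred) - (v_target - v_pred) - 1,

    so only the difference of values is exponentiated, which is better
    conditioned than exponentiating raw returns. The loss is non-negative
    and zero exactly when v_pred equals v_target.
    """
    diff = v_target.detach() - v_pred  # stop gradients through the target
    return (torch.exp(diff) - diff - 1.0).mean()

# Toy usage: regress a scalar value estimate toward a fixed target.
v_pred = torch.zeros(1, requires_grad=True)
v_target = torch.tensor([0.5])
loss = itakura_saito_value_loss(v_pred, v_target)
loss.backward()
```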

