Understanding and Diagnosing Deep Reinforcement Learning

June 23, 2024
Author: Ezgi Korkmaz
cs.AI

Abstract

Deep neural policies have recently been deployed in a diverse range of settings, from biotechnology to automated financial systems. However, using deep neural networks to approximate the value function raises concerns about decision boundary stability, in particular the sensitivity of policy decision making to indiscernible, non-robust features arising from highly non-convex and complex deep neural manifolds. These concerns obstruct our understanding of the reasoning of deep neural policies and of their foundational limitations. Hence, it is crucial to develop techniques that aim to understand the sensitivities in the learnt representations of neural network policies. To achieve this, we introduce a theoretically founded method that provides a systematic analysis of the unstable directions in the deep neural policy decision boundary across both time and space. Through experiments in the Arcade Learning Environment (ALE), we demonstrate the effectiveness of our technique for identifying correlated directions of instability and for measuring how sample shifts remold the set of sensitive directions in the neural policy landscape. Most importantly, we demonstrate that state-of-the-art robust training techniques yield learning of disjoint unstable directions, with dramatically larger oscillations over time, compared to standard training. We believe our results reveal fundamental properties of the decision process of reinforcement learning policies and can help in constructing reliable and robust deep neural policies.
