

Understanding and Diagnosing Deep Reinforcement Learning

June 23, 2024
Author: Ezgi Korkmaz
cs.AI

Abstract

Deep neural policies have recently been deployed in a diverse range of settings, from biotechnology to automated financial systems. However, using deep neural networks to approximate the value function raises concerns about the stability of the decision boundary, in particular the sensitivity of policy decision making to indiscernible, non-robust features arising from highly non-convex and complex deep neural manifolds. These concerns obstruct our understanding of the reasoning of deep neural policies and of their foundational limitations. Hence, it is crucial to develop techniques for understanding the sensitivities in the learnt representations of neural network policies. To this end, we introduce a theoretically founded method that provides a systematic analysis of the unstable directions in the deep neural policy decision boundary across both time and space. Through experiments in the Arcade Learning Environment (ALE), we demonstrate the effectiveness of our technique for identifying correlated directions of instability and for measuring how sample shifts remold the set of sensitive directions in the neural policy landscape. Most importantly, we demonstrate that state-of-the-art robust training techniques lead to the learning of disjoint unstable directions, with dramatically larger oscillations over time, compared to standard training. We believe our results reveal fundamental properties of the decision process of reinforcement learning policies and can help in constructing reliable and robust deep neural policies.
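To make the notion of an "unstable direction" in a policy's decision boundary more concrete, the following is a minimal sketch under strong simplifying assumptions; it is not the method introduced in the paper, whose details the abstract does not give. It assumes a toy linear Q-function, for which the smallest L2 perturbation that changes the greedy action can be computed in closed form, and it compares the resulting directions across states with a cosine similarity. All names (`q_values`, `unstable_direction`, `cosine`, `W`, `B`) are hypothetical and chosen for illustration only.

```python
import math

def q_values(weights, bias, state):
    """Q(s, a) = w_a . s + b_a for a toy linear Q-function (an assumption)."""
    return [sum(w_i * s_i for w_i, s_i in zip(w, state)) + b
            for w, b in zip(weights, bias)]

def unstable_direction(weights, bias, state):
    """Return (unit_direction, distance) of the smallest L2 perturbation
    that moves `state` onto the decision boundary of the greedy action."""
    q = q_values(weights, bias, state)
    best = max(range(len(q)), key=q.__getitem__)
    closest = None
    for a in range(len(q)):
        if a == best:
            continue
        # Moving along (w_a - w_best) closes the gap Q_best - Q_a fastest.
        diff = [wa - wb for wa, wb in zip(weights[a], weights[best])]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm == 0.0:
            continue  # identical weights: this action can never overtake
        dist = (q[best] - q[a]) / norm
        if closest is None or dist < closest[1]:
            closest = ([d / norm for d in diff], dist)
    return closest

def cosine(u, v):
    """Cosine similarity of two unit vectors (just their dot product)."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical toy policy: three actions over a two-dimensional state.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
B = [0.0, 0.0, 0.0]

dir_a, dist_a = unstable_direction(W, B, [2.0, 1.0])
dir_b, dist_b = unstable_direction(W, B, [3.0, 2.5])
dir_c, dist_c = unstable_direction(W, B, [-2.0, -1.0])
```

In this toy setting, `cosine(dir_a, dir_b)` close to 1 indicates that two states share the same sensitive direction, while a low or negative value, as for `cosine(dir_a, dir_c)`, indicates directions that differ across states. For a deep, non-linear Q-network no such closed form exists, so any comparable analysis would need a different, approximate procedure.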
