Learning to Identify Critical States for Reinforcement Learning from Videos

Abstract

Recent work on deep reinforcement learning (DRL) has pointed out that algorithmic information about good policies can be extracted from offline data that lack explicit information about executed actions. For example, videos of humans or robots may convey a lot of implicit information about rewarding action sequences, but a DRL machine that wants to profit from watching such videos must first learn by itself to identify the relevant states, actions, and rewards. Without relying on ground-truth annotations, our new method, called Deep State Identifier, learns to predict returns from episodes encoded as videos. It then uses a mask-based sensitivity analysis to identify critical states. Extensive experiments showcase the method's potential for understanding and improving agent behavior. The source code and the generated datasets are available at https://github.com/AI-Initiative-KAUST/VideoRLCS.
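
To make the idea of mask-based sensitivity analysis concrete, here is a minimal sketch in plain Python. It is not the Deep State Identifier itself: the return predictor is a hypothetical toy function (`toy_return_predictor`), and the masking is a simple one-frame-at-a-time occlusion, both assumed purely for illustration. The sketch scores each frame by how much the predicted return changes when that frame is masked out.

```python
def toy_return_predictor(video):
    """Hypothetical stand-in for a learned return predictor (assumption).

    Returns the mean intensity of the brightest frame. A real predictor
    would be a model trained on episodes and their returns.
    """
    return max(sum(frame) / len(frame) for frame in video)


def frame_sensitivity(video, predictor):
    """Score each frame by the change in predicted return when it is masked."""
    baseline = predictor(video)
    scores = []
    for t in range(len(video)):
        masked = [list(frame) for frame in video]  # copy the episode
        masked[t] = [0.0] * len(video[t])          # occlude frame t
        scores.append(abs(baseline - predictor(masked)))
    return scores


# Toy episode: 6 frames of 16 pixels each; only frame 3 is bright.
video = [[0.1] * 16 for _ in range(6)]
video[3] = [1.0] * 16

scores = frame_sensitivity(video, toy_return_predictor)
# The frame whose removal changes the prediction most is treated as critical.
critical = max(range(len(scores)), key=scores.__getitem__)
```

In this toy setup only the bright frame affects the prediction, so it receives the highest score. With a trained predictor, the same loop would rank frames by their influence on the predicted return, although masking one frame at a time is only a rough approximation of a sensitivity analysis.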

