
Learning to Identify Critical States for Reinforcement Learning from Videos

August 15, 2023
Authors: Haozhe Liu, Mingchen Zhuge, Bing Li, Yuhui Wang, Francesco Faccio, Bernard Ghanem, Jürgen Schmidhuber
cs.AI

Abstract

Recent work on deep reinforcement learning (DRL) has pointed out that algorithmic information about good policies can be extracted from offline data which lack explicit information about executed actions. For example, videos of humans or robots may convey a lot of implicit information about rewarding action sequences, but a DRL machine that wants to profit from watching such videos must first learn by itself to identify and recognize relevant states/actions/rewards. Without relying on ground-truth annotations, our new method called Deep State Identifier learns to predict returns from episodes encoded as videos. Then it uses a kind of mask-based sensitivity analysis to extract/identify important critical states. Extensive experiments showcase our method's potential for understanding and improving agent behavior. The source code and the generated datasets are available at https://github.com/AI-Initiative-KAUST/VideoRLCS.
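To make the two-stage idea in the abstract concrete (predict episode returns from video frames, then use per-frame mask scores as a sensitivity analysis that flags critical states), the sketch below gives a minimal PyTorch illustration. It is an assumption-laden simplification, not the authors' implementation: the `FrameEncoder`, the mask head, the sparsity coefficient, and the joint training objective are all hypothetical choices made only for this example; the actual method and datasets are in the linked repository.

```python
# Minimal sketch (NOT the authors' code): a return predictor over video-encoded
# episodes plus a per-frame importance mask. Frames the mask suppresses cannot
# influence the predicted return, so high-mask frames act as candidate critical
# states. All architecture sizes and loss weights are illustrative assumptions.
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Encodes each RGB frame of an episode into a feature vector."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, frames):                    # frames: (T, 3, H, W)
        h = self.conv(frames).flatten(1)          # (T, 32)
        return self.fc(h)                         # (T, feat_dim)

class DeepStateIdentifierSketch(nn.Module):
    """Return predictor + per-frame importance mask (soft sensitivity scores)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.encoder = FrameEncoder(feat_dim)
        self.mask_head = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())
        self.return_head = nn.Linear(feat_dim, 1)

    def forward(self, frames):
        feats = self.encoder(frames)              # (T, feat_dim)
        mask = self.mask_head(feats).squeeze(-1)  # (T,) importance per frame
        # Predict the return from the mask-weighted average of frame features.
        pooled = (mask.unsqueeze(-1) * feats).sum(0) / (mask.sum() + 1e-6)
        return self.return_head(pooled).squeeze(-1), mask

def training_step(model, frames, episode_return, optimizer, sparsity_coef=0.01):
    """One gradient step: fit the episode return while keeping the mask sparse."""
    pred_return, mask = model(frames)
    loss = (pred_return - episode_return) ** 2 + sparsity_coef * mask.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), mask.detach()

# Toy usage: one random "video" episode of 40 frames with a scalar return.
model = DeepStateIdentifierSketch()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
frames = torch.rand(40, 3, 64, 64)
episode_return = torch.tensor(7.5)
loss, mask = training_step(model, frames, episode_return, optimizer)
critical_frames = torch.topk(mask, k=5).indices   # highest-importance states
```

In this toy setup the sparsity penalty pushes most mask values toward zero, so the frames that survive with high scores are the ones the return prediction actually relies on; those indices are the "critical states" in the sense described above, under the stated simplifying assumptions.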