Learning to Identify Critical States for Reinforcement Learning from Videos

Abstract

Recent work on deep reinforcement learning (DRL) has pointed out that algorithmic information about good policies can be extracted from offline data that lack explicit information about executed actions. For example, videos of humans or robots may convey a lot of implicit information about rewarding action sequences, but a DRL machine that wants to profit from watching such videos must first learn by itself to identify and recognize relevant states, actions, and rewards. Without relying on ground-truth annotations, our new method, called Deep State Identifier, learns to predict returns from episodes encoded as videos. It then uses a kind of mask-based sensitivity analysis to extract and identify important critical states. Extensive experiments showcase our method's potential for understanding and improving agent behavior. The source code and the generated datasets are available at https://github.com/AI-Initiative-KAUST/VideoRLCS.
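
The abstract names two steps, return prediction from an episode and mask-based sensitivity analysis, without spelling out the mechanism. The following is a minimal sketch of the general idea only, not the authors' implementation: it masks one frame at a time and scores that frame by how much the predicted return changes. The names `toy_return_predictor`, `frame_sensitivity`, and `critical_states`, the per-frame feature vectors, and the zero baseline are all assumptions made for illustration; a real system would presumably use a learned predictor over video frames.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # hypothetical: one small feature vector per video frame


def toy_return_predictor(episode: Sequence[Frame]) -> float:
    """Hypothetical stand-in for a learned return predictor.

    It simply sums the first feature of every frame.
    """
    return sum(frame[0] for frame in episode)


def frame_sensitivity(
    episode: Sequence[Frame],
    predictor: Callable[[Sequence[Frame]], float],
    baseline: float = 0.0,
) -> List[float]:
    """Score each frame by how much masking it changes the predicted return."""
    full = predictor(episode)
    scores = []
    for t in range(len(episode)):
        # Replace frame t with a constant baseline frame; keep all others.
        masked = [
            [baseline] * len(frame) if i == t else frame
            for i, frame in enumerate(episode)
        ]
        scores.append(abs(full - predictor(masked)))
    return scores


def critical_states(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k most sensitive frames, in temporal order."""
    top = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)[:k]
    return sorted(top)


# Toy episode of five frames; frames 2 and 4 dominate the predicted return.
episode = [[0.0, 1.0], [0.5, 0.5], [5.0, 0.2], [0.0, 0.9], [3.0, 0.1]]
scores = frame_sensitivity(episode, toy_return_predictor)
print(critical_states(scores, k=2))  # prints [2, 4]
```

Masking one frame at a time keeps the sketch easy to follow, but it needs one predictor call per frame and ignores interactions between frames.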
