Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Abstract

We introduce Diffusion Augmented Agents (DAAG), a novel framework that leverages large language models, vision language models, and diffusion models to improve sample efficiency and transfer learning in reinforcement learning for embodied agents. DAAG relabels the agent's past experience in hindsight with a technique we call Hindsight Experience Augmentation, which uses diffusion models to transform videos in a temporally and geometrically consistent way so that they align with target instructions. A large language model orchestrates this autonomous process without requiring human supervision, making it well suited to lifelong learning scenarios. The framework reduces the amount of reward-labeled data needed to 1) finetune a vision language model that acts as a reward detector, and 2) train RL agents on new tasks. We demonstrate the sample efficiency gains of DAAG in simulated robotics environments involving manipulation and navigation. Our results show that DAAG improves three abilities that are key to developing efficient lifelong learning agents: learning reward detectors, transferring past experience, and acquiring new tasks. Supplementary material and visualizations are available on our website: https://sites.google.com/view/diffusion-augmented-agents/
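The relabeling idea behind Hindsight Experience Augmentation can be illustrated with a toy sketch. This is not the DAAG implementation: the `Episode` type, the function names, and the two stand-in callables are all hypothetical. Text strings stand in for video frames, a one-word string comparison stands in for the language model that decides whether an episode can be repurposed, and a string substitution stands in for the diffusion model that edits the frames.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins: a "frame" is a text description, not an image.
Frame = str
Edit = Tuple[str, str]  # (word to replace, replacement word)


@dataclass(frozen=True)
class Episode:
    frames: List[Frame]
    instruction: str
    success: bool


def augment_in_hindsight(
    buffer: List[Episode],
    target_instruction: str,
    propose_edit: Callable[[Episode, str], Optional[Edit]],
    edit_frames: Callable[[List[Frame], Edit], List[Frame]],
) -> List[Episode]:
    """Turn stored episodes into successes for target_instruction where an edit exists."""
    augmented = []
    for episode in buffer:
        # Language-model stand-in: decide whether and how to repurpose the episode.
        edit = propose_edit(episode, target_instruction)
        if edit is None:
            continue  # no edit would make this episode match the target
        # Diffusion-model stand-in: apply the same edit to every frame.
        new_frames = edit_frames(episode.frames, edit)
        augmented.append(Episode(new_frames, target_instruction, True))
    return augmented


def toy_propose_edit(episode: Episode, target_instruction: str) -> Optional[Edit]:
    """Toy rule (an assumption): only successful episodes whose instruction
    differs from the target by exactly one word can be repurposed."""
    if not episode.success:
        return None
    old_words = episode.instruction.split()
    new_words = target_instruction.split()
    if len(old_words) != len(new_words):
        return None
    diffs = [(a, b) for a, b in zip(old_words, new_words) if a != b]
    return diffs[0] if len(diffs) == 1 else None


def toy_edit_frames(frames: List[Frame], edit: Edit) -> List[Frame]:
    """Toy edit: swap the old word for the new one in each frame description."""
    old, new = edit
    return [frame.replace(old, new) for frame in frames]


buffer = [
    Episode(["gripper above red cube", "gripper holds red cube"], "lift the red cube", True),
    Episode(["robot faces wall"], "open the drawer", True),
]
augmented = augment_in_hindsight(
    buffer, "lift the blue cube", toy_propose_edit, toy_edit_frames
)
```

Here only the first stored episode can be repurposed for the new instruction; the second is skipped because no single-word edit maps it onto the target. In a real system the two callables would wrap actual models, and the augmented episodes would serve as extra reward-labeled data for the reward detector and the RL agent.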
