Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Abstract

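In this paper, we propose Diffusion Augmented Agents (DAAG), a new framework built around diffusion models. DAAG integrates large language models, vision language models, and diffusion models to improve sample efficiency and transfer learning in reinforcement learning for embodied agents. DAAG relabels the agent's past experience using a technique called Hindsight Experience Augmentation, in which a diffusion model transforms videos in a temporally and geometrically consistent way so that they align with target instructions. A large language model orchestrates this autonomous process without requiring human supervision, which makes the framework well suited to lifelong learning scenarios. The framework reduces the amount of reward-labeled data needed to 1) finetune a vision language model that acts as a reward detector, and 2) train RL agents on new tasks. We demonstrate DAAG's sample efficiency gains in simulated robotics environments involving manipulation and navigation. The results show that DAAG improves the learning of reward detectors, the transfer of past experience, and the acquisition of new tasks, all of which are key abilities for developing efficient lifelong learning agents. Supplementary material and visualizations are available on the website: https://sites.google.com/view/diffusion-augmented-agents/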

English

We introduce Diffusion Augmented Agents (DAAG), a novel framework that leverages large language models, vision language models, and diffusion models to improve sample efficiency and transfer learning in reinforcement learning for embodied agents. DAAG relabels the agent's past experience in hindsight with a technique we call Hindsight Experience Augmentation: diffusion models transform videos in a temporally and geometrically consistent way so that they align with target instructions. A large language model orchestrates this autonomous process without requiring human supervision, making it well suited for lifelong learning scenarios. The framework reduces the amount of reward-labeled data needed to 1) finetune a vision language model that acts as a reward detector, and 2) train RL agents on new tasks. We demonstrate the sample efficiency gains of DAAG in simulated robotics environments involving manipulation and navigation. Our results show that DAAG improves the learning of reward detectors, the transfer of past experience, and the acquisition of new tasks, which are key abilities for developing efficient lifelong learning agents. Supplementary material and visualizations are available on our website: https://sites.google.com/view/diffusion-augmented-agents/
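
To make the relabeling idea concrete, the following is a toy sketch only, not the authors' implementation. It assumes that frames can be stood in for by text captions, that the language model's role can be approximated by a one-word comparison of instructions, and that the diffusion edit can be approximated by a string substitution applied identically to every frame. All names (`Episode`, `propose_swap`, `edit_frames`, `hindsight_augment`) are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Episode:
    frames: Tuple[str, ...]  # toy stand-in for video frames (captions, not pixels)
    instruction: str         # task the agent was attempting when this was recorded
    reward: float            # 1.0 if that instruction was achieved, else 0.0


def propose_swap(source: str, target: str) -> Optional[Tuple[str, str]]:
    """Hypothetical stand-in for the LLM: if the two instructions differ in
    exactly one word, return (old_word, new_word); otherwise return None."""
    s, t = source.split(), target.split()
    if len(s) != len(t):
        return None
    diffs = [(a, b) for a, b in zip(s, t) if a != b]
    return diffs[0] if len(diffs) == 1 else None


def edit_frames(frames: Tuple[str, ...], old: str, new: str) -> Tuple[str, ...]:
    """Hypothetical stand-in for the diffusion model: apply the same edit to
    every frame, which mimics temporal consistency in this toy setting."""
    return tuple(frame.replace(old, new) for frame in frames)


def hindsight_augment(buffer: List[Episode], target: str) -> List[Episode]:
    """Turn successful past episodes into synthetic successes for `target`."""
    augmented = []
    for ep in buffer:
        if ep.reward <= 0.0:
            continue  # only successful episodes are repurposed in this sketch
        swap = propose_swap(ep.instruction, target)
        if swap is None:
            continue  # too different from the target (or identical to it)
        old, new = swap
        augmented.append(
            replace(ep, frames=edit_frames(ep.frames, old, new), instruction=target)
        )
    return augmented


buffer = [
    Episode(("red cube on table", "red cube in gripper"), "pick up the red cube", 1.0),
    Episode(("green ball on floor",), "push the green ball", 1.0),
    Episode(("red cube on table",), "pick up the red cube", 0.0),
]
augmented = hindsight_augment(buffer, "pick up the blue cube")
```

In this toy run, only the first episode is reused: its captions are rewritten from "red" to "blue" and it is relabeled with the target instruction, while the unrelated and the unsuccessful episodes are skipped. A real system would replace both stand-ins with learned models.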
