Towards A Unified Agent with Foundation Models

July 18, 2023
作者: Norman Di Palo, Arunkumar Byravan, Leonard Hasenclever, Markus Wulfmeier, Nicolas Heess, Martin Riedmiller
cs.AI

Abstract

Language Models and Vision Language Models have recently demonstrated unprecedented capabilities in terms of understanding human intentions, reasoning, scene understanding, and planning-like behaviour, in text form, among many others. In this work, we investigate how to embed and leverage such abilities in Reinforcement Learning (RL) agents. We design a framework that uses language as the core reasoning tool, exploring how this enables an agent to tackle a series of fundamental RL challenges, such as efficient exploration, reusing experience data, scheduling skills, and learning from observations, which traditionally require separate, vertically designed algorithms. We test our method on a sparse-reward simulated robotic manipulation environment, where a robot needs to stack a set of objects. We demonstrate substantial performance improvements over baselines in exploration efficiency and ability to reuse data from offline datasets, and illustrate how to reuse learned skills to solve novel tasks or imitate videos of human experts.
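The abstract describes using language as the core reasoning tool to drive exploration in a sparse-reward stacking task. A minimal sketch of that idea is shown below: a language model decomposes the task into subgoals, and a vision-language model scores observations against the current subgoal, producing a denser intrinsic reward. Both model calls are stubbed here (`propose_subgoals`, `subgoal_reward` are illustrative names, not the paper's API), so this is an assumption-laden sketch of the general pattern, not the authors' implementation.

```python
# Hedged sketch: language-driven exploration for a sparse-reward stacking
# task. The LLM and VLM calls are stubbed; all names are illustrative.

def propose_subgoals(task: str) -> list[str]:
    # Stand-in for an LLM that decomposes a task description into an
    # ordered curriculum of language subgoals.
    return [
        "pick up the red block",
        "place the red block on the blue block",
        "pick up the green block",
        "place the green block on the red block",
    ]

def subgoal_reward(observation_caption: str, subgoal: str) -> float:
    # Stand-in for a VLM that checks whether a (captioned) observation
    # satisfies a language subgoal; here, naive substring matching.
    return 1.0 if subgoal in observation_caption else 0.0

def episode_intrinsic_return(captions: list[str], subgoals: list[str]) -> float:
    # Walk through the episode's observations, granting one unit of
    # intrinsic reward per subgoal achieved, in order.
    total, idx = 0.0, 0
    for caption in captions:
        if idx < len(subgoals) and subgoal_reward(caption, subgoals[idx]) > 0:
            total += 1.0
            idx += 1
    return total

subgoals = propose_subgoals("stack red on blue, then green on red")
# An episode in which the agent completes the first two subgoals:
captions = [
    "the arm reaches out and pick up the red block succeeds",
    "the arm moves to place the red block on the blue block",
]
print(episode_intrinsic_return(captions, subgoals))
```

The same machinery plausibly supports the other capabilities the abstract lists: relabeling offline trajectories against new subgoal lists enables data reuse, and captioning a human expert's video yields subgoals the agent can imitate.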