Towards A Unified Agent with Foundation Models

Abstract

Language Models and Vision Language Models have recently demonstrated unprecedented capabilities in understanding human intentions, reasoning, scene understanding, and planning-like behaviour, all expressed in text form. In this work, we investigate how to embed and leverage such abilities in Reinforcement Learning (RL) agents. We design a framework that uses language as the core reasoning tool, and explore how this enables an agent to tackle a series of fundamental RL challenges, such as efficient exploration, reusing experience data, scheduling skills, and learning from observations, which traditionally require separate, vertically designed algorithms. We test our method on a sparse-reward simulated robotic manipulation environment, where a robot needs to stack a set of objects. We demonstrate substantial performance improvements over baselines in exploration efficiency and in the ability to reuse data from offline datasets, and illustrate how to reuse learned skills to solve novel tasks or imitate videos of human experts.
