Experience Transfer for Multimodal LLM Agents in Minecraft Game

Abstract

Multimodal LLM agents operating in complex game environments must continually reuse past experience to solve new tasks efficiently. In this work, we propose Echo, a transfer-oriented memory framework that enables agents to derive actionable knowledge from prior interactions rather than treating memory as a passive repository of static records. To make transfer explicit, Echo decomposes reusable knowledge into five dimensions: structure, attribute, process, function, and interaction. This formulation allows the agent to identify recurring patterns shared across different tasks and infer what prior experience remains applicable in new situations. Building on this formulation, Echo leverages In-Context Analogy Learning (ICAL) to retrieve relevant experiences and adapt them to unseen tasks through contextual examples. Experiments in Minecraft show that, under a from-scratch learning setting, Echo achieves a 1.3x to 1.7x speed-up on object-unlocking tasks. Moreover, Echo exhibits a burst-like chain-unlocking phenomenon, rapidly unlocking multiple similar items within a short time interval after acquiring transferable experience. These results suggest that experience transfer is a promising direction for improving the efficiency and adaptability of multimodal LLM agents in complex interactive environments.
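The abstract names the five dimensions and the retrieve-then-adapt step but gives no implementation details, so the following is only a minimal sketch of how such a memory could be organized. It is not the authors' method. The `Experience` record, the string descriptors, the Jaccard-overlap score, and the prompt layout are all assumptions made for illustration; a real system would likely use learned embeddings and a multimodal LLM call in place of the final string.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List

# The five dimensions listed in the abstract.
DIMENSIONS = ("structure", "attribute", "process", "function", "interaction")


@dataclass
class Experience:
    """One past task described along the five dimensions (hypothetical encoding)."""
    task: str
    dims: Dict[str, FrozenSet[str]]  # dimension name -> free-form descriptors
    plan: List[str] = field(default_factory=list)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Set overlap in [0, 1]; an assumed stand-in for whatever similarity Echo uses."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(query: Dict[str, FrozenSet[str]], exp: Experience) -> float:
    """Average the per-dimension overlap so every dimension counts equally."""
    empty: FrozenSet[str] = frozenset()
    scores = [jaccard(query.get(d, empty), exp.dims.get(d, empty)) for d in DIMENSIONS]
    return sum(scores) / len(DIMENSIONS)


def retrieve(query: Dict[str, FrozenSet[str]], memory: List[Experience], k: int = 1) -> List[Experience]:
    """Return the k stored experiences most similar to the query description."""
    return sorted(memory, key=lambda e: similarity(query, e), reverse=True)[:k]


def build_prompt(new_task: str, examples: List[Experience]) -> str:
    """Lay out retrieved experiences as in-context examples ahead of the new task."""
    lines: List[str] = []
    for e in examples:
        lines.append(f"Past task: {e.task}")
        lines.append("Plan: " + " -> ".join(e.plan))
    lines.append(f"New task: {new_task}")
    lines.append("Plan:")
    return "\n".join(lines)


# Illustrative memory with two past tasks (descriptors are made up for this sketch).
memory = [
    Experience(
        task="wooden_pickaxe",
        dims={
            "structure": frozenset({"3_head_2_handle"}),
            "attribute": frozenset({"material:wood"}),
            "process": frozenset({"craft_at_table"}),
            "function": frozenset({"mine_blocks"}),
            "interaction": frozenset({"needs_crafting_table"}),
        },
        plan=["collect planks", "craft sticks", "craft wooden_pickaxe at crafting table"],
    ),
    Experience(
        task="cooked_beef",
        dims={
            "structure": frozenset({"single_item"}),
            "attribute": frozenset({"food"}),
            "process": frozenset({"smelt_in_furnace"}),
            "function": frozenset({"restore_hunger"}),
            "interaction": frozenset({"needs_furnace", "needs_fuel"}),
        },
        plan=["obtain raw beef", "smelt raw beef in furnace"],
    ),
]

# Unseen task: same shape and process as the wooden pickaxe, different material.
query = {
    "structure": frozenset({"3_head_2_handle"}),
    "attribute": frozenset({"material:stone"}),
    "process": frozenset({"craft_at_table"}),
    "function": frozenset({"mine_blocks"}),
    "interaction": frozenset({"needs_crafting_table"}),
}

best = retrieve(query, memory, k=1)
prompt = build_prompt("stone_pickaxe", best)
```

In this toy setup the stone pickaxe query matches the wooden pickaxe on four of the five dimensions, so that experience is retrieved and its plan appears in the prompt as an analogy. Scoring each dimension separately is one way to make explicit which aspects of a past experience still apply to a new task, which is the kind of reasoning the abstract attributes to the decomposition.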
