Experience Transfer for Multimodal LLM Agents in the Minecraft Game

Abstract

Multimodal LLM agents operating in complex game environments must continually reuse past experience to solve new tasks efficiently. In this work, we propose Echo, a transfer-oriented memory framework that enables agents to derive actionable knowledge from prior interactions rather than treating memory as a passive repository of static records. To make transfer explicit, Echo decomposes reusable knowledge into five dimensions: structure, attribute, process, function, and interaction. This decomposition lets the agent identify recurring patterns shared across tasks and infer which prior experience remains applicable in a new situation. Building on it, Echo uses In-Context Analogy Learning (ICAL) to retrieve relevant experiences and adapt them to unseen tasks through contextual examples. Experiments in Minecraft show that, when learning from scratch, Echo achieves a 1.3x to 1.7x speed-up on object-unlocking tasks. Echo also exhibits a burst-like chain-unlocking phenomenon: after acquiring transferable experience, it rapidly unlocks multiple similar items within a short time interval. These results suggest that experience transfer is a promising direction for improving the efficiency and adaptability of multimodal LLM agents in complex interactive environments.
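
To make the retrieval idea above concrete, the following is a minimal sketch, not Echo's actual implementation. It assumes each stored experience is annotated with a set of free-form tags for each of the five dimensions, and that relevance is scored as the mean Jaccard overlap across dimensions. The `Experience` class, the tag vocabulary, the scoring rule, and the prompt layout are all hypothetical choices made only for illustration; the abstract does not specify any of them.

```python
from dataclasses import dataclass, field

# The five dimensions named in the abstract.
DIMENSIONS = ("structure", "attribute", "process", "function", "interaction")


@dataclass
class Experience:
    """Hypothetical record: a task plus one tag set per dimension."""
    task: str
    tags: dict = field(default_factory=dict)  # dimension name -> set of tags
    outcome: str = ""


def jaccard(a: set, b: set) -> float:
    """Overlap of two tag sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(x: Experience, y: Experience) -> float:
    """Mean per-dimension Jaccard overlap (an assumed scoring rule)."""
    scores = [jaccard(x.tags.get(d, set()), y.tags.get(d, set()))
              for d in DIMENSIONS]
    return sum(scores) / len(DIMENSIONS)


def retrieve(query: Experience, memory: list, k: int = 1) -> list:
    """Return the k stored experiences most similar to the query."""
    return sorted(memory, key=lambda e: similarity(query, e), reverse=True)[:k]


def build_prompt(query: Experience, examples: list) -> str:
    """Lay out retrieved experiences as in-context examples for the new task."""
    blocks = [f"Past task: {e.task}\nWhat worked: {e.outcome}" for e in examples]
    blocks.append(f"New task: {query.task}\nPlan by analogy:")
    return "\n\n".join(blocks)


# Illustrative memory with made-up tags.
memory = [
    Experience(
        task="wooden_pickaxe",
        tags={"structure": {"head_row", "stick_handle"}, "attribute": {"wood"},
              "process": {"crafting_table"}, "function": {"mining"},
              "interaction": {"break_block"}},
        outcome="3 planks on the top row, 2 sticks below, at a crafting table",
    ),
    Experience(
        task="cooked_beef",
        tags={"structure": {"single_item"}, "attribute": {"food"},
              "process": {"furnace"}, "function": {"eating"},
              "interaction": {"consume"}},
        outcome="smelt raw beef in a furnace with fuel",
    ),
]

# A new, unseen task that shares four of five dimensions with wooden_pickaxe.
query = Experience(
    task="stone_pickaxe",
    tags={"structure": {"head_row", "stick_handle"}, "attribute": {"stone"},
          "process": {"crafting_table"}, "function": {"mining"},
          "interaction": {"break_block"}},
)

best = retrieve(query, memory, k=1)
prompt = build_prompt(query, best)
```

In this toy setup the stone pickaxe query matches the wooden pickaxe on every dimension except attribute, so that experience is retrieved and placed ahead of the new task in the prompt. A real system would more likely derive the annotations and the relevance score from the multimodal model itself rather than from hand-written tags.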