Ella: Embodied Social Agents with Lifelong Memory
June 30, 2025
Authors: Hongxin Zhang, Zheyuan Zhang, Zeyuan Wang, Zunzhe Zhang, Lixing Fang, Qinhong Zhou, Chuang Gan
cs.AI
Abstract
We introduce Ella, an embodied social agent capable of lifelong learning
within a community in a 3D open world, where agents accumulate experiences and
acquire knowledge through everyday visual observations and social interactions.
At the core of Ella's capabilities is a structured, long-term multimodal memory
system that stores, updates, and retrieves information effectively. It consists
of a name-centric semantic memory for organizing acquired knowledge and a
spatiotemporal episodic memory for capturing multimodal experiences. By
integrating this lifelong memory system with foundation models, Ella retrieves
relevant information for decision-making, plans daily activities, builds social
relationships, and evolves autonomously while coexisting with other intelligent
beings in the open world. We conduct capability-oriented evaluations in a
dynamic 3D open world where 15 agents engage in social activities for days and
are assessed with a suite of unseen controlled evaluations. Experimental
results show that Ella can successfully influence, lead, and cooperate with
other agents to achieve goals, showcasing its ability to learn effectively through
observation and social interaction. Our findings highlight the transformative
potential of combining structured memory systems with foundation models for
advancing embodied intelligence. More videos can be found at
https://umass-embodied-agi.github.io/Ella/.
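
To make the two-part memory described above concrete, here is a minimal Python sketch of a name-centric semantic store paired with a spatiotemporal episodic store. It is an illustration under stated assumptions, not the authors' implementation: the class names (`Episode`, `LifelongMemory`), the fields, and the distance-based retrieval score are all hypothetical, and the real system works with multimodal observations and foundation models rather than plain text captions.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Episode:
    """One experience indexed by time and place (hypothetical schema)."""
    time: float                      # simulation time, in hours
    position: tuple[float, float]    # 2D location in the world
    caption: str                     # text stand-in for a multimodal observation
    names: tuple[str, ...] = ()      # entities that appear in the observation


@dataclass
class LifelongMemory:
    # Semantic memory: entity name -> list of facts known about it.
    semantic: dict[str, list[str]] = field(default_factory=dict)
    # Episodic memory: experiences in the order they were stored.
    episodic: list[Episode] = field(default_factory=list)

    def store(self, episode: Episode) -> None:
        """Append the episode and make sure each named entity has an entry."""
        self.episodic.append(episode)
        for name in episode.names:
            self.semantic.setdefault(name, [])

    def update_knowledge(self, name: str, fact: str) -> None:
        """Record a fact under a name, skipping exact duplicates."""
        facts = self.semantic.setdefault(name, [])
        if fact not in facts:
            facts.append(fact)

    def retrieve_knowledge(self, name: str) -> list[str]:
        """Return a copy of the facts stored under a name (empty if unknown)."""
        return list(self.semantic.get(name, []))

    def retrieve_episodes(self, time: float, position: tuple[float, float],
                          k: int = 3, time_weight: float = 1.0) -> list[Episode]:
        """Return the k episodes nearest to the query in space and time.

        The score (Euclidean distance plus weighted time gap) is an
        assumption chosen for simplicity.
        """
        def score(ep: Episode) -> float:
            return dist(ep.position, position) + time_weight * abs(ep.time - time)

        return sorted(self.episodic, key=score)[:k]


if __name__ == "__main__":
    memory = LifelongMemory()
    memory.store(Episode(9.0, (0.0, 0.0), "Met Alice at the cafe", ("Alice",)))
    memory.store(Episode(20.0, (50.0, 50.0), "Saw Bob at the gym", ("Bob",)))
    memory.update_knowledge("Alice", "likes coffee")
    print(memory.retrieve_knowledge("Alice"))
    print(memory.retrieve_episodes(time=10.0, position=(1.0, 1.0), k=1))
```

Keying the semantic store by name keeps everything known about a person or place in one entry, while the episodic store can be queried by where and when something happened. A planner built on a foundation model could then be prompted with the results of both lookups.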