Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance
May 22, 2025
Authors: Taeyoon Kwon, Dongwook Choi, Sunghwan Kim, Hyojun Kim, Seungjun Moon, Beong-woo Kwak, Kuan-Hao Huang, Jinyoung Yeo
cs.AI
Abstract
Embodied agents empowered by large language models (LLMs) have shown strong
performance in household object rearrangement tasks. However, these tasks
primarily focus on single-turn interactions with simplified instructions, which
do not truly reflect the challenges of providing meaningful assistance to
users. To provide personalized assistance, embodied agents must understand the
unique semantics that users assign to the physical world (e.g., favorite cup,
breakfast routine) by leveraging prior interaction history to interpret
dynamic, real-world instructions. Yet, the effectiveness of embodied agents in
utilizing memory for personalized assistance remains largely underexplored. To
address this gap, we present MEMENTO, a personalized embodied agent evaluation
framework designed to comprehensively assess memory utilization capabilities to
provide personalized assistance. Our framework consists of a two-stage memory
evaluation process that quantifies the impact of memory utilization on task
performance. This process enables the evaluation of agents'
understanding of personalized knowledge in object rearrangement tasks by
focusing on its role in goal interpretation: (1) the ability to identify target
objects based on personal meaning (object semantics), and (2) the ability to
infer object-location configurations from consistent user patterns, such as
routines (user patterns). Our experiments across various LLMs reveal
significant limitations in memory utilization, with even frontier models like
GPT-4o experiencing a 30.5% performance drop when required to reference
multiple memories, particularly in tasks involving user patterns. These
findings, along with our detailed analyses and case studies, provide valuable
insights for future research in developing more effective personalized embodied
agents. Project website: https://connoriginal.github.io/MEMENTO