Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance

Abstract



Embodied agents empowered by large language models (LLMs) have shown strong performance in household object rearrangement tasks. However, these tasks primarily focus on single-turn interactions with simplified instructions, which do not truly reflect the challenges of providing meaningful assistance to users. To provide personalized assistance, embodied agents must understand the unique semantics that users assign to the physical world (e.g., favorite cup, breakfast routine) by leveraging prior interaction history to interpret dynamic, real-world instructions. Yet, the effectiveness of embodied agents in utilizing memory for personalized assistance remains largely underexplored. To address this gap, we present MEMENTO, a personalized embodied agent evaluation framework designed to comprehensively assess memory utilization capabilities for personalized assistance. Our framework uses a two-stage memory evaluation process that quantifies the impact of memory utilization on task performance. This process evaluates agents' understanding of personalized knowledge in object rearrangement tasks, focusing on its role in goal interpretation: (1) the ability to identify target objects based on personal meaning (object semantics), and (2) the ability to infer object-location configurations from consistent user patterns, such as routines (user patterns). Our experiments across various LLMs reveal significant limitations in memory utilization, with even frontier models like GPT-4o experiencing a 30.5% performance drop when required to reference multiple memories, particularly in tasks involving user patterns. These findings, along with our detailed analyses and case studies, provide valuable insights for future research in developing more effective personalized embodied agents. Project website: https://connoriginal.github.io/MEMENTO
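The abstract describes the two-stage evaluation only at a high level. As a minimal sketch, assuming that stage 1 gives the agent the personalized knowledge directly in the instruction and stage 2 requires it to recall that knowledge from earlier interactions, the impact of memory utilization can be expressed as the gap between the two stages' success rates. The `Episode` record, the `memory_utilization_gap` helper, and the toy data are hypothetical illustrations, not MEMENTO's code or results. Because the abstract does not say whether the 30.5% figure is an absolute or a relative drop, the sketch reports both.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, NamedTuple, Optional


@dataclass(frozen=True)
class Episode:
    """One rearrangement episode evaluated in both stages (hypothetical record format)."""
    knowledge_type: str   # "object_semantics" or "user_patterns"
    stage1_success: bool  # assumed: personalized knowledge is stated in the instruction
    stage2_success: bool  # assumed: the same knowledge must be recalled from memory


class Gap(NamedTuple):
    stage1_rate: float        # success rate in stage 1 (0.0 to 1.0)
    stage2_rate: float        # success rate in stage 2 (0.0 to 1.0)
    absolute_drop_pp: float   # drop in percentage points
    relative_drop_pct: float  # drop relative to the stage-1 rate, in percent


def memory_utilization_gap(
    episodes: Iterable[Episode], knowledge_type: Optional[str] = None
) -> Gap:
    """Compare stage-1 and stage-2 success rates, optionally for one knowledge type."""
    selected = [
        e for e in episodes
        if knowledge_type is None or e.knowledge_type == knowledge_type
    ]
    if not selected:
        raise ValueError(f"no episodes for knowledge type {knowledge_type!r}")
    s1 = mean(1.0 if e.stage1_success else 0.0 for e in selected)
    s2 = mean(1.0 if e.stage2_success else 0.0 for e in selected)
    # A relative drop is undefined when the agent never succeeds in stage 1.
    relative = (s1 - s2) / s1 * 100 if s1 > 0 else float("nan")
    return Gap(s1, s2, (s1 - s2) * 100, relative)


# Illustrative toy data only; these are not results from the paper.
EPISODES = [
    Episode("object_semantics", True, True),
    Episode("object_semantics", True, False),
    Episode("object_semantics", False, False),
    Episode("object_semantics", False, False),
    Episode("user_patterns", True, False),
    Episode("user_patterns", True, False),
    Episode("user_patterns", True, True),
    Episode("user_patterns", True, True),
]

if __name__ == "__main__":
    for kt in ("object_semantics", "user_patterns", None):
        print(kt or "all", memory_utilization_gap(EPISODES, kt))
```

Splitting the gap by knowledge type mirrors the abstract's distinction between object semantics and user patterns. A fuller evaluation would also record how many memories each episode requires, since the abstract reports the largest drop when multiple memories must be referenced; this sketch omits that dimension.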
