Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance

Abstract


Embodied agents empowered by large language models (LLMs) have shown strong performance in household object rearrangement tasks. However, these tasks primarily focus on single-turn interactions with simplified instructions, which do not reflect the real challenges of providing meaningful assistance to users. To provide personalized assistance, embodied agents must understand the unique semantics that users assign to the physical world (e.g., favorite cup, breakfast routine) by leveraging prior interaction history to interpret dynamic, real-world instructions. Yet the effectiveness of embodied agents in utilizing memory for personalized assistance remains largely underexplored. To address this gap, we present MEMENTO, a personalized embodied agent evaluation framework designed to comprehensively assess memory utilization capabilities for personalized assistance. Our framework consists of a two-stage memory evaluation process that quantifies the impact of memory utilization on task performance. This process evaluates agents' understanding of personalized knowledge in object rearrangement tasks by focusing on its role in goal interpretation: (1) the ability to identify target objects based on personal meaning (object semantics), and (2) the ability to infer object-location configurations from consistent user patterns, such as routines (user patterns). Our experiments across various LLMs reveal significant limitations in memory utilization. Even frontier models like GPT-4o show a 30.5% performance drop when required to reference multiple memories, particularly in tasks involving user patterns. These findings, along with our detailed analyses and case studies, provide valuable insights for future research on developing more effective personalized embodied agents. Project website: https://connoriginal.github.io/MEMENTO
