Latent Preference Modeling for Cross-Session Personalized Tool Calling
April 20, 2026
Authors: Yejin Yoon, Minseo Kim, Taeuk Kim
cs.AI
Abstract
Users often omit essential details in their requests to LLM-based agents, resulting in under-specified inputs for tool use. This poses a fundamental challenge for tool-augmented agents, as API execution typically requires complete arguments, highlighting the need for personalized tool calling. To study this problem, we introduce MPT, a benchmark comprising 265 multi-session dialogues that cover three challenges: Preference Recall, Preference Induction, and Preference Transfer. We also propose PRefine, a test-time memory-augmented method that represents user preferences as evolving hypotheses. Through a generate-verify-refine loop, it extracts reusable constraints from dialogue history and improves tool-calling accuracy while using only 1.24% of the tokens required by full-history prompting. These results indicate that robust personalization in agentic systems depends on memory that captures the reasons behind user choices, not just the choices themselves.
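The abstract does not spell out PRefine's internals. The sketch below is only an illustrative toy of the general generate-verify-refine idea: preferences as hypotheses proposed from past sessions, verified against the history, and kept only when they remain consistent. All names (`Hypothesis`, `generate`, `verify`, `refine`), the majority-vote heuristic, and the toy session data are assumptions for illustration, not the authors' method.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """A candidate user-preference constraint, e.g. seat -> 'window'."""
    slot: str
    value: str
    support: int = 0  # number of past sessions consistent with this hypothesis

def generate(history):
    """Propose one hypothesis per argument slot: its most frequent past value."""
    counts = {}
    for session in history:
        for slot, value in session.items():
            counts.setdefault(slot, Counter())[value] += 1
    return [Hypothesis(slot, c.most_common(1)[0][0]) for slot, c in counts.items()]

def verify(hyp, history):
    """Count sessions that mention the slot and how many agree with the hypothesis."""
    agree = sum(1 for s in history if s.get(hyp.slot) == hyp.value)
    seen = sum(1 for s in history if hyp.slot in s)
    return agree, seen

def refine(hyps, history, threshold=0.5):
    """Keep only hypotheses supported by a majority of sessions mentioning the slot."""
    kept = []
    for h in hyps:
        agree, seen = verify(h, history)
        if seen and agree / seen > threshold:
            h.support = agree
            kept.append(h)
    return kept

# Toy history: each session records tool-call arguments the user confirmed.
history = [
    {"seat": "window", "meal": "vegetarian"},
    {"seat": "window"},
    {"seat": "aisle", "meal": "vegetarian"},
]
memory = refine(generate(history), history)
print({h.slot: h.value for h in memory})  # → {'seat': 'window', 'meal': 'vegetarian'}
```

The retained hypotheses ("prefers window seat", "prefers vegetarian meal") act as reusable constraints that can fill omitted arguments in future tool calls, and are far cheaper to carry across sessions than replaying the full dialogue history.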