Latent Preference Modeling for Cross-Session Personalized Tool Calling
April 20, 2026
Authors: Yejin Yoon, Minseo Kim, Taeuk Kim
cs.AI
Abstract
Users often omit essential details in their requests to LLM-based agents, resulting in under-specified inputs for tool use. This poses a fundamental challenge for tool-augmented agents, as API execution typically requires complete arguments, highlighting the need for personalized tool calling. To study this problem, we introduce MPT, a benchmark comprising 265 multi-session dialogues that cover three challenges: Preference Recall, Preference Induction, and Preference Transfer. We also propose PRefine, a test-time memory-augmented method that represents user preferences as evolving hypotheses. Through a generate-verify-refine loop, it extracts reusable constraints from dialogue history and improves tool-calling accuracy while using only 1.24% of the tokens required by full-history prompting. These results indicate that robust personalization in agentic systems depends on memory that captures the reasons behind user choices, not just the choices themselves.
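The abstract does not spell out PRefine's internals. The sketch below is only an illustrative toy of the general generate-verify-refine idea: preferences as hypotheses proposed from past sessions, verified against the history, and kept only when they remain consistent. All names (`Hypothesis`, `generate`, `verify`, `refine`), the majority-vote heuristic, and the toy session data are assumptions for illustration, not the authors' method.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """A candidate user-preference constraint, e.g. seat -> 'window'."""
    slot: str
    value: str
    support: int = 0  # number of past sessions consistent with this hypothesis

def generate(history):
    """Propose one hypothesis per argument slot: its most frequent past value."""
    counts = {}
    for session in history:
        for slot, value in session.items():
            counts.setdefault(slot, Counter())[value] += 1
    return [Hypothesis(slot, c.most_common(1)[0][0]) for slot, c in counts.items()]

def verify(hyp, history):
    """Count sessions that mention the slot and how many agree with the hypothesis."""
    agree = sum(1 for s in history if s.get(hyp.slot) == hyp.value)
    seen = sum(1 for s in history if hyp.slot in s)
    return agree, seen

def refine(hyps, history, threshold=0.5):
    """Keep only hypotheses supported by a majority of sessions mentioning the slot."""
    kept = []
    for h in hyps:
        agree, seen = verify(h, history)
        if seen and agree / seen > threshold:
            h.support = agree
            kept.append(h)
    return kept

# Toy history: each session records tool-call arguments the user confirmed.
history = [
    {"seat": "window", "meal": "vegetarian"},
    {"seat": "window"},
    {"seat": "aisle", "meal": "vegetarian"},
]
memory = refine(generate(history), history)
print({h.slot: h.value for h in memory})  # → {'seat': 'window', 'meal': 'vegetarian'}
```

The retained hypotheses ("prefers window seat", "prefers vegetarian meal") act as reusable constraints that can fill omitted arguments in future tool calls, and are far cheaper to carry across sessions than replaying the full dialogue history.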