

Latent Preference Modeling for Cross-Session Personalized Tool Calling

April 20, 2026
作者: Yejin Yoon, Minseo Kim, Taeuk Kim
cs.AI

Abstract

Users often omit essential details in their requests to LLM-based agents, resulting in under-specified inputs for tool use. This poses a fundamental challenge for tool-augmented agents, as API execution typically requires complete arguments, highlighting the need for personalized tool calling. To study this problem, we introduce MPT, a benchmark comprising 265 multi-session dialogues that cover three challenges: Preference Recall, Preference Induction, and Preference Transfer. We also propose PRefine, a test-time memory-augmented method that represents user preferences as evolving hypotheses. Through a generate-verify-refine loop, it extracts reusable constraints from history and improves tool-calling accuracy while using only 1.24% of the tokens required by full-history prompting. These results indicate that robust personalization in agentic systems depends on memory that captures the reasons behind user choices, not just the choices themselves.
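The abstract's generate-verify-refine loop can be illustrated with a minimal sketch. Everything below is hypothetical: the `PreferenceHypothesis` class, the session format, and the support/conflict thresholds are illustrative assumptions, not the paper's actual implementation (which presumably uses an LLM to propose and test hypotheses over dialogue history).

```python
from dataclasses import dataclass

@dataclass
class PreferenceHypothesis:
    # A candidate constraint on tool arguments, e.g. {"seat": "window"}.
    constraint: dict
    support: int = 0      # past sessions consistent with the hypothesis
    conflicts: int = 0    # past sessions contradicting it

def generate(session: dict) -> list:
    # Hypothetical generate step: propose one hypothesis per explicit
    # argument the user supplied in the latest session.
    return [PreferenceHypothesis({k: v}) for k, v in session["args"].items()]

def verify(h: PreferenceHypothesis, history: list) -> PreferenceHypothesis:
    # Verify step: count agreement between the hypothesis and past sessions.
    for s in history:
        for k, v in h.constraint.items():
            if k in s["args"]:
                if s["args"][k] == v:
                    h.support += 1
                else:
                    h.conflicts += 1
    return h

def refine(hyps: list) -> list:
    # Refine step: keep only constraints that recur and are never
    # contradicted; these become the reusable memory entries.
    return [h.constraint for h in hyps if h.support >= 2 and h.conflicts == 0]

# Toy multi-session history: the user always books a window seat,
# but varies the meal choice.
history = [
    {"args": {"seat": "window", "meal": "vegan"}},
    {"args": {"seat": "window"}},
    {"args": {"seat": "window", "meal": "kosher"}},
]
latest = {"args": {"seat": "window", "meal": "vegan"}}

memory = refine([verify(h, history) for h in generate(latest)])
print(memory)  # [{'seat': 'window'}] — the stable preference survives
```

Storing only the surviving constraints, rather than the full dialogue history, is what makes the large token reduction (1.24% of full-history prompting, per the abstract) plausible: future under-specified requests are completed from a small set of verified preferences.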