Latent Preference Modeling for Cross-Session Personalized Tool Calling

Abstract

Users often omit essential details in their requests to LLM-based agents, resulting in under-specified inputs for tool use. This poses a fundamental challenge for tool-augmented agents, because API execution typically requires complete arguments, and it highlights the need for personalized tool calling. To study this problem, we introduce MPT, a benchmark comprising 265 multi-session dialogues that cover three challenges: Preference Recall, Preference Induction, and Preference Transfer. We also propose PRefine, a test-time memory-augmented method that represents user preferences as evolving hypotheses. Through a generate-verify-refine loop, it extracts reusable constraints from past dialogue history and improves tool-calling accuracy while using only 1.24% of the tokens required by full-history prompting. These results indicate that robust personalization in agentic systems depends on memory that captures the reasons behind user choices, not just the choices themselves.
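
The abstract does not describe how the generate-verify-refine loop is implemented, so the following is only a minimal illustrative sketch of the general idea, not PRefine itself. Every name here (Hypothesis, generate, verify, refine, build_memory, complete_call), the dictionary format for past tool calls, and the rule of narrowing a contradicted hypothesis to a single tool are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical preference: 'argument `arg` should be `value`'."""
    arg: str
    value: str
    tool: Optional[str] = None  # None means the preference applies to every tool

    def applies_to(self, call):
        return self.tool in (None, call["tool"]) and self.arg in call["args"]

    def holds_for(self, call):
        return call["args"][self.arg] == self.value


def generate(call):
    """Propose one tool-agnostic hypothesis per argument observed in a call."""
    return {Hypothesis(arg, value) for arg, value in call["args"].items()}


def verify(hyp, history):
    """A hypothesis survives if no applicable past call contradicts it."""
    return all(hyp.holds_for(c) for c in history if hyp.applies_to(c))


def refine(hyp, call):
    """Narrow a contradicted hypothesis to the tool it was observed on."""
    return Hypothesis(hyp.arg, hyp.value, tool=call["tool"])


def build_memory(history):
    """Keep hypotheses that pass verification, refining once on failure."""
    memory = set()
    for call in history:
        for hyp in generate(call):
            if verify(hyp, history):
                memory.add(hyp)
                continue
            narrowed = refine(hyp, call)
            if verify(narrowed, history):
                memory.add(narrowed)
    return memory


def complete_call(tool, args, memory):
    """Fill missing arguments from memory; tool-specific hypotheses go first."""
    filled = dict(args)
    for hyp in sorted(memory, key=lambda h: h.tool is None):
        if hyp.tool in (None, tool) and hyp.arg not in filled:
            filled[hyp.arg] = hyp.value
    return filled


# Made-up history of past tool calls across sessions (illustrative data only).
history = [
    {"tool": "book_flight", "args": {"seat": "aisle", "class": "economy"}},
    {"tool": "book_train", "args": {"seat": "window", "class": "economy"}},
    {"tool": "book_flight", "args": {"seat": "aisle", "class": "economy"}},
]
memory = build_memory(history)
completed = complete_call("book_flight", {"destination": "Tokyo"}, memory)
```

In this toy setup the memory holds a handful of compact constraints instead of the full dialogue history, which is the intuition behind prompting with far fewer tokens. How PRefine actually generates, verifies, and refines hypotheses is not stated in the abstract.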

