User-Oriented Multi-Turn Dialogue Generation with Tool Use at Scale

Abstract

The recent paradigm shift toward large reasoning models (LRMs) as autonomous agents has intensified the demand for sophisticated multi-turn tool-use capabilities. Yet existing datasets and data-generation approaches are limited to static, predefined toolsets that cannot scale to the complexity of open-ended human-agent collaboration. To address this, we initially developed a framework for automated task-oriented multi-turn dialogue generation at scale, which uses an LRM-based simulator to dynamically generate high-value, domain-specific tools for solving specified tasks. However, we observe that a purely task-oriented design often yields "solely task-solving" trajectories, in which the agent completes the objective with minimal interaction and therefore fails to produce the high turn-count conversations seen in realistic scenarios. To bridge this gap, we shift to a user-oriented simulation paradigm. By decoupling task generation from a dedicated user simulator that mimics human behavioral rules, such as incremental request-making and turn-by-turn feedback, we obtain more authentic, extended multi-turn dialogues that reflect the iterative nature of real-world problem solving. Our generation pipeline operates as a versatile, plug-and-play module that can start generation from any state, which makes it highly scalable for producing extended tool-use data. Furthermore, by allowing multiple tasks to be completed within a single trajectory, it yields a high-density dataset that reflects the multifaceted demands of real-world human-agent interaction.
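As a rough illustration of the user-oriented loop described in the abstract, the Python sketch below keeps the user simulator separate from the agent and extends a dialogue from an arbitrary starting state. It is not the authors' implementation: `Turn`, `DialogueState`, `scripted_user`, `echo_agent`, and `generate` are hypothetical names introduced here, and both simulators are deterministic stubs standing in for LRM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Turn:
    role: str     # "user", "agent", or "tool"
    content: str


@dataclass
class DialogueState:
    """Requests the simulated user has not revealed yet, plus the turns so far."""
    pending_requests: List[str]
    turns: List[Turn] = field(default_factory=list)


def scripted_user(state: DialogueState) -> Optional[str]:
    """Hypothetical user simulator: reveal one request per turn (incremental
    request-making) and return None once nothing is left to ask."""
    if not state.pending_requests:
        return None
    return state.pending_requests.pop(0)


def echo_agent(state: DialogueState) -> List[Turn]:
    """Stub agent (assumption: a real system would call an LRM here).
    It emits one fake tool result and one reply for the latest user turn."""
    request = state.turns[-1].content
    return [
        Turn("tool", f"lookup({request!r}) -> ok"),
        Turn("agent", f"Done: {request}"),
    ]


def generate(
    state: DialogueState,
    user: Callable[[DialogueState], Optional[str]],
    agent: Callable[[DialogueState], List[Turn]],
    max_user_turns: int = 10,
) -> DialogueState:
    """Extend `state` in place until the user has no further requests
    or the user-turn budget is exhausted."""
    for _ in range(max_user_turns):
        message = user(state)
        if message is None:
            break
        state.turns.append(Turn("user", message))
        state.turns.extend(agent(state))
    return state
```

Because `generate` only reads and appends to `state.turns`, it can be started from an empty dialogue or from one that already contains earlier turns, and each pending request becomes its own user turn, so several tasks can accumulate in one trajectory.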
