大規模工具使用下的使用者導向多輪對話生成
User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale
January 13, 2026
作者: Jungho Cho, Minbyul Jeong, Sungrae Park
cs.AI
摘要
近期大型推理模型(LRM)作為自主代理的典範轉移,加劇了對複雜多輪工具使用能力的需求。然而現有數據集和數據生成方法受限於靜態預定義工具集,難以擴展至開放式人機協作的複雜場景。為解決此問題,我們開發了基於LRM模擬器的自動化任務導向多輪對話生成框架,通過動態生成高價值的領域專用工具來解決指定任務。但我們觀察到純任務導向設計容易產生「唯任務解決」軌跡,即代理以最小交互完成目標,未能生成真實場景中常見的高輪次對話。為彌合這一差距,我們轉向用戶導向的模擬範式:通過將任務生成與專用用戶模擬器解耦,模擬人類行為規則(如漸進式請求提出與逐輪反饋),我們促成了更能反映現實世界問題解決迭代特質的真實延展型多輪對話。我們的生成流水線作為可插拔的通用模塊,能從任意狀態啟動生成,確保在生產擴展型工具使用數據時具備高擴展性。此外,通過在單一軌跡內實現多重任務完成,該框架能產出反映真實人機交互多維需求的高密度數據集。
English
The recent paradigm shift toward large reasoning models (LRMs) as autonomous agents has intensified the demand for sophisticated, multi-turn tool-use capabilities. Yet, existing datasets and data-generation approaches are limited by static, predefined toolsets that cannot scale to the complexity of open-ended human-agent collaboration. To address this, we initially developed a framework for automated task-oriented multi-turn dialogue generation at scale, utilizing an LRM-based simulator to dynamically generate high-value, domain-specific tools to solve specified tasks. However, we observe that a purely task-oriented design often results in "solely task-solving" trajectories, where the agent completes the objective with minimal interaction, failing to generate the high turn-count conversations seen in realistic scenarios. To bridge this gap, we shift toward a user-oriented simulation paradigm. By decoupling task generation from a dedicated user simulator that mimics human behavioral rules - such as incremental request-making and turn-by-turn feedback - we facilitate more authentic, extended multi-turn dialogues that reflect the iterative nature of real-world problem solving. Our generation pipeline operates as a versatile, plug-and-play module capable of initiating generation from any state, ensuring high scalability in producing extended tool-use data. Furthermore, by facilitating multiple task completions within a single trajectory, it yields a high-density dataset that reflects the multifaceted demands of real-world human-agent interaction.