User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale
January 13, 2026
Authors: Jungho Cho, Minbyul Jeong, Sungrae Park
cs.AI
Abstract
The recent paradigm shift toward large reasoning models (LRMs) as autonomous agents has intensified the demand for sophisticated multi-turn tool-use capabilities. Yet existing datasets and data-generation approaches are limited by static, predefined toolsets that cannot scale to the complexity of open-ended human-agent collaboration. To address this, we initially developed a framework for automated, task-oriented multi-turn dialogue generation at scale, using an LRM-based simulator to dynamically generate high-value, domain-specific tools for solving specified tasks. However, we observe that a purely task-oriented design often produces "solely task-solving" trajectories, in which the agent completes the objective with minimal interaction and fails to generate the high turn-count conversations seen in realistic scenarios. To bridge this gap, we shift toward a user-oriented simulation paradigm. By decoupling task generation from a dedicated user simulator that mimics human behavioral rules (such as incremental request-making and turn-by-turn feedback), we obtain more authentic, extended multi-turn dialogues that reflect the iterative nature of real-world problem solving. Our generation pipeline operates as a versatile, plug-and-play module that can start generation from any state, which keeps the production of extended tool-use data highly scalable. Furthermore, by supporting multiple task completions within a single trajectory, it yields a high-density dataset that reflects the multifaceted demands of real-world human-agent interaction.
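As a rough illustration of the user-oriented idea described above, the following Python sketch separates a pre-generated task list from a user simulator that reveals one sub-request per turn and can resume from an existing dialogue state. It is not the authors' implementation: every name here (`DialogueState`, `ScriptedUserSimulator`, `echo_agent`, `simulate`) is hypothetical, and the scripted user and echo agent are simple stand-ins for the LRM-based components the abstract refers to.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Conversation so far plus the tasks the simulated user still wants done."""
    turns: list = field(default_factory=list)          # (speaker, text) pairs
    pending_tasks: list = field(default_factory=list)  # each task is a list of sub-requests


class ScriptedUserSimulator:
    """Hypothetical stand-in for an LRM-based user simulator."""

    def _next_request(self, state):
        # Incremental request-making: hand out one sub-request at a time,
        # moving on to the next task once the current one is exhausted.
        while state.pending_tasks and not state.pending_tasks[0]:
            state.pending_tasks.pop(0)
        if not state.pending_tasks:
            return None
        return state.pending_tasks[0].pop(0)

    def next_message(self, state):
        request = self._next_request(state)
        if request is None:
            return None
        # Turn-by-turn feedback: acknowledge the previous agent reply, if any.
        if state.turns and state.turns[-1][0] == "agent":
            return f"Got it. Next: {request}"
        return request


def echo_agent(message):
    """Placeholder agent; a real agent would select and call tools here."""
    return f"[tool result for: {message}]"


def simulate(state, user, agent, max_turns=20):
    """Extend a dialogue from any starting state until tasks run out or max_turns is reached."""
    while len(state.turns) < max_turns:
        message = user.next_message(state)
        if message is None:
            break
        state.turns.append(("user", message))
        state.turns.append(("agent", agent(message)))
    return state


# Resume from a non-empty state and complete two tasks in one trajectory.
start = DialogueState(
    turns=[("user", "hi"), ("agent", "hello")],
    pending_tasks=[
        ["find flights to Seoul", "only morning departures"],
        ["book a hotel near the venue"],
    ],
)
result = simulate(start, ScriptedUserSimulator(), echo_agent)
for speaker, text in result.turns:
    print(f"{speaker}: {text}")
```

Because the task list lives in the state rather than in the simulator, the same loop can be started from an empty dialogue or from a partially completed one, and several tasks can be chained into a single trajectory.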