D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use
February 2, 2026
Authors: Bowen Xu, Shaoyu Wu, Hao Jiang, Kai Liu, Xin Chen, Lulu Hu, Bin Yang
cs.AI
Abstract
Effective tool use and reasoning are essential capabilities for large reasoning models (LRMs) to address complex real-world problems. Through empirical analysis, we identify that current LRMs lack sub-task decomposition capability in complex tool-use scenarios, leading to "Lazy Reasoning." To address this, we propose D-CORE (Decomposing tasks and Composing Reasoning processes), a two-stage training framework that first incentivizes an LRM's task-decomposition reasoning via self-distillation, then applies diversity-aware reinforcement learning (RL) to restore its reflective reasoning capability. D-CORE achieves robust tool-use improvements across diverse benchmarks and model scales. Experiments on BFCLv3 demonstrate the superiority of our method: D-CORE-8B reaches 77.7% accuracy, surpassing the best-performing 8B model by 5.7%, while D-CORE-14B establishes a new state of the art at 79.3%, outperforming 70B models despite being 5× smaller. The source code is available at https://github.com/alibaba/EfficientAI.
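The abstract names the two stages but not their mechanics. The Python sketch below illustrates one plausible reading, under stated assumptions: stage 1 keeps only the model's own verified decomposition traces as self-distillation targets, and stage 2 shapes RL rewards with a diversity term over sampled groups. Every name here (Trace, stage1_self_distill, diversity_aware_rewards, beta) is a hypothetical stand-in, not the authors' implementation.

```python
# Minimal sketch of a D-CORE-style two-stage pipeline. All names are
# hypothetical placeholders; the paper's actual method may differ.
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trace:
    """One reasoning trace: a sub-task decomposition plus resulting tool calls."""
    prompt: str
    steps: List[str]  # e.g. ["decompose", "call tool A", "reflect", ...]
    answer: str = ""


# ---- Stage 1: self-distillation on verified decomposition traces ----------
def stage1_self_distill(sample: Callable[[str], Trace],
                        verify: Callable[[Trace], bool],
                        prompts: List[str]) -> List[Trace]:
    """Sample the model's own traces and keep only verified ones as SFT data."""
    return [t for t in map(sample, prompts) if verify(t)]


# ---- Stage 2: diversity-aware RL rewards ----------------------------------
def diversity_aware_rewards(group: List[Trace],
                            verify: Callable[[Trace], bool],
                            beta: float = 0.1) -> List[float]:
    """Correctness reward plus a bonus for traces whose step sequence is rare
    within the sampled group, discouraging collapse onto a single template."""
    counts = Counter(tuple(t.steps) for t in group)
    return [float(verify(t)) + beta / counts[tuple(t.steps)] for t in group]


if __name__ == "__main__":
    # Toy usage with stub sampler/verifier; real training would plug in an
    # LRM sampler and an executable tool-call checker.
    sample = lambda p: Trace(p, steps=["decompose", "call", "reflect"], answer="42")
    verify = lambda t: t.answer == "42"
    sft_data = stage1_self_distill(sample, verify, ["q1", "q2"])
    print(len(sft_data), diversity_aware_rewards(sft_data, verify))
```

The diversity bonus is one simple choice (inverse frequency of the step sequence within a group); any measure that keeps stage 2 from erasing the reflective behaviors learned in stage 1 would fill the same role.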