D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use
February 2, 2026
Authors: Bowen Xu, Shaoyu Wu, Hao Jiang, Kai Liu, Xin Chen, Lulu Hu, Bin Yang
cs.AI
Abstract
Effective tool use and reasoning are essential capabilities for large reasoning models (LRMs) to address complex real-world problems. Through empirical analysis, we identify that current LRMs lack the capability of sub-task decomposition in complex tool-use scenarios, leading to Lazy Reasoning. To address this, we propose D-CORE (Decomposing tasks and Composing Reasoning processes), a two-stage training framework that first incentivizes the LRMs' task-decomposition reasoning capability via self-distillation, then applies diversity-aware reinforcement learning (RL) to restore the LRMs' reflective reasoning capability. D-CORE achieves robust tool-use improvements across diverse benchmarks and model scales. Experiments on BFCLv3 demonstrate the superiority of our method: D-CORE-8B reaches 77.7% accuracy, surpassing the best-performing 8B model by 5.7%. Meanwhile, D-CORE-14B establishes a new state-of-the-art at 79.3%, outperforming 70B models despite being 5× smaller. The source code is available at https://github.com/alibaba/EfficientAI.
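
The abstract gives only the high-level shape of the method: a self-distillation stage that teaches sub-task decomposition, followed by a diversity-aware RL stage that restores reflective reasoning. As a reading aid, below is a minimal Python sketch of how such a two-stage pipeline might be wired together. Everything in it is assumed for illustration: the names (Trace, Model, self_distill, diversity_aware_rl) and the novelty-based diversity bonus are hypothetical stand-ins, not the paper's implementation, which lives in the linked repository.

# Hypothetical sketch of the two-stage D-CORE pipeline from the abstract.
# Every class and function here is an illustrative stand-in, not the
# authors' code; see https://github.com/alibaba/EfficientAI for the real one.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Trace:
    # One reasoning trace: a task, its sub-task decomposition, and a reward.
    task: str
    subtasks: List[str]
    reward: float = 0.0

@dataclass
class Model:
    # Stand-in for a large reasoning model; real training updates weights.
    name: str
    history: List[str] = field(default_factory=list)

    def fine_tune(self, traces: List[Trace]) -> None:
        self.history.append(f"stage 1: SFT on {len(traces)} self-distilled traces")

    def rl_step(self, traces: List[Trace]) -> None:
        self.history.append(f"stage 2: RL update on {len(traces)} rollouts")

def self_distill(model: Model, tasks: List[str]) -> List[Trace]:
    # Stage 1 (assumed): sample the model's own decomposition traces and
    # keep them as supervised targets, incentivizing sub-task decomposition.
    return [Trace(task=t, subtasks=[f"{t} / step {i}" for i in range(2)])
            for t in tasks]

def diversity_aware_rl(model: Model, tasks: List[str],
                       reward_fn: Callable[[Trace], float],
                       diversity_weight: float = 0.1) -> None:
    # Stage 2 (assumed): add a novelty bonus to the task reward so rollouts
    # that repeat the same sub-tasks are discouraged ("diversity-aware" RL),
    # aiming to restore reflective rather than lazy reasoning.
    rollouts = self_distill(model, tasks)  # placeholder rollout sampler
    seen: set = set()
    for r in rollouts:
        novelty = sum(s not in seen for s in r.subtasks) / len(r.subtasks)
        seen.update(r.subtasks)
        r.reward = reward_fn(r) + diversity_weight * novelty
    model.rl_step(rollouts)

if __name__ == "__main__":
    model = Model("D-CORE-8B")
    tasks = ["book a flight and a hotel", "debug a failing CI pipeline"]
    model.fine_tune(self_distill(model, tasks))       # stage 1
    diversity_aware_rl(model, tasks, lambda tr: 1.0)  # stage 2
    print("\n".join(model.history))

The sketch only fixes the ordering the abstract states (supervised self-distillation before RL) and the idea that the RL reward is augmented with a diversity term; reward design, rollout sampling, and optimization details are not specified in the abstract.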