Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning
May 20, 2025
Authors: Haolei Xu, Yuchen Yan, Yongliang Shen, Wenqi Zhang, Guiyang Hou, Shengpei Jiang, Kaitao Song, Weiming Lu, Jun Xiao, Yueting Zhuang
cs.AI
Abstract
Large language models (LLMs) have achieved remarkable progress on
mathematical tasks through Chain-of-Thought (CoT) reasoning. However, existing
mathematical CoT datasets often suffer from Thought Leaps due to experts
omitting intermediate steps, which negatively impacts model learning and
generalization. We propose the CoT Thought Leap Bridge Task, which aims to
automatically detect leaps and generate missing intermediate reasoning steps to
restore the completeness and coherence of CoT. To facilitate this, we
constructed a specialized training dataset called ScaleQM+, based on the
structured ScaleQuestMath dataset, and trained CoT-Bridge to bridge thought
leaps. Through comprehensive experiments on mathematical reasoning benchmarks,
we demonstrate that models fine-tuned on bridged datasets consistently
outperform those trained on original datasets, with improvements of up to
+5.87% on NuminaMath. Our approach effectively enhances distilled data (+3.02%)
and provides better starting points for reinforcement learning (+3.1%),
functioning as a plug-and-play module compatible with existing optimization
techniques. Furthermore, CoT-Bridge demonstrates improved generalization to
out-of-domain logical reasoning tasks, confirming that enhancing reasoning
completeness yields broadly applicable benefits.
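
To make the Thought Leap Bridge idea concrete, here is a minimal, hypothetical sketch of the bridging procedure described in the abstract: walk through a chain-of-thought, flag transitions between consecutive steps that look like leaps, and insert a generated intermediate step. This is not the paper's implementation; the `is_leap` detector and `generate_bridge` generator below are toy stand-ins for the roles a trained model such as CoT-Bridge would play.

```python
# Illustrative sketch (not the paper's code): detect abrupt transitions in a
# chain-of-thought and insert bridging steps between them.

from typing import Callable, List


def bridge_thought_leaps(
    steps: List[str],
    is_leap: Callable[[str, str], bool],
    generate_bridge: Callable[[str, str], str],
) -> List[str]:
    """Return the step list with a bridging step inserted wherever
    `is_leap` flags the transition between two consecutive steps."""
    if not steps:
        return []
    bridged = [steps[0]]
    for nxt in steps[1:]:
        prev = bridged[-1]
        if is_leap(prev, nxt):
            bridged.append(generate_bridge(prev, nxt))  # fill the gap
        bridged.append(nxt)
    return bridged


# Toy stand-ins; in the paper both roles would be handled by CoT-Bridge.
def toy_is_leap(prev: str, nxt: str) -> bool:
    # Hypothetical heuristic: flag a leap when the next step states a result
    # ("=") but shares no token with the previous step.
    prev_tokens = set(prev.lower().split())
    nxt_tokens = set(nxt.lower().split())
    return "=" in nxt and not (prev_tokens & nxt_tokens)


def toy_generate_bridge(prev: str, nxt: str) -> str:
    return f"(bridging step explaining how '{prev}' leads to '{nxt}')"


if __name__ == "__main__":
    cot = [
        "Let x be the number of apples.",
        "Total cost = 12.",  # leaps over the intermediate pricing equation
    ]
    for step in bridge_thought_leaps(cot, toy_is_leap, toy_generate_bridge):
        print(step)
```

In this sketch the bridged chain is then suitable for fine-tuning, mirroring the abstract's claim that models trained on bridged datasets outperform those trained on the original, leap-containing data.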