Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning
May 20, 2025
Authors: Haolei Xu, Yuchen Yan, Yongliang Shen, Wenqi Zhang, Guiyang Hou, Shengpei Jiang, Kaitao Song, Weiming Lu, Jun Xiao, Yueting Zhuang
cs.AI
Abstract
Large language models (LLMs) have achieved remarkable progress on
mathematical tasks through Chain-of-Thought (CoT) reasoning. However, existing
mathematical CoT datasets often suffer from Thought Leaps due to experts
omitting intermediate steps, which negatively impacts model learning and
generalization. We propose the CoT Thought Leap Bridge Task, which aims to
automatically detect leaps and generate missing intermediate reasoning steps to
restore the completeness and coherence of CoT. To facilitate this, we
constructed a specialized training dataset called ScaleQM+, based on the
structured ScaleQuestMath dataset, and trained CoT-Bridge to bridge thought
leaps. Through comprehensive experiments on mathematical reasoning benchmarks,
we demonstrate that models fine-tuned on bridged datasets consistently
outperform those trained on original datasets, with improvements of up to
+5.87% on NuminaMath. Our approach effectively enhances distilled data (+3.02%)
and provides better starting points for reinforcement learning (+3.1%),
functioning as a plug-and-play module compatible with existing optimization
techniques. Furthermore, CoT-Bridge demonstrates improved generalization to
out-of-domain logical reasoning tasks, confirming that enhancing reasoning
completeness yields broadly applicable benefits.
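The Thought Leap Bridge Task described in the abstract (detect leaps between consecutive CoT steps, then generate the missing intermediate steps) can be sketched schematically. The snippet below is a minimal, hypothetical illustration, not the paper's method: `toy_detect` and `toy_generate` are stand-in heuristics for the learned CoT-Bridge model, and all names are invented for this example.

```python
# Hypothetical sketch of the CoT Thought Leap Bridge idea:
# scan consecutive CoT steps, flag leaps, and splice in generated
# bridging steps. `detect` and `generate` are toy stand-ins for
# the paper's learned CoT-Bridge model.
import re
from typing import Callable, List


def bridge_chain(
    steps: List[str],
    detect: Callable[[str, str], bool],
    generate: Callable[[str, str], str],
) -> List[str]:
    """Return a new chain with bridging steps inserted at detected leaps."""
    bridged = [steps[0]]
    for nxt in steps[1:]:
        if detect(bridged[-1], nxt):
            bridged.append(generate(bridged[-1], nxt))
        bridged.append(nxt)
    return bridged


def toy_detect(prev: str, nxt: str) -> bool:
    # Toy heuristic: flag a leap when the next step introduces a
    # number the previous step never mentioned.
    prev_nums = set(re.findall(r"\d+", prev))
    nxt_nums = set(re.findall(r"\d+", nxt))
    return bool(nxt_nums - prev_nums)


def toy_generate(prev: str, nxt: str) -> str:
    # Placeholder bridge text; the real model would generate an
    # actual intermediate reasoning step here.
    return f"[bridge] derive the quantities in '{nxt}' from '{prev}'"


chain = ["Let x = 3 and y = 4.", "So x*y + 2 = 14."]
print(bridge_chain(chain, toy_detect, toy_generate))
```

In this toy run, the jump from the variable definitions to the final value is flagged as a leap, and a placeholder bridging step is inserted between the two original steps, yielding a three-step chain.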