Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning
May 20, 2025
Authors: Haolei Xu, Yuchen Yan, Yongliang Shen, Wenqi Zhang, Guiyang Hou, Shengpei Jiang, Kaitao Song, Weiming Lu, Jun Xiao, Yueting Zhuang
cs.AI
Abstract
Large language models (LLMs) have achieved remarkable progress on
mathematical tasks through Chain-of-Thought (CoT) reasoning. However, existing
mathematical CoT datasets often suffer from Thought Leaps due to experts
omitting intermediate steps, which negatively impacts model learning and
generalization. We propose the CoT Thought Leap Bridge Task, which aims to
automatically detect leaps and generate missing intermediate reasoning steps to
restore the completeness and coherence of CoT. To facilitate this, we
constructed a specialized training dataset called ScaleQM+, based on the
structured ScaleQuestMath dataset, and trained CoT-Bridge to bridge thought
leaps. Through comprehensive experiments on mathematical reasoning benchmarks,
we demonstrate that models fine-tuned on bridged datasets consistently
outperform those trained on original datasets, with improvements of up to
+5.87% on NuminaMath. Our approach effectively enhances distilled data (+3.02%)
and provides better starting points for reinforcement learning (+3.1%),
functioning as a plug-and-play module compatible with existing optimization
techniques. Furthermore, CoT-Bridge demonstrates improved generalization to
out-of-domain logical reasoning tasks, confirming that enhancing reasoning
completeness yields broadly applicable benefits.
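The Thought Leap Bridge Task described in the abstract (detect leaps between consecutive CoT steps, then generate the missing intermediate steps) can be sketched schematically. The snippet below is a minimal, hypothetical illustration, not the paper's method: `toy_detect` and `toy_generate` are stand-in heuristics for the learned CoT-Bridge model, and all names are invented for this example.

```python
# Hypothetical sketch of the CoT Thought Leap Bridge idea:
# scan consecutive CoT steps, flag leaps, and splice in generated
# bridging steps. `detect` and `generate` are toy stand-ins for
# the paper's learned CoT-Bridge model.
import re
from typing import Callable, List


def bridge_chain(
    steps: List[str],
    detect: Callable[[str, str], bool],
    generate: Callable[[str, str], str],
) -> List[str]:
    """Return a new chain with bridging steps inserted at detected leaps."""
    bridged = [steps[0]]
    for nxt in steps[1:]:
        if detect(bridged[-1], nxt):
            bridged.append(generate(bridged[-1], nxt))
        bridged.append(nxt)
    return bridged


def toy_detect(prev: str, nxt: str) -> bool:
    # Toy heuristic: flag a leap when the next step introduces a
    # number the previous step never mentioned.
    prev_nums = set(re.findall(r"\d+", prev))
    nxt_nums = set(re.findall(r"\d+", nxt))
    return bool(nxt_nums - prev_nums)


def toy_generate(prev: str, nxt: str) -> str:
    # Placeholder bridge text; the real model would generate an
    # actual intermediate reasoning step here.
    return f"[bridge] derive the quantities in '{nxt}' from '{prev}'"


chain = ["Let x = 3 and y = 4.", "So x*y + 2 = 14."]
print(bridge_chain(chain, toy_detect, toy_generate))
```

In this toy run, the jump from the variable definitions to the final value is flagged as a leap, and a placeholder bridging step is inserted between the two original steps, yielding a three-step chain.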