The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
January 9, 2026
Authors: Qiguang Chen, Yantao Du, Ziniu Li, Jinhao Liu, Songyao Duan, Jiarui Guo, Minghao Liu, Jiaheng Liu, Tong Yang, Ge Zhang, Libo Qin, Wanxiang Che, Wenhao Huang
cs.AI
Abstract
Large language models (LLMs) often fail to acquire effective long chain-of-thought (Long CoT) reasoning by imitating humans or non-Long-CoT LLMs. To understand this, we propose a unified view in which effective and learnable Long CoT trajectories exhibit stable, molecule-like structures formed by three interaction types: Deep-Reasoning (covalent-like bonds), Self-Reflection (hydrogen-like bonds), and Self-Exploration (van der Waals-like forces). Analysis of distilled trajectories shows that these structures emerge from Long CoT fine-tuning rather than from keyword imitation. We introduce the notion of Effective Semantic Isomers and show that only bonds promoting fast entropy convergence support stable Long CoT learning, whereas structural competition impairs training. Building on these findings, we present Mole-Syn, a method that uses a distribution-transfer graph to guide the synthesis of effective Long CoT structures, substantially improving performance and reinforcement learning (RL) stability across multiple benchmarks.
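For intuition only, the minimal Python sketch below illustrates one way segments of a Long CoT trajectory could be tagged with the three interaction types named above and how the entropy of the resulting type distribution could be measured. The cue phrases, the newline segment delimiter, and the function names are illustrative assumptions; this is not the paper's implementation of Mole-Syn or its entropy-convergence analysis.

```python
# Illustrative sketch (not the paper's method): tag trajectory segments with
# one of the three interaction types via naive keyword heuristics, then
# compute the Shannon entropy of the resulting type distribution.
import math
from collections import Counter

# Hypothetical cue phrases; the paper does not specify these.
CUES = {
    "deep_reasoning":   ("therefore", "thus", "it follows that"),
    "self_reflection":  ("wait", "let me re-check", "on second thought"),
    "self_exploration": ("alternatively", "another approach", "what if"),
}

def tag_segment(segment: str) -> str:
    """Assign a segment to an interaction type via keyword matching."""
    lowered = segment.lower()
    for bond_type, cues in CUES.items():
        if any(cue in lowered for cue in cues):
            return bond_type
    return "deep_reasoning"  # default: treat plain derivation as covalent-like

def type_entropy(trajectory: str) -> float:
    """Shannon entropy (in bits) of the interaction-type distribution."""
    segments = [s for s in trajectory.split("\n") if s.strip()]
    counts = Counter(tag_segment(s) for s in segments)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

if __name__ == "__main__":
    demo = ("Let x be the unknown.\n"
            "Therefore x = 4.\n"
            "Wait, let me re-check the substitution.\n"
            "Alternatively, another approach is to factor first.\n")
    print(f"type entropy: {type_entropy(demo):.3f} bits")  # -> 1.500 bits
```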