The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
January 9, 2026
Authors: Qiguang Chen, Yantao Du, Ziniu Li, Jinhao Liu, Songyao Duan, Jiarui Guo, Minghao Liu, Jiaheng Liu, Tong Yang, Ge Zhang, Libo Qin, Wanxiang Che, Wenhao Huang
cs.AI
Abstract
Large language models (LLMs) often fail to learn effective long chain-of-thought (Long CoT) reasoning by imitating humans or non-Long-CoT LLMs. To understand why, we propose that, in a unified view, effective and learnable Long CoT trajectories exhibit stable molecule-like structures formed by three interaction types: Deep-Reasoning (covalent-bond-like), Self-Reflection (hydrogen-bond-like), and Self-Exploration (van der Waals-like). Analysis of distilled trajectories reveals that these structures emerge from Long CoT fine-tuning, not from keyword imitation. We introduce the notion of Effective Semantic Isomers and show that only bonds promoting fast entropy convergence support stable Long CoT learning, while structural competition impairs training. Drawing on these findings, we present Mole-Syn, a distribution-transfer-graph-guided method for synthesizing effective Long CoT structures, which boosts performance and reinforcement-learning stability across benchmarks.
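Purely as an illustration of the two quantities the abstract turns on, the sketch below shows one possible way to tag trajectory segments with the three interaction types and to measure how quickly per-step output entropy settles. The marker phrases, thresholds, and function names are hypothetical assumptions for illustration only; they are not the paper's actual segmentation procedure or training criterion.

```python
# Hypothetical surface markers for the three interaction types named in the
# abstract (Deep-Reasoning, Self-Reflection, Self-Exploration). Placeholders,
# not the paper's segmentation rules.
INTERACTION_MARKERS = {
    "deep_reasoning":   ["therefore", "it follows that", "substituting"],
    "self_reflection":  ["wait,", "let me re-check", "on second thought"],
    "self_exploration": ["alternatively", "another approach", "what if"],
}


def label_segments(segments):
    """Tag each Long CoT segment with one of the three interaction types, or None."""
    labels = []
    for seg in segments:
        text = seg.lower()
        label = next(
            (kind for kind, markers in INTERACTION_MARKERS.items()
             if any(m in text for m in markers)),
            None,
        )
        labels.append(label)
    return labels


def entropy_convergence_step(entropies, tol=0.05, window=3):
    """Return the first training step at which logged output entropy has
    stabilized (max-min spread below `tol` over the last `window` steps),
    or None if it never stabilizes. A crude proxy for 'fast entropy convergence'."""
    for t in range(window, len(entropies) + 1):
        recent = entropies[t - window:t]
        if max(recent) - min(recent) < tol:
            return t
    return None


if __name__ == "__main__":
    segments = [
        "Therefore the sum telescopes to n(n+1)/2.",
        "Wait, let me re-check the boundary term.",
        "Alternatively, what if we induct on n instead?",
    ]
    print(label_segments(segments))  # ['deep_reasoning', 'self_reflection', 'self_exploration']
    print(entropy_convergence_step([2.1, 1.4, 0.9, 0.85, 0.83, 0.82]))  # 6
```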