Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
January 8, 2025
Authors: Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn
cs.AI
Abstract
We propose a novel framework, Meta Chain-of-Thought (Meta-CoT), which extends
traditional Chain-of-Thought (CoT) by explicitly modeling the underlying
reasoning required to arrive at a particular CoT. We present empirical evidence
from state-of-the-art models exhibiting behaviors consistent with in-context
search, and explore methods for producing Meta-CoT via process supervision,
synthetic data generation, and search algorithms. We then outline a
concrete pipeline for training a model to produce Meta-CoTs, incorporating
instruction tuning with linearized search traces and reinforcement learning
post-training. Finally, we discuss open research questions, including scaling
laws, verifier roles, and the potential for discovering novel reasoning
algorithms. This work provides a theoretical and practical roadmap to enable
Meta-CoT in LLMs, paving the way for more powerful and human-like reasoning in
artificial intelligence.
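
As a rough illustration of what "instruction tuning with linearized search traces" might look like in practice, the sketch below flattens a toy search tree over reasoning steps into a single training string. This is a minimal sketch, not the paper's implementation: the SearchNode structure, the step/backtrack tags, and the example tree are illustrative assumptions.

```python
# Minimal sketch (assumed, not from the paper) of linearizing a search trace
# over reasoning steps into one training string for Meta-CoT-style
# instruction tuning. Tag names and the tree are illustrative only.

from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchNode:
    """One intermediate reasoning step explored during search."""
    thought: str
    is_solution: bool = False
    children: List["SearchNode"] = field(default_factory=list)


def linearize(node: SearchNode, depth: int = 0) -> List[str]:
    """Depth-first traversal that records every explored step and marks
    dead ends, so a model can be trained on the search process itself."""
    lines = [f"<step depth={depth}> {node.thought}"]
    if node.is_solution:
        lines.append("<solution reached>")
        return lines
    for child in node.children:
        lines.extend(linearize(child, depth + 1))
    if not node.children:
        # Dead-end leaf: record the backtracking point explicitly.
        lines.append(f"<backtrack from depth={depth}>")
    return lines


if __name__ == "__main__":
    # Toy search tree: one dead end, then a branch that reaches a solution.
    root = SearchNode(
        "Try factoring the quadratic.",
        children=[
            SearchNode("Assume integer roots; none fit."),
            SearchNode(
                "Use the quadratic formula.",
                children=[SearchNode("x = 2 or x = 3.", is_solution=True)],
            ),
        ],
    )
    print("\n".join(linearize(root)))
```

Under these assumptions, the printed trace (including the dead end and backtrack marker) would serve as the target text for supervised fine-tuning, before any reinforcement learning post-training.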