Rethinking Reflection in Pre-Training
April 5, 2025
Authors: Essential AI, Darsh J Shah, Peter Rushton, Somanshu Singla, Mohit Parmar, Kurt Smith, Yash Vanjani, Ashish Vaswani, Adarsh Chaluvaraju, Andrew Hojel, Andrew Ma, Anil Thomas, Anthony Polloreno, Ashish Tanwer, Burhan Drak Sibai, Divya S Mansingka, Divya Shivaprasad, Ishaan Shah, Karl Stratos, Khoi Nguyen, Michael Callahan, Michael Pust, Mrinal Iyer, Philip Monk, Platon Mazarakis, Ritvik Kapila, Saurabh Srivastava, Tim Romanski
cs.AI
Abstract
A language model's ability to reflect on its own reasoning provides a key
advantage for solving complex problems. While most recent research has focused
on how this ability develops during reinforcement learning, we show that it
actually begins to emerge much earlier - during the model's pre-training. To
study this, we introduce deliberate errors into chains-of-thought and test
whether the model can still arrive at the correct answer by recognizing and
correcting these mistakes. By tracking performance across different stages of
pre-training, we observe that this self-correcting ability appears early and
improves steadily over time. For instance, an OLMo2-7B model pre-trained on 4
trillion tokens displays self-correction on our six self-reflection tasks.
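The evaluation setup the abstract describes — planting a deliberate error in a chain-of-thought and checking whether the model still recovers the correct answer — can be illustrated with a minimal sketch. This is not the authors' code; the helper names, the example trace, and the `model(prompt)` call are all assumptions for illustration only.

```python
# Minimal sketch (not the paper's released code) of the adversarial
# chain-of-thought setup: take a correct reasoning trace, replace one
# step with a deliberate error, and test whether a model continuing
# from the corrupted trace still reaches the gold answer.

def inject_error(chain: list[str], step: int, wrong_step: str) -> list[str]:
    """Return a copy of the chain with one step replaced by an incorrect one."""
    corrupted = list(chain)
    corrupted[step] = wrong_step
    return corrupted

def is_self_correcting(model_answer: str, gold_answer: str) -> bool:
    """The model 'reflects' if it recovers the gold answer despite the error."""
    return model_answer.strip() == gold_answer.strip()

# Hypothetical example: a two-step arithmetic trace with a planted mistake.
chain = ["12 + 7 = 19", "19 * 2 = 38"]
corrupted = inject_error(chain, 0, "12 + 7 = 18")  # deliberate error in step 1

prompt = "Question: what is (12 + 7) * 2?\n" + "\n".join(corrupted)
# `model(prompt)` would query the pre-training checkpoint under test;
# it is assumed here and not defined in this sketch.
```

Tracking `is_self_correcting` across checkpoints at different token counts would reproduce the kind of trend the abstract reports, under these assumptions.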