

Implicit Reasoning in Transformers is Reasoning through Shortcuts

March 10, 2025
Authors: Tianhe Lin, Jian Xie, Siyu Yuan, Deqing Yang
cs.AI

Abstract

Test-time compute is emerging as a new paradigm for enhancing language models' complex multi-step reasoning capabilities, as demonstrated by the success of OpenAI's o1 and o3, as well as DeepSeek's R1. Compared to explicit reasoning in test-time compute, implicit reasoning is more inference-efficient, requiring fewer generated tokens. However, why does the advanced reasoning capability fail to emerge in the implicit reasoning style? In this work, we train GPT-2 from scratch on a curated multi-step mathematical reasoning dataset and conduct analytical experiments to investigate how language models perform implicit reasoning in multi-step tasks. Our findings reveal: 1) Language models can perform step-by-step reasoning and achieve high accuracy in both in-domain and out-of-domain tests via implicit reasoning. However, this capability only emerges when trained on fixed-pattern data. 2) Conversely, implicit reasoning abilities emerging from training on unfixed-pattern data tend to overfit a specific pattern and fail to generalize further. Notably, this limitation is also observed in state-of-the-art large language models. These findings suggest that language models acquire implicit reasoning through shortcut learning, enabling strong performance on tasks with similar patterns while lacking generalization.
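The abstract's key experimental variable is whether the multi-step training data follows a fixed or an unfixed surface pattern. Below is a minimal sketch, not the authors' released code, of how such synthetic multi-step arithmetic chains might be generated for training a small GPT-2-style model from scratch; the variable-chain format, the modulus, and all names are illustrative assumptions rather than details taken from the paper.

```python
# Hypothetical data generator contrasting fixed-pattern and unfixed-pattern
# multi-step arithmetic chains (an assumed format, not the paper's dataset).
import random

MOD = 23  # small modulus keeps intermediate values in a bounded range

def make_example(num_steps: int, fixed_pattern: bool) -> str:
    """Build one chain like 'x0=3, x1=x0+5, x2=x1*2, x2? 16'.

    fixed_pattern=True  -> every example uses the same deterministic operation
                           sequence, so a positional shortcut can solve it.
    fixed_pattern=False -> the operation at each step is sampled independently,
                           so no single surface pattern covers the data.
    """
    ops = ["+", "*"]
    values = [random.randrange(MOD)]
    steps = [f"x0={values[0]}"]
    for i in range(1, num_steps + 1):
        # Fixed pattern: alternate + and *; unfixed: sample an op per step.
        op = ops[i % 2] if fixed_pattern else random.choice(ops)
        operand = random.randrange(1, MOD)
        prev = values[-1]
        result = (prev + operand) % MOD if op == "+" else (prev * operand) % MOD
        values.append(result)
        steps.append(f"x{i}=x{i-1}{op}{operand}")
    # The query asks for the final variable, so answering it in one shot
    # requires resolving the whole chain implicitly, without emitted steps.
    return ", ".join(steps) + f", x{num_steps}? {values[-1]}"

if __name__ == "__main__":
    random.seed(0)
    print(make_example(3, fixed_pattern=True))
    print(make_example(3, fixed_pattern=False))
```

Under this setup, the paper's finding would correspond to a model trained only on the fixed-pattern variant generalizing well to longer or out-of-domain chains of the same shape, while a model trained on the mixed-operation variant overfits to particular operation orderings.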

