General Reasoning Requires Learning to Reason from the Get-go

Abstract

Large Language Models (LLMs) have demonstrated impressive real-world utility, exemplifying artificial useful intelligence (AUI). However, their ability to reason adaptively and robustly, the hallmark of artificial general intelligence (AGI), remains fragile. While LLMs seemingly succeed in commonsense reasoning, programming, and mathematics, they struggle to generalize algorithmic understanding across novel contexts. Our experiments with algorithmic tasks in esoteric programming languages reveal that LLM reasoning overfits to the training data and is limited in its transferability. We hypothesize that the core issue underlying this limited transferability is the coupling of reasoning and knowledge in LLMs. To transition from AUI to AGI, we propose disentangling knowledge and reasoning through three key directions: (1) pretraining to reason using reinforcement learning (RL) from scratch, as an alternative to the widely used next-token prediction pretraining; (2) using a curriculum of synthetic tasks to ease the learning of a reasoning prior for RL, which can then be transferred to natural language tasks; and (3) learning more generalizable reasoning functions using a small context window, to reduce the exploitation of spurious correlations between tokens. Such a reasoning system, coupled with a trained retrieval system and a large external memory bank as a knowledge store, can overcome several limitations of existing architectures in learning to reason in novel scenarios.
