General Reasoning Requires Learning to Reason from the Get-go

Abstract

Large Language Models (LLMs) have demonstrated impressive real-world utility, exemplifying artificial useful intelligence (AUI). However, their ability to reason adaptively and robustly, the hallmark of artificial general intelligence (AGI), remains fragile. While LLMs seemingly succeed in commonsense reasoning, programming, and mathematics, they struggle to generalize algorithmic understanding across novel contexts. Our experiments with algorithmic tasks in esoteric programming languages reveal that LLMs' reasoning overfits to the training data and is limited in its transferability. We hypothesize that the core issue underlying such limited transferability is the coupling of reasoning and knowledge in LLMs. To transition from AUI to AGI, we propose disentangling knowledge and reasoning through three key directions: (1) pretraining to reason using reinforcement learning (RL) from scratch, as an alternative to the widely used next-token prediction pretraining; (2) using a curriculum of synthetic tasks to ease the learning of a reasoning prior for RL that can then be transferred to natural language tasks; and (3) learning more generalizable reasoning functions using a small context window, to reduce the exploitation of spurious correlations between tokens. Such a reasoning system, coupled with a trained retrieval system and a large external memory bank as a knowledge store, can overcome several limitations of existing architectures at learning to reason in novel scenarios.

