General Reasoning Requires Learning to Reason from the Get-go

Abstract

Large Language Models (LLMs) have demonstrated impressive real-world utility, exemplifying artificial useful intelligence (AUI). However, their ability to reason adaptively and robustly, a hallmark of artificial general intelligence (AGI), remains fragile. While LLMs seemingly succeed in commonsense reasoning, programming, and mathematics, they struggle to generalize algorithmic understanding across novel contexts. Our experiments with algorithmic tasks in esoteric programming languages reveal that LLMs' reasoning overfits to the training data and has limited transferability. We hypothesize that the core issue underlying this limited transferability is the coupling of reasoning and knowledge in LLMs. To transition from AUI to AGI, we propose disentangling knowledge and reasoning through three key directions: (1) pretraining to reason using reinforcement learning (RL) from scratch as an alternative to the widely used next-token prediction pretraining, (2) using a curriculum of synthetic tasks to ease the learning of a reasoning prior for RL that can then be transferred to natural language tasks, and (3) learning more generalizable reasoning functions using a small context window to reduce the exploitation of spurious correlations between tokens. Such a reasoning system, coupled with a trained retrieval system and a large external memory bank serving as a knowledge store, can overcome several limitations of existing architectures in learning to reason in novel scenarios.