Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Abstract

When writing and talking, people sometimes pause to think. Although reasoning-focused works have often framed reasoning as a method of answering questions or completing agentic tasks, reasoning is implicit in almost all written text. For example, this applies to the steps not stated between the lines of a proof or to the theory of mind underlying a conversation. In the Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned by inferring rationales from few-shot examples in question-answering and learning from those that lead to a correct answer. This is a highly constrained setting; ideally, a language model could instead learn to infer unstated rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions. We address key challenges, including 1) the computational cost of generating continuations, 2) the fact that the LM does not initially know how to generate or use internal thoughts, and 3) the need to predict beyond individual next tokens. To resolve these, we propose a tokenwise parallel sampling algorithm, learnable tokens indicating a thought's start and end, and an extended teacher-forcing technique. Encouragingly, generated rationales disproportionately help model difficult-to-predict tokens and improve the LM's ability to directly answer difficult questions. In particular, after continued pretraining of an LM on a corpus of internet text with Quiet-STaR, we find zero-shot improvements on GSM8K (5.9% → 10.9%) and CommonsenseQA (36.3% → 47.2%) and observe a perplexity improvement on difficult tokens in natural text. Crucially, these improvements require no fine-tuning on these tasks. Quiet-STaR marks a step towards LMs that can learn to reason in a more general and scalable way.
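To put the reported zero-shot numbers in perspective, the following minimal Python sketch computes the absolute and relative gains from the two accuracy pairs quoted in the abstract. Only the four percentages come from the source; the names `reported` and `summarize_gain` are hypothetical helpers introduced here for illustration, not part of any Quiet-STaR code.

```python
# Zero-shot accuracies (percent) as quoted in the abstract:
# (baseline, after continued pretraining with Quiet-STaR).
reported = {
    "GSM8K": (5.9, 10.9),
    "CommonsenseQA": (36.3, 47.2),
}


def summarize_gain(before, after):
    """Return (absolute gain in percentage points, relative gain as a fraction).

    Hypothetical helper for illustration only.
    """
    absolute = after - before
    relative = absolute / before
    return absolute, relative


gains = {name: summarize_gain(b, a) for name, (b, a) in reported.items()}

for name, (absolute, relative) in gains.items():
    print(f"{name}: +{absolute:.1f} points ({relative:.0%} relative)")
```

Under these figures, GSM8K gains about 5.0 percentage points (roughly 85% relative to its baseline) and CommonsenseQA gains about 10.9 points (roughly 30% relative), which shows that the larger relative change occurs on the benchmark with the lower starting accuracy.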
