Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Abstract

When writing and talking, people sometimes pause to think. Although reasoning-focused work has often framed reasoning as a method of answering questions or completing agentic tasks, reasoning is implicit in almost all written text. For example, this is true of the steps left unstated between the lines of a proof and of the theory of mind underlying a conversation. In the Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned by inferring rationales from few-shot examples in question answering and learning from those that lead to a correct answer. This is a highly constrained setting; ideally, a language model could instead learn to infer unstated rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions. We address key challenges, including 1) the computational cost of generating continuations, 2) the fact that the LM does not initially know how to generate or use internal thoughts, and 3) the need to predict beyond individual next tokens. To resolve these, we propose a tokenwise parallel sampling algorithm, learnable tokens that mark the start and end of a thought, and an extended teacher-forcing technique. Encouragingly, generated rationales disproportionately help the model on difficult-to-predict tokens and improve the LM's ability to directly answer difficult questions. In particular, after continued pretraining of an LM on a corpus of internet text with Quiet-STaR, we find zero-shot improvements on GSM8K (5.9% → 10.9%) and CommonsenseQA (36.3% → 47.2%) and observe a perplexity improvement on difficult tokens in natural text. Crucially, these improvements require no fine-tuning on these tasks. Quiet-STaR marks a step towards LMs that can learn to reason in a more general and scalable way.
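
To put the reported zero-shot numbers in perspective, the short sketch below computes the absolute gain (in percentage points) and the relative gain for each benchmark. Only the four accuracy figures come from the abstract; the names `reported`, `gains`, and `summary` are illustrative choices and are not taken from the paper.

```python
# Zero-shot accuracies (percent) as reported in the abstract: (before, after).
reported = {
    "GSM8K": (5.9, 10.9),
    "CommonsenseQA": (36.3, 47.2),
}


def gains(before, after):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    # Round to one decimal place, matching the precision of the reported figures.
    return round(absolute, 1), round(relative, 1)


# Hypothetical helper structure: task name -> (absolute gain, relative gain).
summary = {task: gains(*scores) for task, scores in reported.items()}

for task, (abs_gain, rel_gain) in summary.items():
    print(f"{task}: +{abs_gain} points absolute, +{rel_gain}% relative")
```

On these figures, the GSM8K gain is smaller in absolute terms (5.0 points versus 10.9 points) but larger in relative terms, because the baseline accuracy is much lower.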
