
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

March 14, 2024
Authors: Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman
cs.AI

Abstract

When writing and talking, people sometimes pause to think. Although reasoning-focused works have often framed reasoning as a method of answering questions or completing agentic tasks, reasoning is implicit in almost all written text. For example, this applies to the steps not stated between the lines of a proof or to the theory of mind underlying a conversation. In the Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned by inferring rationales from few-shot examples in question-answering and learning from those that lead to a correct answer. This is a highly constrained setting -- ideally, a language model could instead learn to infer unstated rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions. We address key challenges, including 1) the computational cost of generating continuations, 2) the fact that the LM does not initially know how to generate or use internal thoughts, and 3) the need to predict beyond individual next tokens. To resolve these, we propose a tokenwise parallel sampling algorithm, using learnable tokens indicating a thought's start and end, and an extended teacher-forcing technique. Encouragingly, generated rationales disproportionately help model difficult-to-predict tokens and improve the LM's ability to directly answer difficult questions. In particular, after continued pretraining of an LM on a corpus of internet text with Quiet-STaR, we find zero-shot improvements on GSM8K (5.9% → 10.9%) and CommonsenseQA (36.3% → 47.2%) and observe a perplexity improvement of difficult tokens in natural text. Crucially, these improvements require no fine-tuning on these tasks. Quiet-STaR marks a step towards LMs that can learn to reason in a more general and scalable way.
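The core mechanism the abstract describes can be illustrated with a minimal sketch: at a given position, sample a rationale bracketed by learnable start/end thought tokens, then mix the thought-conditioned next-token distribution with the base distribution. Everything below is a toy illustration, not the paper's implementation: `dummy_lm`, its tiny vocabulary, and the fixed `mix_weight` are all hypothetical stand-ins (in Quiet-STaR the mixing weight comes from a learned head, and generation runs tokenwise in parallel).

```python
import random

# Learnable delimiter tokens marking a thought's start and end
# (names here are illustrative, not the paper's exact strings).
START, END = "<|startofthought|>", "<|endofthought|>"

def dummy_lm(context):
    """Toy stand-in for a language model: returns a deterministic
    next-token distribution over a tiny vocabulary."""
    vocab = ["the", "answer", "is", "42", END]
    random.seed(len(context))  # deterministic toy "weights"
    weights = [random.random() for _ in vocab]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(vocab, weights)}

def sample_thought(context, max_len=8):
    """Generate a rationale bracketed by START/END tokens."""
    thought = [START]
    for _ in range(max_len):
        dist = dummy_lm(context + thought)
        tok = max(dist, key=dist.get)  # greedy decoding for the sketch
        if tok == END:
            break
        thought.append(tok)
    thought.append(END)
    return thought

def mixed_next_token_dist(context, mix_weight=0.5):
    """Interpolate the base prediction with the thought-conditioned
    prediction; Quiet-STaR learns this mixing weight."""
    base = dummy_lm(context)
    with_thought = dummy_lm(context + sample_thought(context))
    return {tok: (1 - mix_weight) * base[tok] + mix_weight * with_thought[tok]
            for tok in base}
```

Because both distributions are normalized, the mixture is itself a valid distribution for any `mix_weight` in [0, 1]; training then rewards thoughts whose mixture better predicts the observed future tokens.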
