Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
March 14, 2024
Authors: Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman
cs.AI
Abstract
When writing and talking, people sometimes pause to think. Although
reasoning-focused works have often framed reasoning as a method of answering
questions or completing agentic tasks, reasoning is implicit in almost all
written text. For example, this applies to the steps not stated between the
lines of a proof or to the theory of mind underlying a conversation. In the
Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned
by inferring rationales from few-shot examples in question-answering and
learning from those that lead to a correct answer. This is a highly constrained
setting -- ideally, a language model could instead learn to infer unstated
rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR
in which LMs learn to generate rationales at each token to explain future text,
improving their predictions. We address key challenges, including 1) the
computational cost of generating continuations, 2) the fact that the LM does
not initially know how to generate or use internal thoughts, and 3) the need to
predict beyond individual next tokens. To resolve these, we propose a tokenwise
parallel sampling algorithm, using learnable tokens indicating a thought's
start and end, and an extended teacher-forcing technique. Encouragingly,
generated rationales disproportionately help model difficult-to-predict tokens
and improve the LM's ability to directly answer difficult questions. In
particular, after continued pretraining of an LM on a corpus of internet text
with Quiet-STaR, we find zero-shot improvements on GSM8K
(5.9% → 10.9%) and CommonsenseQA (36.3% → 47.2%) and
observe a perplexity improvement on difficult tokens in natural text.
Crucially, these improvements require no fine-tuning on these tasks. Quiet-STaR
marks a step towards LMs that can learn to reason in a more general and
scalable way.
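The method components described in the abstract (learnable tokens marking a thought's start and end, and rationales judged by how much they improve prediction of the true future tokens under teacher forcing) can be illustrated with a minimal sketch. The sentinel token strings, function names, and toy probabilities below are assumptions for illustration, not the paper's actual implementation:

```python
import math

# Learnable sentinel tokens delimiting an internal thought
# (names are illustrative, not the paper's exact vocabulary entries).
START_THOUGHT = "<|startofthought|>"
END_THOUGHT = "<|endofthought|>"

def wrap_thought(rationale_tokens):
    """Insert the thought-boundary sentinels around a sampled rationale."""
    return [START_THOUGHT, *rationale_tokens, END_THOUGHT]

def thought_reward(logp_future_with_thought, logp_future_without_thought):
    """REINFORCE-style signal: how much the thought raised the total
    log-likelihood of the next few true (teacher-forced) tokens,
    relative to predicting them without any thought."""
    assert len(logp_future_with_thought) == len(logp_future_without_thought)
    return sum(logp_future_with_thought) - sum(logp_future_without_thought)

# Toy numbers: with the thought, each future token becomes more likely,
# so the reward is positive and the rationale is reinforced.
with_thought = [math.log(0.6), math.log(0.5)]
without_thought = [math.log(0.3), math.log(0.25)]
reward = thought_reward(with_thought, without_thought)
```

In the full method this signal is computed in parallel at every token position, which is why the tokenwise parallel sampling algorithm is needed to keep generation tractable.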