Reasoning to Learn from Latent Thoughts
March 24, 2025
Authors: Yangjun Ruan, Neil Band, Chris J. Maddison, Tatsunori Hashimoto
cs.AI
Abstract
Compute scaling for language model (LM) pretraining has outpaced the growth
of human-written texts, leading to concerns that data will become the
bottleneck to LM scaling. To continue scaling pretraining in this
data-constrained regime, we propose that explicitly modeling and inferring the
latent thoughts that underlie the text generation process can significantly
improve pretraining data efficiency. Intuitively, our approach views web text
as the compressed final outcome of a verbose human thought process, in which the
latent thoughts contain important contextual knowledge and reasoning steps that
are critical to data-efficient learning. We empirically demonstrate the
effectiveness of our approach through data-constrained continued pretraining
for math. We first show that synthetic data approaches to inferring latent
thoughts significantly improve data efficiency, outperforming training on the
same amount of raw data (5.7% → 25.4% on MATH). Furthermore, we
demonstrate latent thought inference without a strong teacher, where an LM
bootstraps its own performance by using an EM algorithm to iteratively improve
the capability of the trained LM and the quality of thought-augmented
pretraining data. We show that a 1B LM can bootstrap its performance across at
least three iterations and significantly outperform baselines trained on raw
data, with increasing gains from additional inference compute when performing
the E-step. The gains from inference scaling and EM iterations suggest new
opportunities for scaling data-constrained pretraining.
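To make the bootstrapping procedure described in the abstract concrete, the sketch below shows one plausible reading of the EM loop: the E-step samples candidate latent thoughts for each raw document with the current model and keeps the highest-scoring one (more samples per document corresponds to more E-step inference compute), and the M-step continues pretraining on the thought-augmented corpus. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`sample_thoughts`, `score_joint`, `finetune`) are hypothetical placeholders, and the best-of-k selection rule is an assumed simplification of how the E-step might weight samples.

```python
# Minimal sketch of an EM-style bootstrapping loop for latent-thought inference.
# All callables here are hypothetical placeholders supplied by the user; the
# best-of-k E-step is an assumption, not the paper's exact procedure.

from typing import Callable, Sequence


def bootstrap_latent_thoughts(
    raw_docs: Sequence[str],
    sample_thoughts: Callable[[str, int], list[str]],  # current LM proposes k latent thoughts per doc
    score_joint: Callable[[str, str], float],          # e.g. log-likelihood of (thought, doc) under current LM
    finetune: Callable[[Sequence[str]], None],         # continued pretraining on thought-augmented docs
    num_iterations: int = 3,
    samples_per_doc: int = 4,                          # more samples per doc = more E-step inference compute
) -> None:
    """Alternate between inferring latent thoughts (E-step) and training on them (M-step)."""
    for _ in range(num_iterations):
        augmented_corpus = []
        # E-step: sample candidate latent thoughts for each raw document with the
        # current model and keep the highest-scoring candidate.
        for doc in raw_docs:
            candidates = sample_thoughts(doc, samples_per_doc)
            best = max(candidates, key=lambda z: score_joint(z, doc))
            augmented_corpus.append(best + "\n" + doc)  # thought-augmented training example
        # M-step: continue pretraining on the thought-augmented corpus, which in
        # turn improves the next iteration's thought inference.
        finetune(augmented_corpus)
```

In the setting the abstract describes, the same 1B model plays both roles (proposing thoughts and being trained), and gains are reported across at least three such iterations; the teacher-based synthetic-data variant would instead have `sample_thoughts` call a stronger external model.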