Thinking Augmented Pre-training
September 24, 2025
Authors: Liang Wang, Nan Yang, Shaohan Huang, Li Dong, Furu Wei
cs.AI
Abstract
This paper introduces a simple and scalable approach to improve the data efficiency of large language model (LLM) training by augmenting existing text data with thinking trajectories. The compute for pre-training LLMs has been growing at an unprecedented rate, while the availability of high-quality data remains limited. Consequently, maximizing the utility of available data constitutes a significant research challenge. A primary impediment is that certain high-quality tokens are difficult to learn given a fixed model capacity, as the underlying rationale for a single token can be exceptionally complex and deep. To address this issue, we propose Thinking augmented Pre-Training (TPT), a universal methodology that augments text with automatically generated thinking trajectories. Such augmentation effectively increases the volume of the training data and makes high-quality tokens more learnable through step-by-step reasoning and decomposition. We apply TPT across diverse training configurations up to 100B tokens, encompassing pre-training with both constrained and abundant data, as well as mid-training from strong open-source checkpoints. Experimental results indicate that our method substantially improves the performance of LLMs across various model sizes and families. Notably, TPT enhances the data efficiency of LLM pre-training by a factor of 3. For a 3B parameter model, it improves the post-training performance by over 10% on several challenging reasoning benchmarks.
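
As a concrete illustration of the idea described in the abstract, the sketch below shows one way a corpus could be augmented with model-generated thinking trajectories before standard next-token-prediction training. The prompt wording, the `<think>` delimiters, the placement of the trajectory after the document, and the `generate` callable are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of thinking-augmented data preparation (illustrative only;
# prompt text, delimiters, and trajectory placement are assumptions).
from typing import Callable, List

THINK_PROMPT = (
    "Read the following document and write a detailed step-by-step thinking "
    "process that explains and derives its key points:\n\n{doc}"
)

def augment_with_thinking(
    documents: List[str],
    generate: Callable[[str], str],  # any off-the-shelf LLM text generator
) -> List[str]:
    """Pair each document with an automatically generated thinking trajectory,
    so the standard next-token-prediction objective also trains on the
    step-by-step rationale behind the original tokens."""
    samples = []
    for doc in documents:
        trajectory = generate(THINK_PROMPT.format(doc=doc))
        # Concatenate the original text and its trajectory into one sequence.
        samples.append(doc + "\n<think>\n" + trajectory + "\n</think>")
    return samples

if __name__ == "__main__":
    # Toy usage with a stand-in generator; in practice `generate` would call
    # an LLM serving endpoint.
    dummy_generate = lambda prompt: "Step 1: restate the claim. Step 2: ..."
    print(augment_with_thinking(["2 + 2 = 4 because addition is counting."],
                                dummy_generate)[0])
```

The augmented sequences are then tokenized and consumed by an ordinary pre-training or mid-training loop, which is how the method scales without any change to the model architecture or loss.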