Thinking Augmented Pre-training
September 24, 2025
Authors: Liang Wang, Nan Yang, Shaohan Huang, Li Dong, Furu Wei
cs.AI
Abstract
This paper introduces a simple and scalable approach to improve the data efficiency of large language model (LLM) training by augmenting existing text data with thinking trajectories. The compute for pre-training LLMs has been growing at an unprecedented rate, while the availability of high-quality data remains limited. Consequently, maximizing the utility of available data constitutes a significant research challenge. A primary impediment is that certain high-quality tokens are difficult to learn given a fixed model capacity, as the underlying rationale for a single token can be exceptionally complex and deep. To address this issue, we propose Thinking augmented Pre-Training (TPT), a universal methodology that augments text with automatically generated thinking trajectories. Such augmentation effectively increases the volume of the training data and makes high-quality tokens more learnable through step-by-step reasoning and decomposition. We apply TPT across diverse training configurations up to 100B tokens, encompassing pre-training with both constrained and abundant data, as well as mid-training from strong open-source checkpoints. Experimental results indicate that our method substantially improves the performance of LLMs across various model sizes and families. Notably, TPT enhances the data efficiency of LLM pre-training by a factor of 3. For a 3B parameter model, it improves the post-training performance by over 10% on several challenging reasoning benchmarks.
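
As a concrete illustration of the idea described in the abstract, the sketch below shows one way a corpus could be augmented with model-generated thinking trajectories before standard next-token-prediction training. The prompt wording, the `<think>` delimiters, the placement of the trajectory after the document, and the `generate` callable are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of thinking-augmented data preparation (illustrative only;
# prompt text, delimiters, and trajectory placement are assumptions).
from typing import Callable, List

THINK_PROMPT = (
    "Read the following document and write a detailed step-by-step thinking "
    "process that explains and derives its key points:\n\n{doc}"
)

def augment_with_thinking(
    documents: List[str],
    generate: Callable[[str], str],  # any off-the-shelf LLM text generator
) -> List[str]:
    """Pair each document with an automatically generated thinking trajectory,
    so the standard next-token-prediction objective also trains on the
    step-by-step rationale behind the original tokens."""
    samples = []
    for doc in documents:
        trajectory = generate(THINK_PROMPT.format(doc=doc))
        # Concatenate the original text and its trajectory into one sequence.
        samples.append(doc + "\n<think>\n" + trajectory + "\n</think>")
    return samples

if __name__ == "__main__":
    # Toy usage with a stand-in generator; in practice `generate` would call
    # an LLM serving endpoint.
    dummy_generate = lambda prompt: "Step 1: restate the claim. Step 2: ..."
    print(augment_with_thinking(["2 + 2 = 4 because addition is counting."],
                                dummy_generate)[0])
```

The augmented sequences are then tokenized and consumed by an ordinary pre-training or mid-training loop, which is how the method scales without any change to the model architecture or loss.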