Thinking Augmented Pre-training

Abstract

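This paper introduces a simple and scalable approach that improves the data efficiency of large language model (LLM) training by augmenting existing text data with thinking trajectories. The compute required for LLM pre-training is growing at an unprecedented rate, while the availability of high-quality data remains limited. Maximizing the utility of the available data has therefore become an important research challenge. A primary obstacle is that, with a fixed model capacity, certain high-quality tokens are difficult to learn, because the rationale behind a single token can be very complex and deep. To address this problem, we propose Thinking augmented Pre-Training (TPT), a universal methodology that augments text with automatically generated thinking trajectories and makes high-quality tokens more learnable through step-by-step reasoning and decomposition. We apply TPT to diverse training configurations of up to 100B tokens, including pre-training with both constrained and abundant data, as well as mid-training from strong open-source checkpoints. Experimental results show that our method substantially improves LLM performance across a variety of model sizes and families. In particular, TPT improves the data efficiency of LLM pre-training by a factor of 3. For a 3B-parameter model, it improves post-training performance by more than 10% on several challenging reasoning benchmarks.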

English

This paper introduces a simple and scalable approach to improve the data efficiency of large language model (LLM) training by augmenting existing text data with thinking trajectories. The compute for pre-training LLMs has been growing at an unprecedented rate, while the availability of high-quality data remains limited. Consequently, maximizing the utility of available data constitutes a significant research challenge. A primary impediment is that certain high-quality tokens are difficult to learn given a fixed model capacity, as the underlying rationale for a single token can be exceptionally complex and deep. To address this issue, we propose Thinking augmented Pre-Training (TPT), a universal methodology that augments text with automatically generated thinking trajectories. Such augmentation effectively increases the volume of the training data and makes high-quality tokens more learnable through step-by-step reasoning and decomposition. We apply TPT across diverse training configurations up to 100B tokens, encompassing pre-training with both constrained and abundant data, as well as mid-training from strong open-source checkpoints. Experimental results indicate that our method substantially improves the performance of LLMs across various model sizes and families. Notably, TPT enhances the data efficiency of LLM pre-training by a factor of 3. For a 3B parameter model, it improves the post-training performance by over 10% on several challenging reasoning benchmarks.
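
The augmentation step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how trajectories are generated or how they are joined to the source text, so everything here is an assumption made for illustration: the function names `augment_with_thinking` and `toy_thinker`, the blank-line separator, and the choice to place the trajectory after the original text. The `generate_thinking` callable stands in for whatever model produces the trajectories.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical separator between a document and its thinking trajectory;
# the abstract does not say how the two parts are joined.
THINKING_SEPARATOR = "\n\n"


def augment_with_thinking(
    documents: Iterable[str],
    generate_thinking: Callable[[str], str],
    separator: str = THINKING_SEPARATOR,
) -> Iterator[str]:
    """Yield each document followed by a generated thinking trajectory."""
    for doc in documents:
        thinking = generate_thinking(doc).strip()
        if thinking:
            # Assumed layout: original text first, trajectory appended after it.
            yield doc + separator + thinking
        else:
            # No trajectory produced: keep the original text unchanged.
            yield doc


def toy_thinker(doc: str) -> str:
    """Stand-in for an LLM call; a real pipeline would prompt a model here."""
    return f"Let me think step by step about: {doc}"


if __name__ == "__main__":
    corpus = ["2 + 2 = 4", "Water boils at 100 C at sea level."]
    for sample in augment_with_thinking(corpus, toy_thinker):
        print(sample)
        print("---")
```

Under these assumptions, each augmented sample is longer than its source document, and the augmented strings would then be tokenized and used as ordinary training text.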
