Patch-Level Training for Large Language Models

July 17, 2024
Authors: Chenze Shao, Fandong Meng, Jie Zhou
cs.AI

Abstract

As Large Language Models (LLMs) achieve remarkable progress in language understanding and generation, their training efficiency has become a critical concern. Traditionally, LLMs are trained to predict the next token in a sequence. Despite the success of token-level training, it suffers from considerable computational costs due to the need to process an extensive number of tokens. To mitigate this issue, this paper introduces patch-level training for LLMs, which reduces the sequence length by compressing multiple tokens into a single patch. During patch-level training, we feed the language model shorter sequences of patches and train it to predict the next patch, thereby processing the majority of the training data at a significantly reduced computational cost. Following this, the model continues token-level training on the remaining training data to align with the inference mode. Experiments on a diverse range of models (370M-2.7B parameters) demonstrate that patch-level training can reduce overall computational costs to 0.5×, without compromising the model performance compared to token-level training. Source code: https://github.com/shaochenze/PatchTrain.
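The abstract only sketches the idea at a high level. Below is a minimal illustrative sketch of how next-patch prediction could be implemented, assuming patches are formed by averaging K consecutive token embeddings and each patch position predicts all K tokens of the following patch; the embed / forward_embeds / lm_head interface is hypothetical and is not taken from the paper or its released code.

import torch
import torch.nn.functional as F

def patch_level_loss(model, token_ids, patch_size=4):
    """Next-patch prediction loss (illustrative sketch, not the authors' code).

    model: a decoder-only LM assumed to expose embed(ids), forward_embeds(x),
           and lm_head(h) (hypothetical interface for this example).
    token_ids: LongTensor of shape (batch, seq_len), seq_len divisible by patch_size.
    """
    B, T = token_ids.shape
    K = patch_size
    # Embed tokens and average every K consecutive embeddings into one patch embedding.
    tok_emb = model.embed(token_ids)                         # (B, T, d)
    patch_emb = tok_emb.view(B, T // K, K, -1).mean(dim=2)   # (B, T/K, d)
    # Run the transformer over the K-times shorter patch sequence.
    hidden = model.forward_embeds(patch_emb)                 # (B, T/K, d)
    # Each patch position predicts the K tokens of the *next* patch.
    logits = model.lm_head(hidden[:, :-1])                   # (B, T/K - 1, vocab)
    targets = token_ids.view(B, T // K, K)[:, 1:]            # (B, T/K - 1, K)
    # Reuse the same patch-level logits for all K target tokens of the next patch.
    vocab = logits.size(-1)
    loss = F.cross_entropy(
        logits.unsqueeze(2).expand(-1, -1, K, vocab).reshape(-1, vocab),
        targets.reshape(-1),
    )
    return loss

After this phase, training would switch back to standard next-token prediction on the remaining data so that the model matches the token-level inference mode described in the abstract.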
