Patch-Level Training for Large Language Models

July 17, 2024
Authors: Chenze Shao, Fandong Meng, Jie Zhou
cs.AI

Abstract

As Large Language Models (LLMs) achieve remarkable progress in language understanding and generation, their training efficiency has become a critical concern. Traditionally, LLMs are trained to predict the next token in a sequence. Despite the success of token-level training, it suffers from considerable computational costs due to the need to process an extensive number of tokens. To mitigate this issue, this paper introduces patch-level training for LLMs, which reduces the sequence length by compressing multiple tokens into a single patch. During patch-level training, we feed the language model shorter sequences of patches and train it to predict the next patch, thereby processing the majority of the training data at a significantly reduced computational cost. Following this, the model continues token-level training on the remaining training data to align with the inference mode. Experiments on a diverse range of models (370M-2.7B parameters) demonstrate that patch-level training can reduce overall computational costs to 0.5×, without compromising the model performance compared to token-level training. Source code: https://github.com/shaochenze/PatchTrain.
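The abstract describes forming patches by grouping consecutive tokens and training the model to predict the next patch before finishing with ordinary next-token training. Below is a minimal PyTorch sketch of the patch-level stage only. It assumes patches are built by averaging the embeddings of K consecutive tokens and that the patch-level loss averages next-token cross-entropy over every token of the following patch; the class name `PatchLevelLM`, the toy Transformer backbone, and these specific choices are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Hypothetical sketch of patch-level training; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchLevelLM(nn.Module):
    def __init__(self, vocab_size: int, d_model: int, patch_size: int):
        super().__init__()
        self.patch_size = patch_size
        self.embed = nn.Embedding(vocab_size, d_model)
        # Toy stand-in for a causal Transformer decoder.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def to_patches(self, token_ids: torch.Tensor) -> torch.Tensor:
        """Compress K consecutive token embeddings into one patch by averaging."""
        b, t = token_ids.shape
        k = self.patch_size
        emb = self.embed(token_ids)                      # (b, t, d)
        return emb.view(b, t // k, k, -1).mean(dim=2)    # (b, t/k, d)

    def patch_loss(self, token_ids: torch.Tensor) -> torch.Tensor:
        """Next-patch prediction over a shortened sequence of patches."""
        k = self.patch_size
        patches = self.to_patches(token_ids)             # (b, p, d)
        causal = nn.Transformer.generate_square_subsequent_mask(patches.size(1))
        hidden = self.backbone(patches, mask=causal)     # causal over patches
        logits = self.lm_head(hidden[:, :-1])            # (b, p-1, vocab)
        b, p = token_ids.size(0), patches.size(1)
        targets = token_ids.view(b, p, k)[:, 1:]         # tokens of the next patch
        # The same predictive distribution scores all K target tokens;
        # the loss is the average cross-entropy over them.
        logits = logits.unsqueeze(2).expand(-1, -1, k, -1)
        return F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
        )


# Usage: sequence length must be divisible by patch_size in this sketch.
model = PatchLevelLM(vocab_size=32000, d_model=512, patch_size=4)
tokens = torch.randint(0, 32000, (2, 64))
loss = model.patch_loss(tokens)
loss.backward()
```

In the schedule described by the abstract, the bulk of the training data would be consumed by this patch-level objective at roughly 1/K of the token-level cost, and the remaining data by standard next-token training to align the model with the token-level inference mode.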
