
LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

August 31, 2024
Authors: Zhiyuan Hu, Yuliang Liu, Jinman Zhao, Suyuchen Wang, Yan Wang, Wei Shen, Qing Gu, Anh Tuan Luu, See-Kiong Ng, Zhiwei Jiang, Bryan Hooi
cs.AI

Abstract

Large language models (LLMs) face significant challenges in handling long-context tasks because their effective context window is limited during pretraining, which restricts their ability to generalize over extended sequences. Meanwhile, extending the context window of LLMs through post-pretraining is highly resource-intensive. To address this, we introduce **LongRecipe**, an efficient training strategy for extending the context window of LLMs that combines impactful token analysis, position index transformation, and training optimization strategies. It simulates long-sequence inputs while maintaining training efficiency and significantly improves the model's understanding of long-range dependencies. Experiments on three types of LLMs show that LongRecipe can utilize long sequences while requiring only 30% of the target context window size, and that it reduces the computational resources needed for training by more than 85% compared to full-sequence training. Furthermore, LongRecipe preserves the original LLM's capabilities on general tasks. Ultimately, *we can extend the effective context window of open-source LLMs from 8k to 128k, achieving performance close to GPT-4 with just one day of dedicated training on a single GPU with 80 GB of memory.* Our code is released at the [link](https://github.com/zhiyuanhubj/LongRecipe).
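
The abstract does not spell out how position index transformation works. As a rough illustration of the general idea of simulating long-sequence positions with a shorter input, the sketch below assigns position indices drawn from a 128k range to a sequence holding only about 30% as many tokens. It is not the authors' implementation: the function name `simulate_long_positions`, the chunking scheme, and the `num_chunks` parameter are all assumptions made for this example.

```python
import random


def simulate_long_positions(train_len, target_len, num_chunks=4, seed=0):
    """Give a short sequence position indices that span a longer window.

    Hypothetical helper, not taken from the LongRecipe code base.
    The sequence is split into contiguous chunks. Positions stay
    consecutive inside a chunk, and a random gap is skipped before
    each chunk so the indices spread over [0, target_len).
    """
    if train_len > target_len:
        raise ValueError("train_len must not exceed target_len")
    rng = random.Random(seed)

    # Split the real tokens into roughly equal chunks.
    base, extra = divmod(train_len, num_chunks)
    sizes = [base + (1 if i < extra else 0) for i in range(num_chunks)]

    # Unused positions (slack) are distributed as gaps before chunks.
    slack = target_len - train_len
    cuts = sorted(rng.randint(0, slack) for _ in range(num_chunks))
    gaps = [cuts[0]] + [cuts[i] - cuts[i - 1] for i in range(1, num_chunks)]

    positions = []
    cursor = 0
    for size, gap in zip(sizes, gaps):
        cursor += gap  # skip ahead, simulating tokens that are not present
        positions.extend(range(cursor, cursor + size))
        cursor += size
    return positions


# 30% of a 128k target window, as in the abstract's headline setting.
target_len = 128 * 1024
train_len = int(0.3 * target_len)
position_ids = simulate_long_positions(train_len, target_len)
print(len(position_ids), position_ids[0], position_ids[-1])
```

In a setup like this, the model would see far fewer tokens per step than the target window holds while still being exposed to large position indices and long relative distances. How LongRecipe actually selects tokens and transforms indices is described in the paper and the linked repository.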
