LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
August 31, 2024
Authors: Zhiyuan Hu, Yuliang Liu, Jinman Zhao, Suyuchen Wang, Yan Wang, Wei Shen, Qing Gu, Anh Tuan Luu, See-Kiong Ng, Zhiwei Jiang, Bryan Hooi
cs.AI
Abstract
Large language models (LLMs) face significant challenges in handling
long-context tasks because of their limited effective context window size
during pretraining, which restricts their ability to generalize over extended
sequences. Meanwhile, extending the context window in LLMs through
post-pretraining is highly resource-intensive. To address this, we introduce
**LongRecipe**, an efficient training strategy for extending the context window
of LLMs, including impactful token analysis, position index transformation, and
training optimization strategies. It simulates long-sequence inputs while
maintaining training efficiency and significantly improves the model's
understanding of long-range dependencies. Experiments on three types of LLMs
show that LongRecipe can utilize long sequences while requiring only 30% of the
target context window size, and reduces computational training resources by over
85% compared to full-sequence training. Furthermore, LongRecipe also preserves
the original LLM's capabilities in general tasks. Ultimately, *we can extend
the effective context window of open-source LLMs from 8k to 128k, achieving
performance close to GPT-4 with just one day of dedicated training using a
single GPU with 80G memory.* Our code is released at the
[link](https://github.com/zhiyuanhubj/LongRecipe).
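
To make the idea of simulating long-sequence inputs with shorter training sequences more concrete, the following is a minimal sketch of one generic way to do it. It is not the LongRecipe implementation: the abstract does not specify how position indices are transformed, so the function name `simulate_long_positions`, the even chunking scheme, and the `num_chunks` and `seed` parameters are all assumptions made for illustration. The sketch keeps tokens in contiguous chunks and inserts random skips between chunks, so that a sequence of 30% of the target length receives position indices spread across the full target window.

```python
import random


def simulate_long_positions(train_len, target_len, num_chunks=4, seed=0):
    """Assign position indices in [0, target_len) to train_len tokens.

    Hypothetical illustration only: tokens are split into contiguous
    chunks, and each later chunk is shifted by a random cumulative skip
    so that the indices cover distances up to the target window.
    """
    if not 0 < train_len <= target_len:
        raise ValueError("need 0 < train_len <= target_len")
    if not 0 < num_chunks <= train_len:
        raise ValueError("need 0 < num_chunks <= train_len")

    rng = random.Random(seed)

    # Split the training sequence into num_chunks nearly equal chunks.
    base, extra = divmod(train_len, num_chunks)
    sizes = [base + (1 if i < extra else 0) for i in range(num_chunks)]

    # Positions not used by any token can be skipped between chunks.
    spare = target_len - train_len

    # Cumulative skip before each chunk; sorting keeps indices increasing.
    skips = [0] + sorted(rng.randint(0, spare) for _ in range(num_chunks - 1))

    positions = []
    start = 0  # index of the chunk's first token in the short sequence
    for size, skip in zip(sizes, skips):
        positions.extend(range(start + skip, start + skip + size))
        start += size
    return positions


# Example: a 128,000-token target window trained with 30% of that length.
target_len = 128_000
train_len = target_len * 30 // 100  # 38,400 tokens
position_ids = simulate_long_positions(train_len, target_len, num_chunks=8)
print(len(position_ids), position_ids[0], position_ids[-1])
```

Under these assumptions, the model only ever processes `train_len` tokens per example, which is where the memory and compute savings would come from, while the position indices it sees can reach close to `target_len`.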