Extending Llama-3's Context Ten-Fold Overnight
April 30, 2024
Authors: Peitian Zhang, Ninglu Shao, Zheng Liu, Shitao Xiao, Hongjin Qian, Qiwei Ye, Zhicheng Dou
cs.AI
Abstract
We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is highly efficient, taking only 8 hours on a single 8xA800 (80G) GPU machine. The resulting model exhibits superior performance across a broad range of evaluation tasks, such as NIHS, topic retrieval, and long-context language understanding; meanwhile, it also preserves the model's original capability on short contexts. The dramatic context extension is mainly attributed to merely 3.5K synthetic training samples generated by GPT-4, which indicates LLMs' inherent (yet largely underestimated) potential to extend their original context length. In fact, the context length could be extended far beyond 80K with more computational resources. Therefore, the team will publicly release all resources (including the data, model, data generation pipeline, and training code) to facilitate future research from the community: https://github.com/FlagOpen/FlagEmbedding.
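
To make the recipe concrete, below is a minimal sketch (not the authors' released code) of how QLoRA fine-tuning with an enlarged RoPE base can be set up to extend Llama-3-8B-Instruct's context window. The specific values for rope_theta, max_position_embeddings, the LoRA rank, and the target modules are illustrative assumptions; the exact data and training pipeline are in the FlagEmbedding repository linked above.

```python
# Minimal QLoRA sketch for long-context fine-tuning of Llama-3-8B-Instruct.
# All hyperparameters here (rope_theta, max length, LoRA rank) are illustrative
# assumptions, not the authors' configuration; see the FlagEmbedding repo for
# the actual pipeline.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# 4-bit NF4 quantization keeps the frozen 8B base model small (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    # Enlarging the RoPE base frequency lets the model address positions far
    # beyond its original 8K window; 2e6 is an assumed value for illustration.
    rope_theta=2_000_000.0,
    max_position_embeddings=81920,  # roughly 80K tokens
)

# Attach LoRA adapters; only these small low-rank matrices are trained.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# From here, run a standard supervised fine-tuning loop over long synthetic
# samples (the paper reports ~3.5K GPT-4-generated examples).
```

Because the quantized base weights stay frozen and only small LoRA adapters are updated, the memory and compute budget remains compatible with the single 8-GPU node and short training time described in the abstract.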