Extending Llama-3's Context Ten-Fold Overnight
April 30, 2024
Authors: Peitian Zhang, Ninglu Shao, Zheng Liu, Shitao Xiao, Hongjin Qian, Qiwei Ye, Zhicheng Dou
cs.AI
Abstract
We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is highly efficient, taking only 8 hours on a single 8xA800 (80G) GPU machine. The resulting model exhibits superior performance across a broad range of evaluation tasks, such as NIHS, topic retrieval, and long-context language understanding; meanwhile, it also preserves the model's original capability on short contexts. The dramatic context extension is mainly attributed to merely 3.5K synthetic training samples generated by GPT-4, which indicates LLMs' inherent (yet largely underestimated) potential to extend their original context length. In fact, the context length could be extended far beyond 80K with more computational resources. Therefore, the team will publicly release all resources (including the data, model, data generation pipeline, and training code) to facilitate future research from the community: https://github.com/FlagOpen/FlagEmbedding.
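
To make the recipe concrete, below is a minimal sketch (not the authors' released code) of how QLoRA fine-tuning with an enlarged RoPE base can be set up to extend Llama-3-8B-Instruct's context window. The specific values for rope_theta, max_position_embeddings, the LoRA rank, and the target modules are illustrative assumptions; the exact data and training pipeline are in the FlagEmbedding repository linked above.

```python
# Minimal QLoRA sketch for long-context fine-tuning of Llama-3-8B-Instruct.
# All hyperparameters here (rope_theta, max length, LoRA rank) are illustrative
# assumptions, not the authors' configuration; see the FlagEmbedding repo for
# the actual pipeline.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# 4-bit NF4 quantization keeps the frozen 8B base model small (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    # Enlarging the RoPE base frequency lets the model address positions far
    # beyond its original 8K window; 2e6 is an assumed value for illustration.
    rope_theta=2_000_000.0,
    max_position_embeddings=81920,  # roughly 80K tokens
)

# Attach LoRA adapters; only these small low-rank matrices are trained.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# From here, run a standard supervised fine-tuning loop over long synthetic
# samples (the paper reports ~3.5K GPT-4-generated examples).
```

Because the quantized base weights stay frozen and only small LoRA adapters are updated, the memory and compute budget remains compatible with the single 8-GPU node and short training time described in the abstract.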