
Extending Llama-3's Context Ten-Fold Overnight

April 30, 2024
Authors: Peitian Zhang, Ninglu Shao, Zheng Liu, Shitao Xiao, Hongjin Qian, Qiwei Ye, Zhicheng Dou
cs.AI

Abstract

We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is highly efficient, taking only 8 hours on a single 8xA800 (80G) GPU machine. The resulting model exhibits superior performance across a broad range of evaluation tasks, such as NIHS, topic retrieval, and long-context language understanding; meanwhile, it also preserves the original capability over short contexts well. The dramatic context extension is mainly attributed to merely 3.5K synthetic training samples generated by GPT-4, which indicates LLMs' inherent (yet largely underestimated) potential to extend their original context length. In fact, the context length could be extended far beyond 80K with more computational resources. Therefore, the team will publicly release all resources (including the data, model, data generation pipeline, and training code) to facilitate future research from the community: https://github.com/FlagOpen/FlagEmbedding.
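The abstract names QLoRA fine-tuning as the mechanism behind the context extension. Below is a minimal sketch of how such a setup is commonly assembled with Hugging Face transformers and peft; it is not the authors' released pipeline, and the quantization settings, LoRA rank, target modules, and enlarged RoPE base frequency are illustrative assumptions (see the linked repository for the actual code).

```python
# Minimal QLoRA setup sketch for long-context fine-tuning of Llama-3-8B-Instruct.
# All hyperparameters here are assumptions for illustration, not the paper's values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# 4-bit NF4 quantization of the frozen base weights is what makes this "QLoRA"
# rather than plain LoRA, keeping memory low enough for 80G GPUs.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    # Enlarging the RoPE base frequency is a common way to reach longer contexts;
    # the exact value here is an assumption, not necessarily the paper's setting.
    rope_theta=200_000_000.0,
    torch_dtype=torch.bfloat16,
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections; only these small matrices are trained.
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The adapted model would then be trained with a standard causal-language-modeling loop over the (roughly 3.5K) long-context samples described in the abstract, with sequences packed up to the target 80K length.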

