Extending Llama-3's Context Ten-Fold Overnight

Abstract

We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is highly efficient, taking only 8 hours on a single 8xA800 (80G) GPU machine. The resulting model exhibits superior performance across a broad range of evaluation tasks, such as Needle-In-A-Haystack (NIHS), topic retrieval, and long-context language understanding, while also preserving its original capability on short contexts. The dramatic context extension is mainly attributed to merely 3.5K synthetic training samples generated by GPT-4, which indicates LLMs' inherent (yet largely underestimated) potential to extend their original context length. In fact, the context length could be extended far beyond 80K with more computational resources. The team will therefore publicly release all resources (including data, model, data generation pipeline, and training code) to facilitate future research from the community: https://github.com/FlagOpen/FlagEmbedding.
