Extending Llama-3's Context Ten-Fold Overnight

Abstract

We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is highly efficient, taking 8 hours on a single 8xA800 (80G) GPU machine. The resulting model exhibits superior performance across a broad range of evaluation tasks, such as needle-in-a-haystack (NIHS), topic retrieval, and long-context language understanding, while also preserving the original capability over short contexts. The dramatic context extension is mainly attributed to just 3.5K synthetic training samples generated by GPT-4, which indicates LLMs' inherent (yet largely underestimated) potential to extend their original context length. In fact, the context length could be extended far beyond 80K with more computation resources. The team will therefore publicly release all resources (including data, model, data generation pipeline, and training code) to facilitate future research from the community: https://github.com/FlagOpen/FlagEmbedding.
