Scaling Granite Code Models to 128K Context
July 18, 2024
Authors: Matt Stallone, Vaibhav Saxena, Leonid Karlinsky, Bridget McGinn, Tim Bula, Mayank Mishra, Adriana Meza Soria, Gaoyuan Zhang, Aditya Prasad, Yikang Shen, Saptha Surendran, Shanmukha Guttula, Hima Patel, Parameswaran Selvam, Xuan-Hong Dang, Yan Koyfman, Atin Sood, Rogerio Feris, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda
cs.AI
Abstract
This paper introduces long-context Granite code models that support effective
context windows of up to 128K tokens. Our solution for scaling the context length
of the Granite 3B/8B code models from 2K/4K to 128K consists of lightweight
continual pretraining that gradually increases the RoPE base frequency, using
repository-level file packing and length-upsampled long-context data.
Additionally, we release instruction-tuned models with long-context support,
derived by further finetuning the long-context base models on a mix of
permissively licensed short- and long-context instruction-response pairs.
Compared to the original short-context Granite code models, our long-context
models achieve significant improvements on long-context tasks without any
noticeable performance degradation on regular code completion benchmarks
(e.g., HumanEval). We release all our long-context Granite code models under an
Apache 2.0 license for both research and commercial use.
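The context-length extension described above rests on continual pretraining with a progressively larger RoPE base frequency. The sketch below is our own illustration, not the authors' training code: the schedule values, head dimension, and function names are assumptions. It shows how raising the base stretches rotary-embedding wavelengths so that positions far beyond the original 2K/4K window remain distinguishable to the model.

```python
# Minimal sketch (assumptions, not the paper's implementation) of RoPE base
# frequency scaling for context extension.
import torch

def rope_frequencies(head_dim: int, base: float) -> torch.Tensor:
    """Inverse frequencies for rotary position embeddings with a given base."""
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    return 1.0 / (base ** exponents)

def rope_angles(positions: torch.Tensor, head_dim: int, base: float) -> torch.Tensor:
    """Rotation angles of shape (len(positions), head_dim // 2) applied to query/key pairs."""
    inv_freq = rope_frequencies(head_dim, base)
    return torch.outer(positions.float(), inv_freq)

# Illustrative schedule only: the base frequency grows in stages as the
# training context length grows (the exact stages/values are assumptions).
context_schedule = [(4_096, 10_000.0), (32_768, 500_000.0), (131_072, 10_000_000.0)]

for seq_len, base in context_schedule:
    angles = rope_angles(torch.arange(seq_len), head_dim=128, base=base)
    # With a larger base, the slowest-rotating dimensions complete far fewer
    # full rotations across the window, so distant positions stay separable.
    print(f"ctx={seq_len:>7}  base={base:>12,.0f}  slowest-dim max angle={angles[-1, -1]:.4f}")
```

In this toy schedule, each stage pairs a longer training sequence length with a larger base, mirroring the paper's idea of gradually increasing the RoPE base frequency during lightweight continual pretraining rather than retraining from scratch.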