Scaling Granite Code Models to 128K Context
July 18, 2024
Authors: Matt Stallone, Vaibhav Saxena, Leonid Karlinsky, Bridget McGinn, Tim Bula, Mayank Mishra, Adriana Meza Soria, Gaoyuan Zhang, Aditya Prasad, Yikang Shen, Saptha Surendran, Shanmukha Guttula, Hima Patel, Parameswaran Selvam, Xuan-Hong Dang, Yan Koyfman, Atin Sood, Rogerio Feris, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda
cs.AI
Abstract
This paper introduces long-context Granite code models that support effective
context windows of up to 128K tokens. Our solution for scaling the context length
of the Granite 3B/8B code models from 2K/4K to 128K consists of lightweight
continual pretraining that gradually increases the RoPE base frequency, using
repository-level file packing and length-upsampled long-context data.
Additionally, we release instruction-tuned models with long-context support,
derived by further finetuning the long-context base models on a mix of
permissively licensed short- and long-context instruction-response pairs.
Compared to the original short-context Granite code models, our long-context
models achieve significant improvements on long-context tasks without any
noticeable performance degradation on regular code completion benchmarks
(e.g., HumanEval). We release all our long-context Granite code models under an
Apache 2.0 license for both research and commercial use.
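The context-length extension described above rests on continual pretraining with a progressively larger RoPE base frequency. The sketch below is our own illustration, not the authors' training code: the schedule values, head dimension, and function names are assumptions. It shows how raising the base stretches rotary-embedding wavelengths so that positions far beyond the original 2K/4K window remain distinguishable to the model.

```python
# Minimal sketch (assumptions, not the paper's implementation) of RoPE base
# frequency scaling for context extension.
import torch

def rope_frequencies(head_dim: int, base: float) -> torch.Tensor:
    """Inverse frequencies for rotary position embeddings with a given base."""
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    return 1.0 / (base ** exponents)

def rope_angles(positions: torch.Tensor, head_dim: int, base: float) -> torch.Tensor:
    """Rotation angles of shape (len(positions), head_dim // 2) applied to query/key pairs."""
    inv_freq = rope_frequencies(head_dim, base)
    return torch.outer(positions.float(), inv_freq)

# Illustrative schedule only: the base frequency grows in stages as the
# training context length grows (the exact stages/values are assumptions).
context_schedule = [(4_096, 10_000.0), (32_768, 500_000.0), (131_072, 10_000_000.0)]

for seq_len, base in context_schedule:
    angles = rope_angles(torch.arange(seq_len), head_dim=128, base=base)
    # With a larger base, the slowest-rotating dimensions complete far fewer
    # full rotations across the window, so distant positions stay separable.
    print(f"ctx={seq_len:>7}  base={base:>12,.0f}  slowest-dim max angle={angles[-1, -1]:.4f}")
```

In this toy schedule, each stage pairs a longer training sequence length with a larger base, mirroring the paper's idea of gradually increasing the RoPE base frequency during lightweight continual pretraining rather than retraining from scratch.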