

Scaling Granite Code Models to 128K Context

July 18, 2024
Authors: Matt Stallone, Vaibhav Saxena, Leonid Karlinsky, Bridget McGinn, Tim Bula, Mayank Mishra, Adriana Meza Soria, Gaoyuan Zhang, Aditya Prasad, Yikang Shen, Saptha Surendran, Shanmukha Guttula, Hima Patel, Parameswaran Selvam, Xuan-Hong Dang, Yan Koyfman, Atin Sood, Rogerio Feris, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda
cs.AI

Abstract

This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling the context length of Granite 3B/8B code models from 2K/4K to 128K consists of lightweight continual pretraining that gradually increases the RoPE base frequency, using repository-level file packing and length-upsampled long-context data. Additionally, we release instruction-tuned models with long-context support, derived by further finetuning the long-context base models on a mix of permissively licensed short- and long-context instruction-response pairs. Compared to the original short-context Granite code models, our long-context models achieve significant improvements on long-context tasks without any noticeable performance degradation on regular code completion benchmarks (e.g., HumanEval). We release all our long-context Granite code models under an Apache 2.0 license for both research and commercial use.
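To make the RoPE-based context scaling concrete, below is a minimal sketch of rotary position embeddings whose base frequency ("theta") is raised between continual-pretraining stages. It assumes the standard RoPE formulation; the function names, head dimension, and the staged sequence-length/base schedule are illustrative assumptions, not values taken from the paper.

```python
# Minimal RoPE sketch (not the authors' code): raising the base frequency
# stretches the rotation wavelengths so that much longer positions remain
# distinguishable, which is the core of the context-extension recipe.
import torch


def rope_frequencies(head_dim: int, base: float) -> torch.Tensor:
    """Per-dimension inverse frequencies 1 / base^(2i/d) for RoPE."""
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    return 1.0 / (base ** exponents)


def rope_angles(seq_len: int, head_dim: int, base: float) -> torch.Tensor:
    """Angle matrix (seq_len x head_dim/2) used to rotate query/key vectors."""
    inv_freq = rope_frequencies(head_dim, base)
    positions = torch.arange(seq_len, dtype=torch.float32)
    return torch.outer(positions, inv_freq)


# Hypothetical staged schedule: each continual-pretraining stage increases the
# training sequence length and the RoPE base together, so long-range positions
# still map to well-behaved rotation angles.
for seq_len, base in [(4_096, 10_000.0), (32_768, 500_000.0), (131_072, 10_000_000.0)]:
    angles = rope_angles(seq_len, head_dim=128, base=base)
    print(f"seq_len={seq_len}, base={base}, angles shape={tuple(angles.shape)}")
```

In practice, the continual-pretraining data would also be organized with repository-level file packing and upsampling of long documents so that the model actually sees sequences near the new maximum length; the sketch above only illustrates the positional-encoding side of the recipe.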
