Scaling Granite Code Models to 128K Context

Abstract

This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling the context length of the Granite 3B/8B code models from 2K/4K to 128K is lightweight continual pretraining that gradually increases the RoPE base frequency, using repository-level file packing and length-upsampled long-context data. We also release instruction-tuned models with long-context support, derived by further finetuning the long-context base models on a mix of permissively licensed short- and long-context instruction-response pairs. Compared with the original short-context Granite code models, our long-context models achieve significant improvements on long-context tasks without noticeable performance degradation on regular code completion benchmarks (e.g., HumanEval). We release all our long-context Granite code models under an Apache 2.0 license for both research and commercial use.
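To make the RoPE base-frequency adjustment more concrete, the sketch below computes the standard RoPE rotation frequencies, theta_i = b^(-2i/d), and the wavelength of the slowest-rotating dimension pair for several bases b. The head dimension of 128 and the stage-wise (context length, base) schedule are hypothetical values chosen for illustration; they are not the settings used for the Granite models.

```python
import math


def rope_inv_freq(head_dim: int, base: float) -> list[float]:
    """Per-pair RoPE rotation frequencies theta_i = base^(-2i/d), i = 0..d/2-1."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def max_wavelength(head_dim: int, base: float) -> float:
    """Wavelength, in tokens, of the slowest-rotating dimension pair."""
    slowest = rope_inv_freq(head_dim, base)[-1]
    return 2 * math.pi / slowest


# Hypothetical stage-wise schedule of (target context length, RoPE base).
# Illustrative values only, NOT the paper's actual configuration.
SCHEDULE = [
    (4_096, 10_000.0),
    (16_384, 100_000.0),
    (65_536, 1_000_000.0),
    (131_072, 10_000_000.0),
]

if __name__ == "__main__":
    # Each continual-pretraining stage would raise the base before training
    # on longer sequences; here we only report how the slowest wavelength grows.
    for ctx_len, base in SCHEDULE:
        wl = max_wavelength(head_dim=128, base=base)
        print(f"ctx={ctx_len:>7}  base={base:>12,.0f}  slowest wavelength ~ {wl:,.0f} tokens")
```

Raising b leaves the fastest pair unchanged (theta_0 = 1 for any base) and slows every other pair, which stretches the positional signal over longer distances. Stepping through increasingly large bases, one continual-pretraining stage at a time, is one way to realize the gradual increase described above; the number of stages and their bases here are assumptions.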