

Scaling Granite Code Models to 128K Context

July 18, 2024
Authors: Matt Stallone, Vaibhav Saxena, Leonid Karlinsky, Bridget McGinn, Tim Bula, Mayank Mishra, Adriana Meza Soria, Gaoyuan Zhang, Aditya Prasad, Yikang Shen, Saptha Surendran, Shanmukha Guttula, Hima Patel, Parameswaran Selvam, Xuan-Hong Dang, Yan Koyfman, Atin Sood, Rogerio Feris, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda
cs.AI

Abstract

This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling the context length of Granite 3B/8B code models from 2K/4K to 128K consists of lightweight continual pretraining that gradually increases the RoPE base frequency, using repository-level file packing and length-upsampled long-context data. Additionally, we release instruction-tuned models with long-context support, derived by further finetuning the long-context base models on a mix of permissively licensed short- and long-context instruction-response pairs. Compared to the original short-context Granite code models, our long-context models achieve significant improvements on long-context tasks without any noticeable performance degradation on regular code completion benchmarks (e.g., HumanEval). We release all our long-context Granite code models under an Apache 2.0 license for both research and commercial use.
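To make the RoPE-based context scaling concrete, below is a minimal sketch of rotary position embeddings whose base frequency ("theta") is raised between continual-pretraining stages. It assumes the standard RoPE formulation; the function names, head dimension, and the staged sequence-length/base schedule are illustrative assumptions, not values taken from the paper.

```python
# Minimal RoPE sketch (not the authors' code): raising the base frequency
# stretches the rotation wavelengths so that much longer positions remain
# distinguishable, which is the core of the context-extension recipe.
import torch


def rope_frequencies(head_dim: int, base: float) -> torch.Tensor:
    """Per-dimension inverse frequencies 1 / base^(2i/d) for RoPE."""
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    return 1.0 / (base ** exponents)


def rope_angles(seq_len: int, head_dim: int, base: float) -> torch.Tensor:
    """Angle matrix (seq_len x head_dim/2) used to rotate query/key vectors."""
    inv_freq = rope_frequencies(head_dim, base)
    positions = torch.arange(seq_len, dtype=torch.float32)
    return torch.outer(positions, inv_freq)


# Hypothetical staged schedule: each continual-pretraining stage increases the
# training sequence length and the RoPE base together, so long-range positions
# still map to well-behaved rotation angles.
for seq_len, base in [(4_096, 10_000.0), (32_768, 500_000.0), (131_072, 10_000_000.0)]:
    angles = rope_angles(seq_len, head_dim=128, base=base)
    print(f"seq_len={seq_len}, base={base}, angles shape={tuple(angles.shape)}")
```

In practice, the continual-pretraining data would also be organized with repository-level file packing and upsampling of long documents so that the model actually sees sequences near the new maximum length; the sketch above only illustrates the positional-encoding side of the recipe.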
