

OctoPack: Instruction Tuning Code Large Language Models

August 14, 2023
Authors: Niklas Muennighoff, Qian Liu, Armel Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro von Werra, Shayne Longpre
cs.AI

Abstract

Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with human instructions. We compile CommitPack: 4 terabytes of Git commits across 350 programming languages. We benchmark CommitPack against other natural and synthetic code instructions (xP3x, Self-Instruct, OASST) on the 16B parameter StarCoder model, and achieve state-of-the-art performance among models not trained on OpenAI outputs, on the HumanEval Python benchmark (46.2% pass@1). We further introduce HumanEvalPack, expanding the HumanEval benchmark to a total of 3 coding tasks (Code Repair, Code Explanation, Code Synthesis) across 6 languages (Python, JavaScript, Java, Go, C++, Rust). Our models, OctoCoder and OctoGeeX, achieve the best performance across HumanEvalPack among all permissive models, demonstrating CommitPack's benefits in generalizing to a wider set of languages and natural coding tasks. Code, models and data are freely available at https://github.com/bigcode-project/octopack.
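The core idea behind CommitPack is that a Git commit naturally pairs a human instruction (the commit message) with a code change (the pre- and post-change file contents). A minimal sketch of this pairing, using a hypothetical commit schema and filter threshold not taken from the paper:

```python
def commits_to_instruction_pairs(commits, min_msg_words=3):
    """Turn raw Git commits into instruction-tuning triples, in the
    spirit of CommitPack: the commit message serves as the instruction,
    the pre-change code as the input, the post-change code as the target.

    `commits` is a list of dicts with 'message', 'old_code', and
    'new_code' keys -- an illustrative schema, not the paper's actual
    data format. `min_msg_words` is likewise a hypothetical filter.
    """
    pairs = []
    for commit in commits:
        msg = commit["message"].strip()
        # Skip trivial messages (e.g. "wip") that carry little
        # instructional signal.
        if len(msg.split()) < min_msg_words:
            continue
        pairs.append({
            "instruction": msg,
            "input": commit["old_code"],
            "output": commit["new_code"],
        })
    return pairs
```

In practice the dataset pipeline would also need language detection, license filtering, and deduplication across the 350 languages; this sketch only shows the message-to-instruction mapping.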
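The reported 46.2% pass@1 on HumanEval refers to the fraction of problems solved when one generated sample must pass the unit tests. The standard unbiased pass@k estimator (introduced with HumanEval, and the convention presumably followed here) can be sketched as:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n generated samples per problem,
    of which c pass the unit tests, estimate the probability that at
    least one of k randomly drawn samples passes.

    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        # Fewer than k failing samples: some draw of k must contain a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For k = 1 this reduces to the empirical pass rate c / n; the benchmark score is this value averaged over all problems.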