OctoPack: Instruction Tuning Code Large Language Models

Abstract

Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with human instructions. We compile CommitPack: 4 terabytes of Git commits across 350 programming languages. We benchmark CommitPack against other natural and synthetic code instructions (xP3x, Self-Instruct, OASST) on the 16B parameter StarCoder model, and achieve state-of-the-art performance among models not trained on OpenAI outputs on the HumanEval Python benchmark (46.2% pass@1). We further introduce HumanEvalPack, expanding the HumanEval benchmark to a total of 3 coding tasks (Code Repair, Code Explanation, Code Synthesis) across 6 languages (Python, JavaScript, Java, Go, C++, Rust). Our models, OctoCoder and OctoGeeX, achieve the best performance across HumanEvalPack among all permissive models, demonstrating CommitPack's benefits in generalizing to a wider set of languages and natural coding tasks. Code, models and data are freely available at https://github.com/bigcode-project/octopack.
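
As a rough illustration of how a Git commit pairs a code change with a human instruction, the sketch below maps a single-file commit to an instruction-tuning sample. The `Commit` class, the `to_instruction_sample` helper, and the `instruction`/`input`/`output` field names are hypothetical choices made for this example. They are not taken from the CommitPack release and may differ from the format the authors actually use.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    """A single-file commit: its message plus the file before and after.

    Hypothetical container for illustration only.
    """
    message: str
    old_code: str
    new_code: str


def to_instruction_sample(commit: Commit) -> dict:
    """Map a commit to an (instruction, input, output) training sample.

    Assumption: the commit message serves as the instruction, the file
    before the change as the input, and the file after it as the target.
    """
    return {
        "instruction": commit.message.strip(),
        "input": commit.old_code,
        "output": commit.new_code,
    }


example = Commit(
    message="Fix off-by-one error in range bound\n",
    old_code="def first_n(n):\n    return list(range(n - 1))\n",
    new_code="def first_n(n):\n    return list(range(n))\n",
)
sample = to_instruction_sample(example)
```

Here the commit message stands in for the instruction a person might give, which is the natural pairing the abstract refers to. A real pipeline would presumably also filter commits, for example by license or message quality, which this sketch leaves out.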
