SelfCodeAlign: Self-Alignment for Code Generation

Abstract

Instruction tuning is a supervised fine-tuning approach that significantly improves the ability of large language models (LLMs) to follow human instructions. We propose SelfCodeAlign, the first fully transparent and permissive pipeline for self-aligning code LLMs without extensive human annotations or distillation. SelfCodeAlign employs the same base model for inference throughout the data generation process. It first extracts diverse coding concepts from high-quality seed snippets to generate new tasks. It then samples multiple responses per task, pairs each with test cases, and validates them in a sandbox environment. Finally, passing examples are selected for instruction tuning. In our primary experiments, we use SelfCodeAlign with CodeQwen1.5-7B to generate a dataset of 74k instruction-response pairs. Fine-tuning on this dataset leads to a model that achieves a 67.1 pass@1 on HumanEval+, surpassing CodeLlama-70B-Instruct despite being ten times smaller. Across all benchmarks, this fine-tuned model consistently outperforms the original version trained with OctoPack, the previous state-of-the-art method for instruction tuning without human annotations or distillation. Additionally, we show that SelfCodeAlign is effective across LLMs of various sizes, from 3B to 33B, and that the base models can benefit more from alignment with their own data distribution. We further validate each component's effectiveness in our pipeline, showing that SelfCodeAlign outperforms both direct distillation from GPT-4o and leading GPT-3.5-based distillation methods, such as OSS-Instruct and Evol-Instruct. SelfCodeAlign has also led to the creation of StarCoder2-Instruct, the first fully transparent, permissively licensed, and self-aligned code LLM that achieves state-of-the-art coding performance.
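
The pipeline ends with execution-based filtering. Each sampled response is paired with its tests and run in a sandbox, and only passing examples are kept. The following Python code is a minimal sketch of that filtering step alone, built on several assumptions. The `Candidate` class, `passes_in_sandbox`, and `select_passing` are hypothetical names. A fresh subprocess with a timeout stands in for a real sandbox, and it provides no filesystem, network, or memory isolation. Keeping the first passing response per instruction is an illustrative choice, because the abstract says only that passing examples are selected. The model calls that extract concepts, generate tasks, and sample responses are omitted.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Candidate:
    """One sampled (response, tests) pair for an instruction (hypothetical structure)."""
    instruction: str
    response_code: str
    test_code: str


def passes_in_sandbox(candidate: Candidate, timeout_s: float = 10.0) -> bool:
    """Run the response followed by its tests in a separate Python process.

    Assumption: a subprocess with a timeout is a weak stand-in for a real
    sandbox; it does not isolate the filesystem, network, or memory.
    """
    program = candidate.response_code + "\n\n" + candidate.test_code + "\n"
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "check.py"
        script.write_text(program)
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # Hanging code counts as a failure.
            return False
    # A failing assert or any uncaught exception gives a nonzero exit code.
    return result.returncode == 0


def select_passing(candidates: list[Candidate]) -> dict[str, Candidate]:
    """Keep the first passing candidate per instruction (illustrative policy)."""
    selected: dict[str, Candidate] = {}
    for cand in candidates:
        if cand.instruction in selected:
            continue  # this instruction already has a validated response
        if passes_in_sandbox(cand):
            selected[cand.instruction] = cand
    return selected


if __name__ == "__main__":
    task = "Write add(a, b) that returns the sum of two integers."
    candidates = [
        Candidate(task, "def add(a, b):\n    return a - b", "assert add(2, 3) == 5"),
        Candidate(task, "def add(a, b):\n    return a + b", "assert add(2, 3) == 5"),
    ]
    kept = select_passing(candidates)
    print(kept[task].response_code)
```

In the example, the first candidate subtracts, so its assertion fails and it is discarded. The second candidate passes its test and becomes the example kept for instruction tuning.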

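The headline result is reported as pass@1 on HumanEval+. To show what that metric measures, the following Python code is a sketch of the standard unbiased pass@k estimator introduced with HumanEval (Chen et al., 2021). For a problem with n generated samples, of which c pass all tests, the estimate is pass@k = 1 - C(n - c, k) / C(n, k), and the benchmark score is the average over problems. The abstract does not state how many samples SelfCodeAlign draws per problem. With a single greedy sample (n = k = 1), the estimate reduces to the fraction of problems solved. The function names are illustrative, and the (n, c) values in the example are made up. They are not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all tests
    k: number of samples the metric is allowed to try
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of samples contains at least one passing sample.
        return 1.0
    # Probability that a random size-k subset contains no passing sample, complemented.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    per_problem = [(10, 3), (10, 0), (10, 10)]
    print(round(benchmark_pass_at_k(per_problem, k=1), 3))  # prints 0.433
```

For k = 1, each per-problem term is simply c / n. The combinatorial form only matters for larger k, where it avoids the bias of drawing exactly k samples per problem.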