

WizardCoder: Empowering Code Large Language Models with Evol-Instruct

June 14, 2023
作者: Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang
cs.AI

Abstract

Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated exceptional performance in code-related tasks. However, most existing models are solely pre-trained on extensive raw code data without instruction fine-tuning. In this paper, we introduce WizardCoder, which empowers Code LLMs with complex instruction fine-tuning by adapting the Evol-Instruct method to the domain of code. Through comprehensive experiments on four prominent code generation benchmarks, namely HumanEval, HumanEval+, MBPP, and DS-1000, we unveil the exceptional capabilities of our model. It surpasses all other open-source Code LLMs by a substantial margin. Moreover, our model even outperforms the largest closed LLMs, Anthropic's Claude and Google's Bard, on HumanEval and HumanEval+. Our code, model weights, and data are publicly available at https://github.com/nlpxucan/WizardLM
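The core idea the abstract describes, adapting Evol-Instruct to code, is to iteratively rewrite an initial coding instruction into progressively harder variants and fine-tune on the resulting instruction set. The sketch below is a minimal, illustrative loop only: the specific heuristic templates and the `llm()` stub are assumptions standing in for the paper's actual prompts and model calls, not WizardCoder's implementation.

```python
# Illustrative sketch of a code-focused Evol-Instruct loop: each round
# rewrites the current instruction with one randomly chosen evolution
# heuristic. The template wording and llm() stub are hypothetical.
import random

EVOL_TEMPLATES = [
    "Add new constraints and requirements to this programming task:\n{instruction}",
    "Replace a common requirement in this task with a rarer one:\n{instruction}",
    "Raise the required time or space complexity of this task:\n{instruction}",
    "Increase difficulty by including erroneous code as misdirection:\n{instruction}",
]

def llm(prompt: str) -> str:
    # Stand-in for a real instruction-following LLM call; here it just
    # echoes the instruction portion with a marker so the loop runs.
    return prompt.split("\n", 1)[1] + " (evolved)"

def evolve(instruction: str, rounds: int = 3, seed: int = 0) -> list[str]:
    """Return the chain of progressively harder instructions."""
    rng = random.Random(seed)
    chain = [instruction]
    for _ in range(rounds):
        template = rng.choice(EVOL_TEMPLATES)
        chain.append(llm(template.format(instruction=chain[-1])))
    return chain

chain = evolve("Write a function that reverses a string.")
```

In a real pipeline, each evolved instruction would be paired with a model-generated solution and the pairs used as fine-tuning data for the Code LLM.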