

OctoPack: Instruction Tuning Code Large Language Models

August 14, 2023
Authors: Niklas Muennighoff, Qian Liu, Armel Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro von Werra, Shayne Longpre
cs.AI

Abstract

Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with human instructions. We compile CommitPack: 4 terabytes of Git commits across 350 programming languages. We benchmark CommitPack against other natural and synthetic code instructions (xP3x, Self-Instruct, OASST) on the 16B parameter StarCoder model, and achieve state-of-the-art performance among models not trained on OpenAI outputs, on the HumanEval Python benchmark (46.2% pass@1). We further introduce HumanEvalPack, expanding the HumanEval benchmark to a total of 3 coding tasks (Code Repair, Code Explanation, Code Synthesis) across 6 languages (Python, JavaScript, Java, Go, C++, Rust). Our models, OctoCoder and OctoGeeX, achieve the best performance across HumanEvalPack among all permissive models, demonstrating CommitPack's benefits in generalizing to a wider set of languages and natural coding tasks. Code, models and data are freely available at https://github.com/bigcode-project/octopack.
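The core idea behind CommitPack is that a commit message can serve as a natural-language instruction for the code change it accompanies. The sketch below is a rough illustration of that pairing, not the authors' actual preprocessing pipeline; the `CommitExample` and `to_training_prompt` names and the prompt format are hypothetical.

```python
# Minimal sketch: pair a Git commit message (the "instruction") with the file
# contents before and after the commit to form an instruction-tuning example.
# This is an assumption-based illustration, not the OctoPack pipeline.

from dataclasses import dataclass


@dataclass
class CommitExample:
    instruction: str  # commit message, e.g. "Handle empty input list"
    old_code: str     # file contents before the commit
    new_code: str     # file contents after the commit


def to_training_prompt(ex: CommitExample) -> dict:
    """Format a commit as a (prompt, completion) pair for instruction tuning."""
    prompt = (
        f"Instruction: {ex.instruction}\n"
        f"Input code:\n{ex.old_code}\n"
        f"Rewritten code:\n"
    )
    return {"prompt": prompt, "completion": ex.new_code}


# Hypothetical usage:
example = CommitExample(
    instruction="Handle empty input list",
    old_code="def mean(xs):\n    return sum(xs) / len(xs)\n",
    new_code="def mean(xs):\n    return sum(xs) / len(xs) if xs else 0.0\n",
)
print(to_training_prompt(example)["prompt"])
```

Formatting choices like the exact prompt template vary across instruction-tuning setups; the key point from the abstract is that commits naturally supply an (instruction, code change) pair without any synthetic data generation.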