MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning
November 4, 2023
Authors: Bingchang Liu, Chaoyu Chen, Cong Liao, Zi Gong, Huan Wang, Zhichao Lei, Ming Liang, Dajun Chen, Min Shen, Hailian Zhou, Hang Yu, Jianguo Li
cs.AI
Abstract
Code LLMs have emerged as a specialized research field, with remarkable
studies dedicated to enhancing models' coding capabilities through fine-tuning
on pre-trained models. Previous fine-tuning approaches were typically tailored
to specific downstream tasks or scenarios, which meant separate fine-tuning for
each task, requiring extensive training resources and posing challenges in
terms of deployment and maintenance. Furthermore, these approaches failed to
leverage the inherent interconnectedness among different code-related tasks. To
overcome these limitations, we present a multi-task fine-tuning framework,
MFTCoder, that enables simultaneous and parallel fine-tuning on multiple tasks.
By incorporating various loss functions, we effectively address common
challenges in multi-task learning, such as data imbalance, varying difficulty
levels, and inconsistent convergence speeds. Extensive experiments have
conclusively demonstrated that our multi-task fine-tuning approach outperforms
both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble
of tasks. Moreover, MFTCoder offers efficient training capabilities, including
efficient data tokenization modes and PEFT fine-tuning, resulting in
significantly improved speed compared to traditional fine-tuning methods.
MFTCoder seamlessly integrates with several mainstream open-source LLMs, such
as CodeLLama and Qwen. Leveraging the CodeLLama foundation, our
MFTCoder-fine-tuned model, CodeFuse-CodeLLama-34B, achieves an impressive
pass@1 score of 74.4% on the HumanEval benchmark, surpassing GPT-4
performance (67%, zero-shot). MFTCoder is open-sourced at
https://github.com/codefuse-ai/MFTCOder.
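
To make the loss-balancing idea in the abstract concrete, here is a minimal, hypothetical PyTorch sketch (not the authors' actual implementation) of a task-balanced causal-LM loss: token-level cross-entropy is averaged within each task, and the per-task means are then averaged, so that small or slowly converging tasks are not drowned out by data-rich ones. The function name `task_balanced_loss` and the `task_ids` tensor marking each token's task are assumptions made for illustration.

```python
# Illustrative sketch of a task-balanced causal-LM loss; names and the
# task_ids convention are hypothetical, not taken from the MFTCoder code.
import torch
import torch.nn.functional as F


def task_balanced_loss(logits, labels, task_ids, num_tasks):
    """Average token-level cross-entropy per task, then average over tasks."""
    # Shift so each token predicts the next one, as in standard causal LM training.
    logits = logits[:, :-1, :].contiguous()
    labels = labels[:, 1:].contiguous()
    task_ids = task_ids[:, 1:].contiguous()

    # Per-token loss; positions labeled -100 (padding/prompt) contribute zero.
    token_loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
        reduction="none",
    )
    valid = labels.view(-1) != -100

    per_task_losses = []
    for t in range(num_tasks):
        mask = valid & (task_ids.view(-1) == t)
        if mask.any():
            per_task_losses.append(token_loss[mask].mean())

    # Each task contributes equally, regardless of how many tokens it supplied.
    return torch.stack(per_task_losses).mean()
```

Per-task normalization is only one of several weighting schemes that could be plugged in here; the released MFTCoder code should be consulted for the exact loss variants the paper evaluates.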
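
The PEFT fine-tuning mentioned in the abstract can be sketched with the Hugging Face `peft` library. The LoRA hyperparameters, target modules, and the base checkpoint name below are illustrative assumptions, not a configuration prescribed by the paper.

```python
# Minimal LoRA/PEFT setup sketch; hyperparameters are assumptions for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "codellama/CodeLlama-34b-hf"  # any causal code LLM checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_config = LoraConfig(
    r=16,                      # low-rank adapter dimension
    lora_alpha=32,             # adapter scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Because only the low-rank adapter weights are updated, this kind of fine-tuning is far cheaper in memory and compute than full-parameter tuning of a large base model.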