
Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study

November 4, 2024
Authors: André Storhaug, Jingyue Li
cs.AI

Abstract

The advent of large language models (LLMs) like GitHub Copilot has significantly enhanced programmers' productivity, particularly in code generation. However, these models often struggle with real-world tasks without fine-tuning. As LLMs grow larger and more performant, fine-tuning for specialized tasks becomes increasingly expensive. Parameter-efficient fine-tuning (PEFT) methods, which fine-tune only a subset of model parameters, offer a promising solution by reducing the computational costs of tuning LLMs while maintaining their performance. Existing studies have explored using PEFT and LLMs for various code-related tasks and found that the effectiveness of PEFT techniques is task-dependent. The application of PEFT techniques in unit test generation remains underexplored. The state-of-the-art is limited to using LLMs with full fine-tuning to generate unit tests. This paper investigates both full fine-tuning and various PEFT methods, including LoRA, (IA)^3, and prompt tuning, across different model architectures and sizes. We use well-established benchmark datasets to evaluate their effectiveness in unit test generation. Our findings show that PEFT methods can deliver performance comparable to full fine-tuning for unit test generation, making specialized fine-tuning more accessible and cost-effective. Notably, prompt tuning is the most effective in terms of cost and resource utilization, while LoRA approaches the effectiveness of full fine-tuning in several cases.
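
To make the parameter-efficiency contrast concrete, the sketch below configures two of the PEFT methods the paper studies (LoRA and prompt tuning) with the Hugging Face `peft` library. This is a minimal sketch, not the paper's experimental setup: the base model, the `target_modules` name, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of two PEFT setups using the Hugging Face `peft` library.
# The base model, target module names, and all hyperparameters are
# illustrative assumptions, not the paper's experimental configuration.
from transformers import AutoModelForCausalLM
from peft import (
    LoraConfig,
    PromptTuningConfig,
    PromptTuningInit,
    get_peft_model,
)

model_name = "Salesforce/codegen-350M-mono"  # assumed small code LLM
base_model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA: freeze the base weights and train only low-rank adapter matrices
# injected into the attention projections.
lora_config = LoraConfig(
    r=8,                          # adapter rank
    lora_alpha=16,                # adapter scaling factor
    target_modules=["qkv_proj"],  # assumed projection name for this model
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
lora_model = get_peft_model(base_model, lora_config)
lora_model.print_trainable_parameters()  # typically well under 1% trainable

# Prompt tuning: train only a few virtual token embeddings prepended to
# every input; apply with get_peft_model() on a fresh copy of the model.
prompt_config = PromptTuningConfig(
    task_type="CAUSAL_LM",
    num_virtual_tokens=20,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Generate a unit test for the following method:",
    tokenizer_name_or_path=model_name,
)
# (IA)^3 has an analogous `IA3Config` in the same library.
```

Either configuration can then be trained with a standard fine-tuning loop (e.g. `transformers.Trainer`); the trainable-parameter count printed above is what makes these methods so much cheaper than updating all model weights.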

