Why Personalizing Deep Learning-Based Code Completion Tools Matters
March 18, 2025
Authors: Alessandro Giagnorio, Alberto Martin-Lopez, Gabriele Bavota
cs.AI
Abstract
Deep learning (DL)-based code completion tools have transformed software
development by enabling advanced code generation. These tools leverage models
trained on vast amounts of code from numerous repositories, capturing general
coding patterns. However, the impact of fine-tuning these models for specific
organizations or developers to boost their performance on such subjects remains
unexplored. In this work, we fill this gap by presenting solid empirical
evidence answering this question. More specifically, we consider 136 developers
from two organizations (Apache and Spring), two model architectures (T5 and
Code Llama), and three model sizes (60M, 750M, and 7B trainable parameters). T5
models (60M, 750M) were pre-trained and fine-tuned on over 2,000 open-source
projects, excluding the subject organizations' data, and compared against
versions fine-tuned on organization- and developer-specific datasets. For the
Code Llama model (7B), we compared the performance of the already pre-trained
model publicly available online with the same model fine-tuned via
parameter-efficient fine-tuning on organization- and developer-specific
datasets. Our results show that both organization-specific and
developer-specific additional fine-tuning boost prediction capabilities, with
the former being particularly effective. This finding generalizes across (i)
the two subject organizations (i.e., Apache and Spring) and (ii) models of
completely different magnitude (from 60M to 7B trainable parameters). Finally,
we show that DL models fine-tuned on an organization-specific dataset achieve
the same completion performance as pre-trained code models used out of the box
that are ~10 times larger, with consequent savings in deployment and inference
costs (e.g., smaller GPUs needed).
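The abstract mentions parameter-efficient fine-tuning for the 7B Code Llama model without naming the specific method. One widely used technique in this family is LoRA, which freezes the pre-trained weight matrix and trains only a small low-rank update. The toy NumPy sketch below (an illustrative assumption, not the paper's actual setup or scale) shows why this is cheap: at rank 4, a 512x512 layer trains ~1.6% of its parameters, and at initialization the adapted layer reproduces the frozen model exactly.

```python
import numpy as np

class LoRALinear:
    """Toy LoRA-style adapter on a frozen linear layer (illustration only).

    The pre-trained weight W is frozen; only the low-rank factors A and B
    are trainable. Effective weight: W + (alpha / rank) * B @ A.
    """

    def __init__(self, d_in, d_out, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in))        # frozen, pre-trained
        self.A = rng.standard_normal((rank, d_in)) * 0.01  # trainable
        self.B = np.zeros((d_out, rank))                   # trainable, init 0
        self.scale = alpha / rank

    def forward(self, x):
        # B is zero-initialized, so at init the output equals the frozen
        # model's output: fine-tuning starts from the pre-trained behavior.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def trainable_params(self):
        return self.A.size + self.B.size

    def frozen_params(self):
        return self.W.size

layer = LoRALinear(d_in=512, d_out=512, rank=4)
print(layer.trainable_params(), layer.frozen_params())  # 4096 vs 262144
```

Only A and B would receive gradient updates during organization- or developer-specific fine-tuning, which is what makes adapting a 7B-parameter model feasible on modest hardware.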