Why Personalizing Deep Learning-Based Code Completion Tools Matters
March 18, 2025
Authors: Alessandro Giagnorio, Alberto Martin-Lopez, Gabriele Bavota
cs.AI
Abstract
Deep learning (DL)-based code completion tools have transformed software
development by enabling advanced code generation. These tools leverage models
trained on vast amounts of code from numerous repositories, capturing general
coding patterns. However, the impact of fine-tuning these models for specific
organizations or developers to boost their performance on such subjects remains
unexplored. In this work, we fill this gap by presenting solid empirical
evidence answering this question. More specifically, we consider 136 developers
from two organizations (Apache and Spring), two model architectures (T5 and
Code Llama), and three model sizes (60M, 750M, and 7B trainable parameters). T5
models (60M, 750M) were pre-trained and fine-tuned on over 2,000 open-source
projects, excluding the subject organizations' data, and compared against
versions fine-tuned on organization- and developer-specific datasets. For the
Code Llama model (7B), we compared the performance of the already pre-trained
model publicly available online with the same model fine-tuned via
parameter-efficient fine-tuning on organization- and developer-specific
datasets. Our results show that both organization-specific and
developer-specific additional fine-tuning boost prediction capabilities, with
the former being particularly effective. This finding generalizes across (i)
the two subject organizations (i.e., Apache and Spring) and (ii) models of
completely different magnitude (from 60M to 7B trainable parameters). Finally,
we show that DL models fine-tuned on an organization-specific dataset achieve
the same completion performance as pre-trained code models used out of the box
that are ~10 times larger, with consequent savings in deployment and inference
costs (e.g., smaller GPUs needed).
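The abstract mentions parameter-efficient fine-tuning for the 7B Code Llama model without naming the specific method. One widely used technique in this family is LoRA, which freezes the pre-trained weight matrix and trains only a small low-rank update. The toy NumPy sketch below (an illustrative assumption, not the paper's actual setup or scale) shows why this is cheap: at rank 4, a 512x512 layer trains ~1.6% of its parameters, and at initialization the adapted layer reproduces the frozen model exactly.

```python
import numpy as np

class LoRALinear:
    """Toy LoRA-style adapter on a frozen linear layer (illustration only).

    The pre-trained weight W is frozen; only the low-rank factors A and B
    are trainable. Effective weight: W + (alpha / rank) * B @ A.
    """

    def __init__(self, d_in, d_out, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in))        # frozen, pre-trained
        self.A = rng.standard_normal((rank, d_in)) * 0.01  # trainable
        self.B = np.zeros((d_out, rank))                   # trainable, init 0
        self.scale = alpha / rank

    def forward(self, x):
        # B is zero-initialized, so at init the output equals the frozen
        # model's output: fine-tuning starts from the pre-trained behavior.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def trainable_params(self):
        return self.A.size + self.B.size

    def frozen_params(self):
        return self.W.size

layer = LoRALinear(d_in=512, d_out=512, rank=4)
print(layer.trainable_params(), layer.frozen_params())  # 4096 vs 262144
```

Only A and B would receive gradient updates during organization- or developer-specific fine-tuning, which is what makes adapting a 7B-parameter model feasible on modest hardware.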