Why Personalizing Deep Learning-Based Code Completion Tools Matters

Abstract

Deep learning (DL)-based code completion tools have transformed software development by enabling advanced code generation. These tools leverage models trained on vast amounts of code from numerous repositories, capturing general coding patterns. However, the impact of fine-tuning these models for specific organizations or developers, to boost their performance on those subjects, remains unexplored. In this work, we fill this gap by presenting solid empirical evidence that answers this question. More specifically, we consider 136 developers from two organizations (Apache and Spring), two model architectures (T5 and Code Llama), and three model sizes (60M, 750M, and 7B trainable parameters). The T5 models (60M, 750M) were pre-trained and fine-tuned on over 2,000 open-source projects, excluding the subject organizations' data, and compared against versions further fine-tuned on organization- and developer-specific datasets. For the Code Llama model (7B), we compared the performance of the pre-trained model publicly available online with that of the same model fine-tuned via parameter-efficient fine-tuning on organization- and developer-specific datasets. Our results show that both organization-specific and developer-specific additional fine-tuning boost prediction capabilities, with the former being particularly effective. This finding generalizes across (i) the two subject organizations (Apache and Spring) and (ii) models of completely different magnitude (from 60M to 7B trainable parameters). Finally, we show that DL models fine-tuned on an organization-specific dataset achieve the same completion performance as pre-trained code models that are used out of the box and are about 10 times larger, with consequent savings in deployment and inference cost (e.g., smaller GPUs needed).
