Why Personalizing Deep Learning-Based Code Completion Tools Matters

Abstract


Deep learning (DL)-based code completion tools have transformed software development by enabling advanced code generation. These tools leverage models trained on vast amounts of code from numerous repositories, capturing general coding patterns. However, whether fine-tuning these models for specific organizations or developers boosts their performance on those subjects remains unexplored. In this work, we fill this gap by presenting solid empirical evidence answering this question. More specifically, we consider 136 developers from two organizations (Apache and Spring), two model architectures (T5 and Code Llama), and three model sizes (60M, 750M, and 7B trainable parameters). The T5 models (60M, 750M) were pre-trained and fine-tuned on over 2,000 open-source projects, excluding the subject organizations' data, and compared against versions further fine-tuned on organization- and developer-specific datasets. For the Code Llama model (7B), we compared the performance of the publicly available pre-trained model with the same model fine-tuned via parameter-efficient fine-tuning on organization- and developer-specific datasets. Our results show that both organization-specific and developer-specific additional fine-tuning boost prediction capabilities, with the former being particularly effective. This finding generalizes across (i) the two subject organizations (i.e., Apache and Spring) and (ii) models of completely different magnitude (from 60M to 7B trainable parameters). Finally, we show that DL models fine-tuned on an organization-specific dataset achieve the same completion performance as pre-trained code models used out of the box that are about 10 times larger, with consequent savings in deployment and inference cost (e.g., smaller GPUs needed).
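The deployment-cost claim at the end of the abstract (an organization-specific model matching an off-the-shelf model about 10 times larger) can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: it assumes the nominal parameter counts stated in the abstract and counts just the memory needed to hold the weights at fp32 (4 bytes per parameter) or fp16 (2 bytes per parameter). Activations, KV cache, optimizer state, and framework overhead are ignored, so real GPU requirements are higher. The dictionary keys and helper names (`weight_memory_gb`, `size_ratio`) are hypothetical and not taken from the paper.

```python
# Back-of-the-envelope weight-memory estimate for the three model sizes
# named in the abstract. Assumption: only the weights are counted
# (no activations, KV cache, optimizer state, or framework overhead).

# Nominal parameter counts as stated in the abstract (hypothetical labels).
MODEL_PARAMS = {
    "T5 (60M)": 60_000_000,
    "T5 (750M)": 750_000_000,
    "Code Llama (7B)": 7_000_000_000,
}

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(num_params: int, precision: str) -> float:
    """Return the memory in GB (1 GB = 10**9 bytes) taken by the weights alone."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def size_ratio(larger: int, smaller: int) -> float:
    """Return how many times more parameters the larger model has."""
    return larger / smaller


if __name__ == "__main__":
    for name, params in MODEL_PARAMS.items():
        fp32 = weight_memory_gb(params, "fp32")
        fp16 = weight_memory_gb(params, "fp16")
        print(f"{name:>16}: {fp32:6.2f} GB (fp32), {fp16:6.2f} GB (fp16)")

    ratio = size_ratio(MODEL_PARAMS["Code Llama (7B)"], MODEL_PARAMS["T5 (750M)"])
    print(f"7B / 750M = {ratio:.1f}x")
```

Under these assumptions, the 750M-parameter model needs about 1.5 GB for its fp16 weights versus about 14 GB for the 7B model, and the parameter ratio 7B / 750M is roughly 9.3, consistent with the "about 10 times" figure. This gap is what allows a smaller GPU to serve the organization-specific model.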
