The authors examine how to improve code generation for low-resource programming languages and conclude that there is no silver bullet.

Abstract

The advent of Large Language Models (LLMs) has significantly advanced the field of automated code generation. LLMs rely on large and diverse datasets to learn the syntax, semantics, and usage patterns of programming languages. For low-resource languages (i.e., niche programming languages characterized by a scarcity of training data), the limited availability of such data hampers the models' ability to generalize effectively, resulting in poorer code generation performance than on high-resource languages. For this reason, there is a quest for techniques able to close this performance gap. We present an empirical study investigating the effectiveness of several approaches for boosting LLMs' performance on low-resource languages, namely: (i) classic fine-tuning, which is, however, capped in size by the scarcity of training data; (ii) three variants of in-context learning, with prompts crafted to provide the LLM with additional information about the low-resource language (e.g., few-shot examples showcasing features of the targeted language); and (iii) a pre-training objective teaching the model how to translate between high- and low-resource languages. The context of our study comprises two low-resource languages (R and Racket) and six LLMs with different architectures and sizes. Our findings reveal that fine-tuning is usually the best choice for smaller LLMs, possibly because even a small dataset is sufficient to train their limited number of parameters. As model size increases, in-context learning becomes more and more effective, representing a safe and cheap bet (i.e., it always helps, but with different magnitudes). In contrast, very large LLMs may see their performance on low-resource languages deteriorate when fine-tuned, possibly due to the lack of enough data to effectively update their weights.
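The abstract mentions few-shot prompting as one of the in-context learning variants. As a rough illustration only, the sketch below shows one way such a prompt could be assembled for R. The `Shot` class, the `build_few_shot_prompt` function, the prompt wording, and the R snippets are all hypothetical and are not taken from the study, whose actual prompt templates are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """One worked example: a task description and a solution in the target language."""
    task: str
    solution: str


def build_few_shot_prompt(language: str, shots: list[Shot], task: str) -> str:
    """Assemble a prompt that shows solved examples before the new task."""
    parts = [f"Write {language} code for each task."]
    for shot in shots:
        parts.append(f"Task: {shot.task}\nSolution:\n{shot.solution}")
    # The final block leaves the solution empty so the model completes it.
    parts.append(f"Task: {task}\nSolution:")
    return "\n\n".join(parts)


# Hypothetical R examples, chosen only for illustration (not from the paper).
R_SHOTS = [
    Shot("Return the sum of a numeric vector.", "total <- function(x) sum(x)"),
    Shot("Return the even elements of a vector.", "evens <- function(x) x[x %% 2 == 0]"),
]

prompt = build_few_shot_prompt("R", R_SHOTS, "Return the mean of a numeric vector.")
print(prompt)
```

The resulting string would be sent to the model unchanged. No weights are updated, which is consistent with the abstract's description of in-context learning as a cheap option.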

Enhancing Code Generation for Low-Resource Languages: No Silver Bullet
