Enhancing Code Generation for Low-Resource Languages: No Silver Bullet

January 31, 2025
Authors: Alessandro Giagnorio, Alberto Martin-Lopez, Gabriele Bavota
cs.AI

Abstract

The advent of Large Language Models (LLMs) has significantly advanced the field of automated code generation. LLMs rely on large and diverse datasets to learn the syntax, semantics, and usage patterns of programming languages. For low-resource languages (i.e., niche programming languages characterized by a scarcity of training data), the limited availability of such data hampers the models' ability to generalize effectively, resulting in poorer code generation performance compared to high-resource languages. For this reason, there is a quest for techniques able to close this performance gap. We present an empirical study investigating the effectiveness of several approaches for boosting LLMs' performance on low-resource languages, namely: (i) classic fine-tuning, which is however capped in size by the scarcity of training data; (ii) three variants of in-context learning, with prompts crafted to provide the LLM with additional information about the low-resource language (e.g., few-shot examples showcasing features of the targeted language); and (iii) a pre-training objective teaching the model how to translate between high- and low-resource languages. The context of our study is two low-resource languages (R and Racket) and six LLMs of different architectures and sizes. Our findings reveal that fine-tuning is usually the best choice for smaller LLMs, possibly because even a small dataset is sufficient to train their limited number of parameters. As model size increases, in-context learning becomes more and more effective, representing a safe and cheap bet (i.e., it always helps, but with different magnitudes). In contrast, the performance of very large LLMs on low-resource languages may deteriorate when fine-tuning is performed, possibly due to the lack of enough data needed to effectively update their weights.
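
To make the in-context learning variants more concrete, below is a minimal sketch of how a few-shot prompt for a low-resource language such as Racket might be assembled. The example tasks, the prompt wording, and the `build_prompt` helper are illustrative assumptions, not the exact prompts or tooling used in the paper.

```python
# Minimal sketch of few-shot in-context learning for a low-resource language (Racket).
# The shots and prompt template below are illustrative assumptions; the paper's
# actual prompts, shot selection, and model interface may differ.

FEW_SHOT_EXAMPLES = [
    {
        "task": "Return the sum of a list of numbers.",
        "code": "(define (sum-list lst)\n  (foldl + 0 lst))",
    },
    {
        "task": "Keep only the even numbers in a list.",
        "code": "(define (keep-even lst)\n  (filter even? lst))",
    },
]

def build_prompt(task_description: str) -> str:
    """Assemble a few-shot prompt showcasing Racket-specific features
    (s-expressions, define, higher-order functions) before the target task."""
    parts = ["You are an expert Racket programmer. Complete the final task.\n"]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(f"Task: {ex['task']}\nRacket solution:\n{ex['code']}\n")
    parts.append(f"Task: {task_description}\nRacket solution:\n")
    return "\n".join(parts)

if __name__ == "__main__":
    # The resulting string would be sent to the code LLM under evaluation;
    # the model call itself is omitted since it depends on the chosen backend.
    print(build_prompt("Reverse a list without using the built-in reverse."))
```

A fine-tuning baseline would instead update model weights on a (necessarily small) corpus of such task/solution pairs, which is the trade-off the study compares across model sizes.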
