Is Programming by Example solved by LLMs?
June 12, 2024
Authors: Wen-Ding Li, Kevin Ellis
cs.AI
Abstract
Programming-by-Examples (PBE) aims to generate an algorithm from input-output
examples. Such systems are practically and theoretically important: from an
end-user perspective, they are deployed to millions of people, and from an AI
perspective, PBE corresponds to a very general form of few-shot inductive
inference. Given the success of Large Language Models (LLMs) in code-generation
tasks, we investigate here the extent to which LLMs can be said to have
`solved' PBE. We experiment on classic domains such as lists and strings, and
an uncommon graphics programming domain not well represented in typical
pretraining data. We find that pretrained models are not effective at PBE, but
that they can be fine-tuned for much higher performance, provided the test
problems are in-distribution. We analyze empirically what causes these models
to succeed and fail, and take steps toward understanding how to achieve better
out-of-distribution generalization. Collectively these results suggest that
LLMs make strong progress toward solving the typical suite of PBE tasks,
potentially increasing the flexibility and applicability of PBE systems, while
also identifying ways in which LLMs still fall short.
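To make the PBE setting concrete, the following is a minimal sketch of a string-domain PBE task of the kind the abstract describes. It is not the paper's method: instead of an LLM it uses a brute-force enumerator over a tiny hypothetical DSL, purely to illustrate what "generating an algorithm from input-output examples" means.

```python
# Illustrative PBE sketch (assumed DSL, not from the paper): search for a
# composition of string primitives consistent with all input-output examples.
from itertools import product

# A tiny, hypothetical DSL of string-to-string primitives.
PRIMITIVES = {
    "lower": str.lower,
    "upper": str.upper,
    "strip": str.strip,
    "first_word": lambda s: s.split()[0] if s.split() else s,
    "reverse": lambda s: s[::-1],
}

def synthesize(examples, max_depth=2):
    """Return (names, program) for the first composition of primitives
    consistent with every (input, output) example, searched shortest-first;
    return None if no program within max_depth fits."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def program(s, names=names):
                for name in names:
                    s = PRIMITIVES[name](s)
                return s
            if all(program(x) == y for x, y in examples):
                return names, program
    return None

# Few-shot induction: infer a program from two examples, then apply it
# to unseen input (the generalization PBE systems are judged on).
examples = [("Hello World", "HELLO"), ("foo bar", "FOO")]
result = synthesize(examples)
```

Here the enumerator finds the composition upper-then-first_word, which also maps an unseen input like "test case" to "TEST". Real PBE domains (string transformations, list manipulations, and the paper's graphics programs) have far larger program spaces, which is why learned proposers such as LLMs are attractive over blind enumeration.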