Is Programming by Example solved by LLMs?

Abstract

Programming-by-Examples (PBE) aims to generate an algorithm from input-output examples. Such systems are practically and theoretically important: from an end-user perspective, they are deployed to millions of people, and from an AI perspective, PBE corresponds to a very general form of few-shot inductive inference. Given the success of Large Language Models (LLMs) in code-generation tasks, we investigate here the extent to which LLMs can be said to have "solved" PBE. We experiment on classic domains such as lists and strings, and on an uncommon graphics programming domain not well represented in typical pretraining data. We find that pretrained models are not effective at PBE, but that they can be fine-tuned for much higher performance, provided the test problems are in-distribution. We analyze empirically what causes these models to succeed and fail, and take steps toward understanding how to achieve better out-of-distribution generalization. Collectively, these results suggest that LLMs make strong progress toward solving the typical suite of PBE tasks, potentially increasing the flexibility and applicability of PBE systems, while also identifying ways in which LLMs still fall short.
