Is Programming by Example solved by LLMs?
June 12, 2024
Authors: Wen-Ding Li, Kevin Ellis
cs.AI
Abstract
Programming-by-Examples (PBE) aims to generate an algorithm from input-output
examples. Such systems are practically and theoretically important: from an
end-user perspective, they are deployed to millions of people, and from an AI
perspective, PBE corresponds to a very general form of few-shot inductive
inference. Given the success of Large Language Models (LLMs) in code-generation
tasks, we investigate here the extent to which LLMs can be said to have
'solved' PBE. We experiment on classic domains such as lists and strings, and
an uncommon graphics programming domain not well represented in typical
pretraining data. We find that pretrained models are not effective at PBE, but
that they can be fine-tuned for much higher performance, provided the test
problems are in-distribution. We analyze empirically what causes these models
to succeed and fail, and take steps toward understanding how to achieve better
out-of-distribution generalization. Collectively these results suggest that
LLMs make strong progress toward solving the typical suite of PBE tasks,
potentially increasing the flexibility and applicability of PBE systems, while
also identifying ways in which LLMs still fall short.
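To make the PBE setting concrete, here is a minimal illustrative sketch (not from the paper): a PBE task is specified by input-output examples, and a synthesizer, whether an enumerative search or an LLM, must produce a program consistent with all of them. The `examples` and the candidate `synthesized` function below are hypothetical stand-ins for what such a system might receive and return.

```python
# A PBE task is a set of input-output examples (a hypothetical string-transformation task):
examples = [("hello world", "Hello World"), ("foo bar", "Foo Bar")]

# A candidate program that a PBE system (or an LLM) might propose:
def synthesized(s: str) -> str:
    # Capitalize each whitespace-separated word.
    return " ".join(w.capitalize() for w in s.split())

# The candidate is accepted only if it reproduces every example exactly.
assert all(synthesized(x) == y for x, y in examples)
```

The few-shot inductive character of PBE noted in the abstract is visible here: two examples underdetermine the target program, so generalizing correctly to unseen inputs is the core challenge.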