ReGAL: Refactoring Programs to Discover Generalizable Abstractions
January 29, 2024
Authors: Elias Stengel-Eskin, Archiki Prasad, Mohit Bansal
cs.AI
Abstract
While large language models (LLMs) are increasingly being used for program
synthesis, they lack the global view needed to develop useful abstractions;
they generally predict programs one at a time, often repeating the same
functionality. Generating redundant code from scratch is both inefficient and
error-prone. To address this, we propose Refactoring for Generalizable
Abstraction Learning (ReGAL), a gradient-free method for learning a library of
reusable functions via code refactorization, i.e. restructuring code without
changing its execution output. ReGAL learns from a small set of existing
programs, iteratively verifying and refining its abstractions via execution. We
find that the shared function libraries discovered by ReGAL make programs
easier to predict across diverse domains. On three datasets (LOGO graphics
generation, Date reasoning, and TextCraft, a Minecraft-based text game), both
open-source and proprietary LLMs improve in accuracy when predicting programs
with ReGAL functions. For CodeLlama-13B, ReGAL results in absolute accuracy
increases of 11.5% on graphics, 26.1% on date understanding, and 8.1% on
TextCraft, outperforming GPT-3.5 in two of three domains. Our analysis reveals
ReGAL's abstractions encapsulate frequently-used subroutines as well as
environment dynamics.
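
To make the abstract's description concrete, below is a minimal, illustrative sketch of a gradient-free refactor-and-verify loop in the spirit of ReGAL: candidate helper functions proposed from existing programs are kept only if the refactored program still reproduces the original execution output. The function names (propose_refactoring, run_program, refine_library) are hypothetical placeholders for exposition, not the authors' implementation or API.

```python
# Hypothetical sketch of an iterative refactor-and-verify loop (not the official ReGAL code).
from typing import Callable, Dict, List, Tuple


def verify(original_src: str, refactored_src: str,
           run_program: Callable[[str], object]) -> bool:
    """Keep a refactoring only if execution output is unchanged."""
    try:
        return run_program(refactored_src) == run_program(original_src)
    except Exception:
        return False


def refine_library(
    programs: List[str],
    propose_refactoring: Callable[[str, Dict[str, str]], Tuple[str, Dict[str, str]]],
    run_program: Callable[[str], object],
    n_rounds: int = 3,
) -> Dict[str, str]:
    """Iteratively grow a library of shared helpers from a small set of programs.

    `propose_refactoring` stands in for an LLM call that rewrites one program
    against the current library and returns (rewritten_program, candidate_helpers).
    """
    library: Dict[str, str] = {}  # helper name -> helper source
    for _ in range(n_rounds):
        for src in programs:
            rewritten, candidates = propose_refactoring(src, library)
            # Execution-based check: only adopt abstractions that preserve behavior.
            if verify(src, rewritten, run_program):
                library.update(candidates)
    return library
```

The key design point carried over from the abstract is that verification is purely execution-based (no gradients): an abstraction is accepted only when the refactored program's output matches the original's, and the library is refined over multiple passes.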