CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
February 7, 2024
作者: Natasha Butt, Blazej Manczak, Auke Wiggers, Corrado Rainone, David Zhang, Michaël Defferrard, Taco Cohen
cs.AI
Abstract
Large language models are increasingly solving tasks that are commonly believed to require human-level reasoning ability. However, these models still perform very poorly on benchmarks of general intelligence such as the Abstraction and Reasoning Corpus (ARC). In this paper, we approach ARC as a programming-by-examples problem, and introduce a novel and scalable method for language model self-improvement called Code Iteration (CodeIt). Our method iterates between 1) program sampling and hindsight relabeling, and 2) learning from prioritized experience replay. By relabeling the goal of an episode (i.e., the target program output given input) to the realized output produced by the sampled program, our method effectively deals with the extreme sparsity of rewards in program synthesis. Applying CodeIt to the ARC dataset, we demonstrate that prioritized hindsight replay, along with pre-training and data-augmentation, leads to successful inter-task generalization. CodeIt is the first neuro-symbolic approach that scales to the full ARC evaluation dataset. Our method solves 15% of ARC evaluation tasks, achieving state-of-the-art performance and outperforming existing neural and symbolic baselines.
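The abstract's key idea is hindsight relabeling: every sampled program yields a usable training example once the episode's goal is replaced with the output the program actually produced. Below is a minimal sketch of that relabeling step, assuming hypothetical helper names (Grid, Episode, run_program) that are not taken from the CodeIt codebase; it illustrates the idea only, not the authors' implementation.

```python
# Minimal sketch of hindsight relabeling for programming-by-examples.
# All names here (Grid, Episode, run_program) are hypothetical stand-ins.
from dataclasses import dataclass
from typing import Callable, List

Grid = List[List[int]]  # an ARC-style grid of integer color values


@dataclass
class Episode:
    inputs: List[Grid]   # task input grids
    targets: List[Grid]  # output grids used as the training goal
    program: str         # source code of the sampled program


def hindsight_relabel(
    inputs: List[Grid],
    program: str,
    run_program: Callable[[str, Grid], Grid],
) -> Episode:
    """Relabel the episode goal: instead of keeping the rarely-matched
    ground-truth outputs, store the outputs the sampled program actually
    produced, turning every executable sample into a valid training example."""
    realized = [run_program(program, grid) for grid in inputs]
    return Episode(inputs=inputs, targets=realized, program=program)
```

Per the abstract, such relabeled episodes would then feed a prioritized experience-replay buffer from which the language model is trained in the next iteration.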