CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules
October 13, 2023
Authors: Hung Le, Hailin Chen, Amrita Saha, Akash Gokul, Doyen Sahoo, Shafiq Joty
cs.AI
Abstract
Large Language Models (LLMs) have already become quite proficient at solving
simpler programming tasks like those in the HumanEval or MBPP benchmarks. However,
solving more complex and competitive programming tasks is still quite
challenging for these models - possibly due to their tendency to generate
solutions as monolithic code blocks instead of decomposing them into logical
sub-tasks and sub-modules. On the other hand, experienced programmers
instinctively write modularized code with abstraction for solving complex
tasks, often reusing previously developed modules. To address this gap, we
propose CodeChain, a novel framework for inference that elicits modularized
code generation through a chain of self-revisions, each being guided by some
representative sub-modules generated in previous iterations. Concretely,
CodeChain first instructs the LLM to generate modularized code through
chain-of-thought prompting. Then it applies a chain of self-revisions by
iterating two steps: 1) extracting and clustering the generated sub-modules
and selecting the cluster representatives as the more generic and re-usable
implementations, and 2) augmenting the original chain-of-thought prompt with
these selected module-implementations and instructing the LLM to re-generate
new modularized solutions. We find that by naturally encouraging the LLM to
reuse the previously developed and verified sub-modules, CodeChain can
significantly boost both the modularity and correctness of the generated
solutions, achieving relative pass@1 improvements of 35% on APPS and 76% on
CodeContests. It is shown to be effective on both OpenAI LLMs and
open-source LLMs like WizardCoder. We also conduct comprehensive ablation
studies with different prompting methods, numbers of clusters, model sizes,
program qualities, etc., to provide useful insights that underpin CodeChain's
success.
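
To make the loop concrete, below is a minimal Python sketch of the self-revision chain as the abstract describes it. Everything in it is illustrative rather than the authors' implementation: llm_generate is a hypothetical placeholder for a model call, sub-modules are grouped by a crude (name, arity) signature rather than the semantic clustering the paper implies, the prompts are paraphrases, and the execution-based verification suggested by "verified sub-modules" is omitted.

import ast
from collections import defaultdict

def llm_generate(prompt, n=5):
    # Hypothetical placeholder for a model call that returns n candidate
    # programs for the given prompt; plug in a real API here.
    raise NotImplementedError

def extract_submodules(program):
    # Pull top-level function definitions out of one candidate program.
    try:
        tree = ast.parse(program)
    except SyntaxError:
        return []
    return [ast.unparse(node) for node in tree.body
            if isinstance(node, ast.FunctionDef)]

def select_representatives(submodules, k):
    # Group sub-modules by a crude (name, arity) signature and keep the
    # shortest member of each of the k largest groups. This is only a
    # stand-in for the clustering and representative selection described
    # in the abstract.
    groups = defaultdict(list)
    for src in submodules:
        fn = ast.parse(src).body[0]
        groups[(fn.name, len(fn.args.args))].append(src)
    largest = sorted(groups.values(), key=len, reverse=True)[:k]
    return [min(group, key=len) for group in largest]

def codechain(problem, rounds=3, k=4):
    # Chain of self-revisions: generate modular solutions, then repeatedly
    # feed representative sub-modules back into the prompt and regenerate.
    base_prompt = ("Think step by step and solve the problem with small, "
                   "reusable helper functions:\n" + problem)
    solutions = llm_generate(base_prompt)
    for _ in range(rounds):
        submodules = [m for s in solutions for m in extract_submodules(s)]
        reps = select_representatives(submodules, k)
        revised = (base_prompt
                   + "\n\nWhere helpful, reuse these sub-modules:\n\n"
                   + "\n\n".join(reps))
        solutions = llm_generate(revised)
    return solutions

The number of clusters k and the choice of representative are exactly the knobs the ablation studies above examine; the stand-in grouping here is only meant to show where those choices plug into the loop.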