

CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules

October 13, 2023
Authors: Hung Le, Hailin Chen, Amrita Saha, Akash Gokul, Doyen Sahoo, Shafiq Joty
cs.AI

Abstract

Large Language Models (LLMs) have already become quite proficient at solving simpler programming tasks like those in the HumanEval or MBPP benchmarks. However, solving more complex and competitive programming tasks remains quite challenging for these models, possibly due to their tendency to generate solutions as monolithic code blocks instead of decomposing them into logical sub-tasks and sub-modules. Experienced programmers, by contrast, instinctively write modularized code with abstraction for solving complex tasks, often reusing previously developed modules. To address this gap, we propose CodeChain, a novel inference framework that elicits modularized code generation through a chain of self-revisions, each guided by representative sub-modules generated in previous iterations. Concretely, CodeChain first instructs the LLM to generate modularized code through chain-of-thought prompting. It then applies a chain of self-revisions by iterating two steps: 1) extracting and clustering the generated sub-modules and selecting the cluster representatives as the more generic and reusable implementations, and 2) augmenting the original chain-of-thought prompt with these selected module implementations and instructing the LLM to regenerate new modularized solutions. We find that by naturally encouraging the LLM to reuse previously developed and verified sub-modules, CodeChain significantly boosts both the modularity and the correctness of the generated solutions, achieving relative pass@1 improvements of 35% on APPS and 76% on CodeContests. It is effective on both OpenAI LLMs and open-source LLMs like WizardCoder. We also conduct comprehensive ablation studies with different prompting methods, numbers of clusters, model sizes, program qualities, etc., to provide useful insights that underpin CodeChain's success.
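The two-step revision loop described in the abstract lends itself to a compact illustration. Below is a minimal, hypothetical sketch of that loop, not the authors' released implementation: `generate` (an LLM call) and `embed_code` (a code embedder) are assumed placeholder callables, sub-modules are extracted with Python's `ast` module, and clustering uses scikit-learn's KMeans, with the member nearest each centroid taken as the cluster representative.

```python
# Minimal sketch of a CodeChain-style self-revision loop (illustrative only).
# `generate` and `embed_code` are hypothetical stand-ins for an LLM call and
# a code-embedding model; the paper's exact choices may differ.

import ast
import numpy as np
from sklearn.cluster import KMeans

def extract_submodules(solution: str) -> list[str]:
    """Pull out top-level function definitions as candidate sub-modules."""
    try:
        tree = ast.parse(solution)
    except SyntaxError:
        return []
    return [ast.unparse(node) for node in tree.body
            if isinstance(node, ast.FunctionDef)]

def codechain(problem: str, generate, embed_code,
              rounds: int = 4, n_samples: int = 16, n_clusters: int = 5):
    prompt = (f"{problem}\n\nDecompose the solution into sub-modules, "
              f"then implement them.")
    solutions: list[str] = []
    for _ in range(rounds):
        # Sample several modularized solutions for the current prompt.
        solutions = [generate(prompt) for _ in range(n_samples)]
        submodules = [m for s in solutions for m in extract_submodules(s)]
        if len(submodules) < n_clusters:
            break
        # Cluster sub-module embeddings; the centroid-nearest member of each
        # cluster serves as a generic, reusable "representative".
        X = np.stack([embed_code(m) for m in submodules])
        km = KMeans(n_clusters=n_clusters, n_init="auto").fit(X)
        reps = []
        for c in range(n_clusters):
            members = np.where(km.labels_ == c)[0]
            dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
            reps.append(submodules[members[np.argmin(dists)]])
        # Augment the original prompt with the representatives and revise.
        modules_block = "\n\n".join(reps)
        prompt = (f"{problem}\n\nReuse or adapt these sub-modules where "
                  f"helpful:\n{modules_block}\n\nNow write a full modular "
                  f"solution.")
    return solutions
```

Picking the centroid-nearest member is one plausible reading of "cluster representative"; the key design point is that only sub-modules recurring across many samples survive into the next revision round, which biases the LLM toward reusing implementations that have already proven generic.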