CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules

Abstract

Large Language Models (LLMs) are already quite proficient at solving simpler programming tasks such as those in the HumanEval and MBPP benchmarks. However, solving more complex, competitive programming tasks remains challenging for these models, possibly because they tend to generate solutions as monolithic code blocks instead of decomposing them into logical sub-tasks and sub-modules. Experienced programmers, by contrast, instinctively write modularized code with abstraction when solving complex tasks, often reusing previously developed modules. To address this gap, we propose CodeChain, a novel inference framework that elicits modularized code generation through a chain of self-revisions, each guided by representative sub-modules generated in previous iterations. Concretely, CodeChain first instructs the LLM to generate modularized code through chain-of-thought prompting. It then applies a chain of self-revisions by iterating two steps: 1) extracting and clustering the generated sub-modules and selecting the cluster representatives as the more generic and reusable implementations, and 2) augmenting the original chain-of-thought prompt with these selected module implementations and instructing the LLM to regenerate new modularized solutions. We find that by naturally encouraging the LLM to reuse previously developed and verified sub-modules, CodeChain significantly boosts both the modularity and the correctness of the generated solutions, achieving relative pass@1 improvements of 35% on APPS and 76% on CodeContests. It is effective on both OpenAI LLMs and open-source LLMs such as WizardCoder. We also conduct comprehensive ablation studies over prompting methods, number of clusters, model sizes, program quality, and more, providing insights that underpin CodeChain's success.
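The two revision steps described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not say how sub-modules are embedded or clustered, so the AST node-count embedding, the plain k-means routine, the choice of the member closest to each centroid as the representative, and every function name below are assumptions made for illustration. The LLM call that would consume the revised prompt is left out.

```python
import ast
import math
from collections import Counter


def extract_functions(source):
    """Return the source text of every top-level function in one generated solution."""
    tree = ast.parse(source)
    return [ast.get_source_segment(source, node)
            for node in tree.body if isinstance(node, ast.FunctionDef)]


def node_counts(code):
    """Count AST node types: a toy stand-in (assumption) for a learned code embedding."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(code)))


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: distance(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=10):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(distance(p, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centroids), centroids


def select_representatives(modules, k):
    """Step 1: cluster the sub-modules and keep the one nearest each centroid."""
    counts = [node_counts(m) for m in modules]
    vocab = sorted(set().union(*counts))
    points = [[float(c[name]) for name in vocab] for c in counts]
    labels, centroids = kmeans(points, k)
    representatives = []
    for j in range(k):
        members = [i for i, label in enumerate(labels) if label == j]
        if members:
            closest = min(members, key=lambda i: distance(points[i], centroids[j]))
            representatives.append(modules[closest])
    return representatives


def build_revision_prompt(base_prompt, representatives):
    """Step 2: append the selected modules to the original prompt (wording is hypothetical)."""
    shared = "\n\n".join(representatives)
    return (f"{base_prompt}\n\n"
            f"Modules from earlier attempts that you may reuse or adapt:\n\n{shared}\n")


# Two hypothetical generated solutions to the same task.
solutions = [
    """
def read_numbers(line):
    return [int(tok) for tok in line.split()]

def gcd(a, b):
    while b:
        a, b = b, a % b
    return a
""",
    """
def read_numbers(text):
    return [int(tok) for tok in text.split()]

def gcd(x, y):
    while y:
        x, y = y, x % y
    return x
""",
]

modules = [fn for solution in solutions for fn in extract_functions(solution)]
reps = select_representatives(modules, k=2)
prompt = build_revision_prompt("Write a modular solution to the task.", reps)
print(prompt)
```

In a full loop, the revised prompt would be sent back to the model, the new solutions would be parsed again with `extract_functions`, and the two steps would repeat for a fixed number of rounds. A learned code encoder would normally replace `node_counts`, since counting node types ignores identifiers and literals entirely.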
