BatCoder: Self-Supervised Bidirectional Code-Documentation Learning via Back-Translation

Abstract

Training large language models (LLMs) for code-related tasks typically depends on high-quality code-documentation pairs, which are costly to curate and often scarce for niche programming languages. We introduce BatCoder, a self-supervised reinforcement learning framework designed to jointly optimize code generation and documentation generation. BatCoder employs a back-translation strategy: documentation is first generated from code, and the generated documentation is then used to reconstruct the original code. The semantic similarity between the original and reconstructed code serves as an implicit reward, so reinforcement learning can improve the model in both directions: generating code from documentation and generating documentation from code. Because this approach trains models on code alone, it substantially increases the number of available training examples. Evaluated on HumanEval and MBPP with a 7B-parameter model, BatCoder achieves 83.5% and 81.0% pass@1, respectively, outperforming strong open-source baselines. Moreover, the framework scales consistently with both training corpus size and model capacity.
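The back-translation loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `code_to_doc` and `doc_to_code` are hypothetical callables standing in for the model's two generation directions, and the reward uses a token-level `difflib` ratio as a crude lexical stand-in for the semantic similarity measure, which the abstract does not specify.

```python
import difflib
import re
from typing import Callable, List


def lexical_tokens(source: str) -> List[str]:
    # Split source text into word-like tokens and single punctuation characters.
    return re.findall(r"\w+|[^\w\s]", source)


def similarity(original: str, reconstructed: str) -> float:
    # ASSUMPTION: lexical token overlap, used here only as a placeholder for
    # the semantic similarity mentioned in the abstract.
    a = lexical_tokens(original)
    b = lexical_tokens(reconstructed)
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def back_translation_reward(
    code: str,
    code_to_doc: Callable[[str], str],  # hypothetical: model generates documentation
    doc_to_code: Callable[[str], str],  # hypothetical: model reconstructs code
) -> float:
    # Step 1: generate documentation from the original code.
    doc = code_to_doc(code)
    # Step 2: reconstruct code from the generated documentation alone.
    reconstructed = doc_to_code(doc)
    # Step 3: score the reconstruction against the original; a reinforcement
    # learning update would consume this value as the reward signal.
    return similarity(code, reconstructed)
```

Because the reward needs only the original code, any unlabeled code snippet can serve as a training example; the policy update that would consume this reward is outside the scope of the sketch.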

