BatCoder: Self-Supervised Bidirectional Code-Documentation Learning via Back-Translation
January 30, 2026
Authors: Jingwen Xu, Yiyang Lu, Zisu Huang, Changze Lv, Xiaohua Wang, Shizheng Li, Zhibo Xu, Zhengkang Guo, Zhengyuan Wang, Muzhao Tian, Xuanjing Huang, Xiaoqing Zheng
cs.AI
Abstract
Training LLMs for code-related tasks typically depends on high-quality code-documentation pairs, which are costly to curate and often scarce for niche programming languages. We introduce BatCoder, a self-supervised reinforcement learning framework that jointly optimizes code generation and documentation generation. BatCoder employs a back-translation strategy: documentation is first generated from code, and the generated documentation is then used to reconstruct the original code. The semantic similarity between the original and reconstructed code serves as an implicit reward, enabling reinforcement learning to improve the model in both directions: generating code from documentation and generating documentation from code. This approach allows models to be trained on code alone, substantially expanding the pool of available training examples. Evaluated on HumanEval and MBPP with a 7B model, BatCoder achieves 83.5% and 81.0% pass@1, respectively, outperforming strong open-source baselines. Moreover, the framework scales consistently with both training corpus size and model capacity.
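
The back-translation reward loop described above can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the CodeDocModel interface, generate_doc, generate_code, and the token-overlap stand-in for the semantic similarity metric are hypothetical, since the abstract does not specify BatCoder's actual implementation.

    from difflib import SequenceMatcher
    from typing import Protocol

    class CodeDocModel(Protocol):
        """Hypothetical interface; the paper does not specify model APIs."""
        def generate_doc(self, code: str) -> str: ...
        def generate_code(self, doc: str) -> str: ...

    def semantic_similarity(original: str, reconstructed: str) -> float:
        # Stand-in metric: token-overlap ratio. The paper's actual
        # semantic similarity measure is not given in the abstract.
        tokens_a, tokens_b = original.split(), reconstructed.split()
        return SequenceMatcher(None, tokens_a, tokens_b).ratio()

    def back_translation_reward(model: CodeDocModel, code: str) -> float:
        doc = model.generate_doc(code)            # code -> documentation
        reconstructed = model.generate_code(doc)  # documentation -> code
        # Round-trip similarity is the implicit reward credited to both
        # generation directions during reinforcement learning.
        return semantic_similarity(code, reconstructed)

In training, this scalar reward would drive a policy-gradient update over both generation steps; note that only unannotated code is required as input, which is what lets the method scale to corpora without documentation.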