

BatCoder: Self-Supervised Bidirectional Code-Documentation Learning via Back-Translation

January 30, 2026
Authors: Jingwen Xu, Yiyang Lu, Zisu Huang, Changze Lv, Xiaohua Wang, Shizheng Li, Zhibo Xu, Zhengkang Guo, Zhengyuan Wang, Muzhao Tian, Xuanjing Huang, Xiaoqing Zheng
cs.AI

Abstract

Training LLMs for code-related tasks typically depends on high-quality code-documentation pairs, which are costly to curate and often scarce for niche programming languages. We introduce BatCoder, a self-supervised reinforcement learning framework that jointly optimizes code generation and documentation generation. BatCoder employs a back-translation strategy: documentation is first generated from the code, and the generated documentation is then used to reconstruct the original code. The semantic similarity between the original and reconstructed code serves as an implicit reward, enabling reinforcement learning to improve the model in both directions: generating documentation from code and generating code from documentation. This approach allows models to be trained on code alone, substantially enlarging the pool of available training examples. Evaluated on HumanEval and MBPP with a 7B model, BatCoder achieves 83.5% and 81.0% pass@1, respectively, outperforming strong open-source baselines. Moreover, the framework scales consistently with both training corpus size and model capacity.
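
The round trip described above is concrete enough to sketch. Below is a minimal illustration of the back-translation reward in Python; every name here (`back_translation_reward`, `generate_doc`, `generate_code`, `semantic_similarity`) is a hypothetical stand-in rather than BatCoder's actual API, and the string-matching similarity is only a toy placeholder for whatever learned semantic-similarity measure the paper uses.

```python
"""Minimal sketch of the back-translation reward loop described in the
abstract. All names are hypothetical stand-ins, not the paper's API."""

from difflib import SequenceMatcher
from typing import Callable


def back_translation_reward(
    code: str,
    generate_doc: Callable[[str], str],
    generate_code: Callable[[str], str],
    semantic_similarity: Callable[[str, str], float],
) -> float:
    """Score one code sample via a code -> doc -> code round trip."""
    doc = generate_doc(code)             # code-to-documentation pass
    reconstruction = generate_code(doc)  # documentation-to-code pass
    # Similarity between the original and reconstructed code is the
    # implicit reward that supervises both generation directions.
    return semantic_similarity(code, reconstruction)


if __name__ == "__main__":
    # Toy stand-ins: a real system would query the LLM in both
    # directions and score with a semantic-similarity model, not
    # surface string matching.
    reward = back_translation_reward(
        code="def add(a, b):\n    return a + b",
        generate_doc=lambda c: "Return the sum of two numbers.",
        generate_code=lambda d: "def add(a, b):\n    return a + b",
        semantic_similarity=lambda a, b: SequenceMatcher(None, a, b).ratio(),
    )
    print(f"implicit reward: {reward:.2f}")
```

In a full training setup, this scalar would serve as the reward for policy-gradient updates applied to both generation directions; the sketch covers only the round trip and scoring, not the RL update itself.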