BatCoder: Self-Supervised Bidirectional Code-Documentation Learning via Back-Translation

Abstract

Training large language models (LLMs) for code-related tasks typically depends on high-quality code-documentation pairs, which are costly to curate and often scarce for niche programming languages. We introduce BatCoder, a self-supervised reinforcement learning framework designed to jointly optimize code generation and documentation generation. BatCoder employs a back-translation strategy: documentation is first generated from code, and the generated documentation is then used to reconstruct the original code. The semantic similarity between the original and reconstructed code serves as an implicit reward, allowing reinforcement learning to improve the model's performance both in generating code from documentation and in generating documentation from code. Because this approach trains models using only code, it substantially increases the number of usable training examples. Evaluated on HumanEval and MBPP with a 7B-parameter model, BatCoder achieved 83.5% and 81.0% pass@1, respectively, outperforming strong open-source baselines. Moreover, the framework scales consistently with both training corpus size and model capacity.
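The back-translation loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how semantic similarity is measured, so the sketch substitutes a token-level `difflib` ratio as a stand-in, and the names `code_tokens`, `similarity_reward`, `back_translation_reward`, `code_to_doc`, and `doc_to_code` are hypothetical. In an actual setup the two callables would presumably be the same policy model prompted in opposite directions, and the returned reward would feed a reinforcement learning update.

```python
import difflib
import re
from typing import Callable, List, Tuple

# Hypothetical tokenizer: identifiers, integers, and single punctuation marks.
_TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def code_tokens(source: str) -> List[str]:
    """Split source text into coarse tokens, ignoring whitespace."""
    return _TOKEN_RE.findall(source)


def similarity_reward(original: str, reconstructed: str) -> float:
    """Stand-in for the semantic similarity reward (assumption, not the paper's metric).

    Returns a value in [0, 1]; 1.0 means the token sequences are identical.
    """
    matcher = difflib.SequenceMatcher(
        None, code_tokens(original), code_tokens(reconstructed), autojunk=False
    )
    return matcher.ratio()


def back_translation_reward(
    code: str,
    code_to_doc: Callable[[str], str],
    doc_to_code: Callable[[str], str],
) -> Tuple[str, str, float]:
    """Run one code -> documentation -> code round trip and score it."""
    doc = code_to_doc(code)            # step 1: generate documentation from code
    reconstructed = doc_to_code(doc)   # step 2: reconstruct code from that documentation
    reward = similarity_reward(code, reconstructed)  # step 3: implicit reward
    return doc, reconstructed, reward


if __name__ == "__main__":
    original = "def add(a, b):\n    return a + b\n"

    # Toy stand-ins for the model; a real system would sample from an LLM here.
    def toy_code_to_doc(code: str) -> str:
        return "Return the sum of two numbers."

    def toy_doc_to_code(doc: str) -> str:
        return "def add(x, y):\n    return x + y\n"

    doc, rebuilt, reward = back_translation_reward(
        original, toy_code_to_doc, toy_doc_to_code
    )
    print(doc)
    print(rebuilt)
    print(f"reward = {reward:.3f}")
```

A token-overlap score penalizes harmless differences such as renamed variables, which is one reason a semantic measure, as the abstract describes, would be preferable in practice.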
