Repair-R1: Better Test Before Repair

Abstract

Automated Program Repair (APR) aims to automatically locate program defects, generate patches, and validate the repairs. Existing APR techniques are often combined with Large Language Models (LLMs), leveraging their code-related knowledge to improve repair effectiveness. Current LLM-based APR methods typically use test cases only at inference time, adopting an iterative approach that performs repair first and validates it through test execution afterward. This conventional paradigm neglects two important aspects: the potential contribution of test cases during training, and the possibility of leveraging testing before repair. To address this, we propose Repair-R1, which introduces test cases into the model's training phase and moves test generation ahead of repair. The model is required to first generate discriminative test cases that can distinguish defective behaviors, and then perform repair based on these tests. This enables the model to better locate defects and understand their underlying causes, thereby improving repair effectiveness. We implement Repair-R1 with three different backbone models, using reinforcement learning (RL) to co-optimize test generation and bug repair. Experimental results on four widely adopted benchmarks demonstrate the superiority of Repair-R1. Specifically, compared to vanilla models, Repair-R1 improves repair success rate by 2.68% to 48.29%, test generation success rate by 16.38% to 53.28%, and test coverage by 0.78% to 53.96%. We publish the code and weights at https://github.com/Tomsawyerhu/APR-RL and https://huggingface.co/tomhu/Qwen3-4B-RL-5000-step.
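As a rough illustration of what a "discriminative" test case could mean in this setting, the sketch below treats a test as discriminative when it passes on a reference (correct) program but fails on the defective one. The function names, the `(input, expected_output)` test representation, the toy `buggy_abs` program, and the ratio score are all assumptions made for illustration; they are not taken from the Repair-R1 code base and do not describe its actual reward.

```python
from typing import Callable, Iterable, Tuple

# Assumed representation: a test case is an (input, expected_output) pair.
TestCase = Tuple[int, int]
Program = Callable[[int], int]


def buggy_abs(x: int) -> int:
    """Hypothetical defective program: wrong for negative inputs."""
    return x  # bug: negative values are not negated


def fixed_abs(x: int) -> int:
    """Hypothetical reference (correct) program."""
    return -x if x < 0 else x


def passes(program: Program, test: TestCase) -> bool:
    """Run one test; a crash counts as a failure."""
    arg, expected = test
    try:
        return program(arg) == expected
    except Exception:
        return False


def is_discriminative(test: TestCase, buggy: Program, reference: Program) -> bool:
    """A test is discriminative if the reference passes it and the buggy code fails it."""
    return passes(reference, test) and not passes(buggy, test)


def discriminative_ratio(tests: Iterable[TestCase], buggy: Program, reference: Program) -> float:
    """Fraction of tests that separate the buggy program from the reference."""
    tests = list(tests)
    if not tests:
        return 0.0
    hits = sum(is_discriminative(t, buggy, reference) for t in tests)
    return hits / len(tests)


# Tests on non-negative inputs cannot expose the bug; tests on negative inputs can.
generated_tests = [(3, 3), (-2, 2), (0, 0), (-7, 7)]
ratio = discriminative_ratio(generated_tests, buggy_abs, fixed_abs)
print(ratio)  # 0.5
```

In this toy case only the two negative-input tests reveal the defect, which is the kind of signal the abstract suggests can help a model locate a bug before attempting a patch.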

