Repair-R1: Better Test Before Repair

Abstract

APR (Automated Program Repair) aims to automatically locate program defects, generate patches, and validate the repairs. Existing APR techniques are often combined with LLMs (Large Language Models), leveraging the models' code-related knowledge to improve repair effectiveness. Current LLM-based APR methods typically use test cases only at inference time, adopting an iterative approach that performs repair first and validates it through test execution afterward. This conventional paradigm neglects two important aspects: the potential contribution of test cases in the training phase, and the possibility of leveraging testing prior to repair. To address this, we propose Repair-R1, which introduces test cases into the model's training phase and shifts test generation to precede repair. The model is required to first generate discriminative test cases that can distinguish defective behaviors, and then perform repair based on these tests. This enables the model to better locate defects and understand their underlying causes, thereby improving repair effectiveness. We implement Repair-R1 with three different backbone models, using RL (reinforcement learning) to co-optimize test generation and bug repair. Experimental results on four widely adopted benchmarks demonstrate the superiority of Repair-R1. Specifically, compared to vanilla models, Repair-R1 improves repair success rate by 2.68% to 48.29%, test generation success rate by 16.38% to 53.28%, and test coverage by 0.78% to 53.96%. We publish the code and weights at https://github.com/Tomsawyerhu/APR-RL and https://huggingface.co/tomhu/Qwen3-4B-RL-5000-step.
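
To make the notion of a discriminative test case concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a test is discriminative when it passes on a correct version of the code and fails on the defective one; the toy functions `buggy_max` and `fixed_max` and the helpers `passes` and `is_discriminative` are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Tuple

# A test case here is (arguments, expected output); this shape is an assumption.
TestCase = Tuple[Tuple[int, int], int]


def buggy_max(a: int, b: int) -> int:
    # Hypothetical defect: the comparison is inverted, so this returns the minimum.
    return a if a < b else b


def fixed_max(a: int, b: int) -> int:
    # Hypothetical reference fix.
    return a if a > b else b


def passes(func: Callable[[int, int], int], test: TestCase) -> bool:
    """Return True if func produces the expected output for this test."""
    args, expected = test
    try:
        return func(*args) == expected
    except Exception:
        # A crash counts as a failed test.
        return False


def is_discriminative(test: TestCase,
                      buggy: Callable[[int, int], int],
                      fixed: Callable[[int, int], int]) -> bool:
    """Assumed criterion: the test passes on the fix and fails on the bug."""
    return passes(fixed, test) and not passes(buggy, test)


candidate_tests: List[TestCase] = [
    ((3, 3), 3),  # both versions agree here, so the bug stays hidden
    ((1, 2), 2),  # exposes the inverted comparison
    ((5, 4), 5),  # exposes the inverted comparison
]

discriminative = [t for t in candidate_tests
                  if is_discriminative(t, buggy_max, fixed_max)]
print(discriminative)
```

In this toy setting the equal-arguments test cannot separate the two versions, which illustrates why generating tests that actually trigger the defect may help localize it before a patch is attempted.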
