Fixing 7,400 Bugs for 1$: Cheap Crash-Site Program Repair

Abstract

Rapid advances in bug-finding techniques have uncovered more vulnerabilities than developers can reasonably fix, creating an urgent need for effective Automated Program Repair (APR) methods. However, the complexity of modern bugs often makes precise root cause analysis difficult and unreliable. To address this challenge, we propose crash-site repair, which simplifies the repair task while still mitigating the risk of exploitation. In addition, we introduce a template-guided patch generation approach that significantly reduces the token cost of Large Language Models (LLMs) while maintaining both efficiency and effectiveness. We implement a prototype system, WILLIAMT, and evaluate it against state-of-the-art APR tools. Our results show that, when combined with the top-performing agent CodeRover-S, WILLIAMT reduces token cost by 45.9% and increases the bug-fixing rate to 73.5% (+29.6%) on ARVO, a ground-truth benchmark of open-source software vulnerabilities. Furthermore, we demonstrate that WILLIAMT remains effective without access to frontier LLMs: even a local model running on a Mac M4 Mini achieves a reasonable repair rate. These findings highlight the broad applicability and scalability of WILLIAMT.
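To make the idea of template-guided crash-site repair more concrete, the following is a minimal sketch and not WILLIAMT's actual implementation. It assumes a single hypothetical guard template inserted directly before the crashing statement, so that a model would only need to supply the short condition rather than a full patch. The names `GUARD_TEMPLATE`, `CrashSite`, `apply_guard`, and `DST_CAP` are illustrative assumptions and do not come from the paper.

```python
from dataclasses import dataclass

# Hypothetical template (an assumption, not taken from the paper):
# a guard placed immediately before the crashing statement.
GUARD_TEMPLATE = "if ({condition}) {{ {action} }}"


@dataclass
class CrashSite:
    line: int  # 1-based line number of the crashing statement


def apply_guard(source: str, site: CrashSite, condition: str,
                action: str = "return;") -> str:
    """Insert a guard before the crash line, matching its indentation."""
    lines = source.splitlines()
    if not 1 <= site.line <= len(lines):
        raise ValueError("crash line is outside the source")
    target = lines[site.line - 1]
    # Reuse the leading whitespace of the crashing statement.
    indent = target[: len(target) - len(target.lstrip())]
    guard = indent + GUARD_TEMPLATE.format(condition=condition, action=action)
    return "\n".join(lines[: site.line - 1] + [guard] + lines[site.line - 1:])


# Illustrative C snippet; DST_CAP is a made-up capacity constant.
buggy = (
    "void put(char *dst, size_t n, char c) {\n"
    "    dst[n] = c;\n"
    "}"
)
patched = apply_guard(buggy, CrashSite(line=2), "n >= DST_CAP")
print(patched)
```

In this sketch only the condition string would need to be generated, which illustrates why filling a template can cost far fewer tokens than asking a model to write a whole patch. The guard stops the crash at the site without diagnosing the root cause.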
