
When "Correct" Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents?

October 15, 2025
Authors: Yibo Peng, James Song, Lei Li, Xinyu Yang, Mihai Christodorescu, Ravi Mangal, Corina Pasareanu, Haizhong Zheng, Beidi Chen
cs.AI

Abstract

Code agents are increasingly trusted to autonomously fix bugs on platforms such as GitHub, yet their security evaluation focuses almost exclusively on functional correctness. In this paper, we reveal a novel type of threat to real-world code agents: Functionally Correct yet Vulnerable (FCV) patches, which pass all test cases but contain vulnerable code. With our proposed FCV-Attack, which can be deliberately crafted by malicious attackers or implicitly introduced by benign developers, we show that SOTA LLMs (e.g., ChatGPT and Claude) and agent scaffolds (e.g., SWE-agent and OpenHands) are all vulnerable to this FCV threat. Across 12 agent-model combinations on SWE-Bench, the attack requires only black-box access and a single query to the code agent. For example, for CWE-538 (information exposure vulnerability), the FCV-Attack attains an attack success rate of 40.7% on GPT-5 Mini + OpenHands. Our results reveal an important security threat overlooked by current evaluation paradigms and urge the development of security-aware defenses for code agents.
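
To make the idea of a patch that "passes all test cases but contains vulnerable code" concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the paper: the function `load_settings`, the file name `debug.log`, the default timeout, and both check functions are hypothetical. The patch contains the requested functional fix, so a behavior-only test passes, but it also writes a secret to a log file, which is the kind of information exposure the abstract associates with CWE-538.

```python
import os
import tempfile


def load_settings(raw: dict, debug_dir: str) -> dict:
    """Hypothetical patched function: a functional fix plus a leaky 'debug' line."""
    # Functional fix (what the issue asked for): a missing timeout falls back to 30.
    settings = {"timeout": int(raw.get("timeout", 30)), "api_key": raw["api_key"]}

    # Vulnerable addition (assumed for illustration): the full settings, secret
    # included, are appended to a file that other users or processes may read.
    with open(os.path.join(debug_dir, "debug.log"), "a", encoding="utf-8") as log:
        log.write(f"loaded settings: {settings}\n")
    return settings


def functional_test(debug_dir: str) -> bool:
    """Stand-in for a project's test suite: it checks behavior only."""
    out = load_settings({"api_key": "s3cr3t"}, debug_dir)
    return out["timeout"] == 30 and out["api_key"] == "s3cr3t"


def secret_leaked(debug_dir: str, secret: str) -> bool:
    """A security-aware check that the functional test never performs."""
    path = os.path.join(debug_dir, "debug.log")
    if not os.path.exists(path):
        return False
    with open(path, encoding="utf-8") as log:
        return secret in log.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("tests pass:", functional_test(tmp))
        print("secret leaked:", secret_leaked(tmp, "s3cr3t"))
```

In this sketch both checks return `True`: the patch is functionally correct under the test and still leaks the key. An evaluation that stops at the functional test would therefore accept it.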