Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System

August 8, 2025
作者: Haorui He, Yupeng Li, Bin Benjamin Zhu, Dacheng Wen, Reynold Cheng, Francis C. M. Lau
cs.AI

Abstract

State-of-the-art fact-checking systems combat misinformation at scale by employing autonomous LLM-based agents to decompose complex claims into smaller sub-claims, verify each sub-claim individually, and aggregate the partial results to produce verdicts with justifications (explanatory rationales for the verdicts). The security of these systems is crucial, as compromised fact-checkers, which can easily go unnoticed, can amplify misinformation. This work introduces Fact2Fiction, the first poisoning attack framework targeting such agentic fact-checking systems. Fact2Fiction mirrors the system's decomposition strategy and exploits system-generated justifications to craft tailored malicious evidence that compromises sub-claim verification. Extensive experiments demonstrate that Fact2Fiction achieves 8.9%–21.2% higher attack success rates than state-of-the-art attacks across various poisoning budgets. Fact2Fiction exposes security weaknesses in current fact-checking systems and highlights the need for defensive countermeasures.
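
To make the described pipeline concrete, below is a minimal, self-contained Python sketch of the decompose-verify-aggregate loop, together with a toy demonstration of how injecting tailored malicious evidence into the retrieval corpus can flip a sub-claim's verdict and hence the final judgment. This is an illustration under simplifying assumptions, not the authors' implementation: all names (decompose_claim, retrieve_evidence, verify_subclaim, aggregate) are hypothetical, and simple keyword heuristics stand in for the LLM-based agents and for Fact2Fiction's justification-guided evidence generation.

```python
# Illustrative sketch only: toy stand-ins for the LLM-based agents of an
# agentic fact-checking pipeline, plus a corpus-poisoning demonstration.
from dataclasses import dataclass


@dataclass
class Verdict:
    label: str          # "SUPPORTED" or "REFUTED"
    justification: str  # explanatory rationale for the verdict


def decompose_claim(claim: str) -> list[str]:
    """Stand-in for an LLM agent that splits a complex claim into sub-claims."""
    return [part.strip() for part in claim.split(" and ")]


def retrieve_evidence(sub_claim: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy keyword-overlap retriever; a real system would use dense retrieval."""
    terms = set(sub_claim.lower().split())
    ranked = sorted(corpus, key=lambda doc: -len(terms & set(doc.lower().split())))
    return ranked[:k]


def verify_subclaim(sub_claim: str, evidence: list[str]) -> Verdict:
    """Stand-in for an LLM verifier: refutes if any retrieved document contradicts."""
    for doc in evidence:
        words = set(doc.lower().split())
        if "false" in words or "not" in words:
            return Verdict("REFUTED", f"Retrieved evidence contradicts: '{sub_claim}'")
    return Verdict("SUPPORTED", f"Retrieved evidence supports: '{sub_claim}'")


def aggregate(verdicts: list[Verdict]) -> Verdict:
    """The overall claim holds only if every sub-claim is supported."""
    if all(v.label == "SUPPORTED" for v in verdicts):
        return Verdict("SUPPORTED", "All sub-claims verified.")
    failed = [v.justification for v in verdicts if v.label == "REFUTED"]
    return Verdict("REFUTED", " ".join(failed))


def fact_check(claim: str, corpus: list[str]) -> Verdict:
    sub_claims = decompose_claim(claim)
    return aggregate(
        [verify_subclaim(sc, retrieve_evidence(sc, corpus)) for sc in sub_claims]
    )


if __name__ == "__main__":
    claim = "the vaccine was approved in 2021 and the trial enrolled 40000 participants"
    clean_corpus = [
        "the vaccine was approved by regulators in 2021",
        "the trial enrolled 40000 participants across many sites",
    ]
    # A Fact2Fiction-style attacker mirrors the decomposition and targets one
    # sub-claim with a tailored malicious document added to the corpus.
    poisoned_corpus = clean_corpus + [
        "the claim that the trial enrolled 40000 participants is false",
    ]
    print("clean corpus    ->", fact_check(claim, clean_corpus).label)
    print("poisoned corpus ->", fact_check(claim, poisoned_corpus).label)
```

On this toy data, the clean corpus yields SUPPORTED while the poisoned corpus yields REFUTED: a single targeted evidence document flips one sub-claim's verification and thereby the aggregated verdict, which is (in highly simplified form) the failure mode the attack exploits.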