Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System

August 8, 2025
Authors: Haorui He, Yupeng Li, Bin Benjamin Zhu, Dacheng Wen, Reynold Cheng, Francis C. M. Lau
cs.AI

Abstract

State-of-the-art fact-checking systems combat misinformation at scale by employing autonomous LLM-based agents to decompose complex claims into smaller sub-claims, verify each sub-claim individually, and aggregate the partial results to produce verdicts with justifications (explanatory rationales for the verdicts). The security of these systems is crucial: compromised fact-checkers, a threat that remains largely underexplored, can amplify misinformation. This work introduces Fact2Fiction, the first poisoning attack framework targeting such agentic fact-checking systems. Fact2Fiction mirrors the decomposition strategy and exploits system-generated justifications to craft tailored malicious evidence that compromises sub-claim verification. Extensive experiments demonstrate that Fact2Fiction achieves 8.9% to 21.2% higher attack success rates than state-of-the-art attacks across various poisoning budgets. Fact2Fiction exposes security weaknesses in current fact-checking systems and highlights the need for defensive countermeasures.
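The decompose, verify, and aggregate loop described above, and the general idea of poisoning it one sub-claim at a time, can be illustrated with a deliberately tiny Python sketch. Everything in it is a hypothetical stand-in, not the paper's system and not Fact2Fiction itself: splitting on "and" replaces LLM decomposition, word-overlap ranking replaces a retriever, a keyword rule (the word "not") replaces an LLM's stance judgment, and the attacker simply reuses the same split to plant refuting passages for each sub-claim. It does not model how Fact2Fiction exploits system-generated justifications.

```python
from collections import Counter

# Toy stand-ins for LLM agents; every name here is hypothetical and illustrative only.

def decompose(claim: str) -> list[str]:
    """Split a compound claim into sub-claims on the word 'and'."""
    return [part.strip() for part in claim.split(" and ") if part.strip()]

def overlap(query: str, passage: str) -> int:
    """Retrieval score: number of distinct words shared by query and passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(sub_claim: str, corpus: list[str], k: int = 3) -> list[str]:
    """Return the k highest-scoring passages (the stable sort keeps corpus order on ties)."""
    return sorted(corpus, key=lambda p: overlap(sub_claim, p), reverse=True)[:k]

def stance(passage: str) -> str:
    """Toy stance detector: a passage containing the word 'not' refutes, otherwise supports."""
    return "refutes" if "not" in passage.lower().split() else "supports"

def verify(sub_claim: str, corpus: list[str], k: int = 3) -> bool:
    """Supported if retrieved supporting passages are at least as many as refuting ones."""
    votes = Counter(stance(p) for p in retrieve(sub_claim, corpus, k))
    return votes["supports"] >= votes["refutes"]

def fact_check(claim: str, corpus: list[str]) -> tuple[str, dict[str, bool]]:
    """Decompose, verify each sub-claim, and aggregate: true only if every sub-claim holds.
    The per-sub-claim results act as a very coarse stand-in for a justification."""
    results = {sub: verify(sub, corpus) for sub in decompose(claim)}
    return ("true" if all(results.values()) else "false"), results

def poison(claim: str, budget: int) -> list[str]:
    """Hypothetical attacker: mirror the decomposition and write `budget` refuting passages
    per sub-claim, each repeating the sub-claim's wording so retrieval ranks it highly."""
    return [f"it is not true that {sub}" for sub in decompose(claim) for _ in range(budget)]

CLAIM = "the bridge opened in 1932 and the bridge is made of steel"
CLEAN_CORPUS = [
    "records show the bridge opened in 1932",
    "the bridge opened to traffic in 1932",
    "the bridge deck is made of steel",
    "its towers are made of steel",
    "the city held a parade",
]

for budget in range(3):
    verdict, _ = fact_check(CLAIM, CLEAN_CORPUS + poison(CLAIM, budget))
    print(budget, verdict)  # prints "0 true", "1 true", "2 false"
```

In this toy, one poisoned passage per sub-claim is outvoted by clean evidence, while two are enough to flip the second sub-claim and, because every sub-claim must hold, the whole verdict. That illustrates in miniature why an attack's success can depend on its poisoning budget.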