Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System

Abstract

State-of-the-art fact-checking systems combat misinformation at scale by employing autonomous LLM-based agents that decompose complex claims into smaller sub-claims, verify each sub-claim individually, and aggregate the partial results into a verdict with a justification (an explanatory rationale for the verdict). The security of these systems is crucial because a compromised fact-checker can amplify misinformation, yet their vulnerabilities remain underexplored. This work introduces Fact2Fiction, the first poisoning attack framework targeting such agentic fact-checking systems. Fact2Fiction mirrors the decomposition strategy and exploits system-generated justifications to craft tailored malicious evidence that compromises sub-claim verification. Extensive experiments demonstrate that Fact2Fiction achieves attack success rates 8.9% to 21.2% higher than state-of-the-art attacks across various poisoning budgets. Fact2Fiction exposes security weaknesses in current fact-checking systems and highlights the need for defensive countermeasures.
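The decompose, verify, and aggregate loop described in the abstract can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`decompose`, `verify`, `aggregate`, `fact_check`), the string-matching "verification", and the verdict labels are all hypothetical stand-ins for what would be LLM agents and a retrieval component in a real system.

```python
from typing import List, NamedTuple, Tuple


class SubVerdict(NamedTuple):
    """Result of checking one sub-claim (hypothetical structure)."""
    sub_claim: str
    supported: bool
    justification: str


def decompose(claim: str) -> List[str]:
    # Toy decomposition: split a compound claim on " and ".
    # A real agentic system would ask an LLM to produce sub-claims.
    return [part.strip() for part in claim.split(" and ") if part.strip()]


def verify(sub_claim: str, evidence: List[str]) -> SubVerdict:
    # Toy verification: a sub-claim counts as supported when some
    # evidence string contains it (case-insensitive substring match).
    hit = next((e for e in evidence if sub_claim.lower() in e.lower()), None)
    if hit is not None:
        return SubVerdict(sub_claim, True, f"Supported by evidence: {hit!r}")
    return SubVerdict(sub_claim, False, "No supporting evidence was retrieved")


def aggregate(results: List[SubVerdict]) -> str:
    # The overall claim is supported only if every sub-claim is supported.
    if results and all(r.supported for r in results):
        return "supported"
    return "not supported"


def fact_check(claim: str, evidence: List[str]) -> Tuple[str, List[str]]:
    # Run the full loop and return the verdict with per-sub-claim justifications.
    results = [verify(s, evidence) for s in decompose(claim)]
    return aggregate(results), [r.justification for r in results]


if __name__ == "__main__":
    evidence = [
        "Records show the bridge opened in 1932.",
        "The bridge spans the harbour according to surveys.",
    ]
    claim = "the bridge opened in 1932 and the bridge spans the harbour"
    verdict, justifications = fact_check(claim, evidence)
    print(verdict)
    for line in justifications:
        print(" -", line)
```

Because the final verdict in this sketch depends on every sub-verdict, changing the evidence available to a single sub-claim is enough to change the overall result, which is why the abstract singles out sub-claim verification as the point of compromise.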

