
Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods

May 23, 2025
作者: Shaina Raza, Rizwan Qureshi, Marcelo Lotif, Aman Chadha, Deval Pandya, Christos Emmanouilidis
cs.AI

Abstract

Generative AI models often learn and reproduce false information present in their training corpora. This position paper argues that, analogous to biological immunization, where controlled exposure to a weakened pathogen builds immunity, AI models should be fine-tuned on small, quarantined sets of explicitly labeled falsehoods as a "vaccine" against misinformation. These curated false examples are periodically injected during fine-tuning, strengthening the model's ability to recognize and reject misleading claims while preserving accuracy on truthful inputs. An illustrative case study shows that immunized models generate substantially less misinformation than baselines. To our knowledge, this is the first training framework that treats fact-checked falsehoods themselves as a supervised vaccine, rather than relying on input perturbations or generic human feedback signals, to harden models against future misinformation. We also outline ethical safeguards and governance controls to ensure the safe use of false data. Model immunization offers a proactive paradigm for aligning AI systems with factuality.
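
The abstract describes periodically injecting quarantined, explicitly labeled falsehoods into supervised fine-tuning, but does not specify a schedule or data format. Below is a minimal, hypothetical sketch of how such a mixing schedule could look; the names (`Example`, `immunized_stream`, `inject_every`) and the toy data are illustrative assumptions, not the authors' implementation.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List

# Hypothetical example types: a "truthful" pair maps a prompt to a factual answer,
# while a "vaccine" pair maps a labeled falsehood to an explicit correction/refusal.
@dataclass
class Example:
    prompt: str
    target: str
    is_vaccine: bool = False

def immunized_stream(
    truthful: List[Example],
    vaccine: List[Example],
    inject_every: int = 10,   # assumed schedule: one vaccine example every k steps
    seed: int = 0,
) -> Iterator[Example]:
    """Yield a fine-tuning stream that periodically injects quarantined,
    explicitly labeled falsehood examples among ordinary truthful examples."""
    rng = random.Random(seed)
    step = 0
    while True:
        step += 1
        if vaccine and step % inject_every == 0:
            yield rng.choice(vaccine)   # controlled exposure to a labeled falsehood
        else:
            yield rng.choice(truthful)  # preserve accuracy on truthful inputs

# Illustrative usage with toy data (not from the paper):
truthful = [Example("What is the capital of France?", "Paris.")]
vaccine = [Example(
    "Vaccines cause autism.",
    "That claim is false; large-scale studies show no causal link.",
    is_vaccine=True,
)]

for i, ex in zip(range(12), immunized_stream(truthful, vaccine, inject_every=4)):
    tag = "VACCINE" if ex.is_vaccine else "truthful"
    print(f"step {i:02d} [{tag}] {ex.prompt} -> {ex.target}")
```

In this sketch the vaccine examples carry explicit correction targets, so the fine-tuning signal teaches the model to recognize and rebut the falsehood rather than repeat it; how the paper weights or schedules these injections is not specified in the abstract.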
