Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods
May 23, 2025
Authors: Shaina Raza, Rizwan Qureshi, Marcelo Lotif, Aman Chadha, Deval Pandya, Christos Emmanouilidis
cs.AI
Abstract
Generative AI models often learn and reproduce false information present in their training corpora. This position paper argues that, analogous to biological immunization, where controlled exposure to a weakened pathogen builds immunity, AI models should be fine-tuned on small, quarantined sets of explicitly labeled falsehoods as a "vaccine" against misinformation. These curated false examples are periodically injected during fine-tuning, strengthening the model's ability to recognize and reject misleading claims while preserving accuracy on truthful inputs. An illustrative case study shows that immunized models generate substantially less misinformation than baselines. To our knowledge, this is the first training framework that treats fact-checked falsehoods themselves as a supervised vaccine, rather than relying on input perturbations or generic human-feedback signals, to harden models against future misinformation. We also outline ethical safeguards and governance controls to ensure the safe use of false data. Model immunization offers a proactive paradigm for aligning AI systems with factuality.
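
The abstract does not specify the exact training recipe, but the core idea, periodically mixing explicitly labeled, fact-checked falsehoods into an otherwise truthful supervised fine-tuning stream, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Example` record, the `[QUARANTINED FALSEHOOD]` tag, the corrective target format, and the `inject_every` period are hypothetical choices made here for clarity.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Example:
    """A prompt/target pair for supervised fine-tuning (hypothetical format)."""
    prompt: str
    target: str
    is_falsehood: bool = False


def make_vaccine_example(claim: str, correction: str) -> Example:
    """Wrap a fact-checked falsehood with an explicit label and a corrective target."""
    prompt = f"[QUARANTINED FALSEHOOD] Claim: {claim}\nIs this claim accurate?"
    target = f"No. This claim is false. {correction}"
    return Example(prompt=prompt, target=target, is_falsehood=True)


def immunized_stream(
    truthful: List[Example],
    vaccine: List[Example],
    inject_every: int = 20,  # assumed injection period; a tunable hyperparameter
    seed: int = 0,
) -> Iterator[Example]:
    """Yield truthful examples, periodically injecting one labeled falsehood."""
    rng = random.Random(seed)
    for i, ex in enumerate(truthful, start=1):
        yield ex
        if vaccine and i % inject_every == 0:
            yield rng.choice(vaccine)


# Usage sketch: build the mixed stream that a fine-tuning loop would consume.
truthful_data = [Example("Who wrote 'On the Origin of Species'?", "Charles Darwin.")]
vaccine_data = [
    make_vaccine_example(
        "Vaccines cause autism.",
        "Large-scale studies have found no causal link between vaccines and autism.",
    )
]

for ex in immunized_stream(truthful_data * 40, vaccine_data, inject_every=20):
    pass  # feed `ex` to the supervised fine-tuning step of your choice
```

Keeping the injected falsehoods rare and explicitly tagged reflects the abstract's framing: truthful data stays dominant to preserve accuracy, while the model receives regular, clearly labeled exposure to claims it should learn to recognize and reject.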