链式防御思维:结构化推理增强大语言模型在参考信息污染下的鲁棒性
Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption
April 29, 2025
作者: Wenxiao Wang, Parsa Hosseini, Soheil Feizi
cs.AI
摘要
链式思维提示法在提升大型语言模型的推理能力方面已展现出显著成效。本研究探讨了如何利用这些增强的推理能力,来提升大型语言模型在非纯粹推理任务中的鲁棒性。具体而言,我们展示了一种名为“防御性思维链”的简单方法,通过提供少量包含结构化防御性推理的示例,使得多种大型语言模型在面对参考信息被污染时表现出显著增强的鲁棒性。实验结果表明,该方法带来的改进令人瞩目,尤其是考虑到其简洁性与广泛适用性。例如,在自然问答任务中,当提供的10个参考信息中有1个受到提示注入攻击污染时,采用标准提示法的GPT-4o准确率从60%骤降至3%。相比之下,采用防御性思维链提示法的GPT-4o则保持了50%的准确率。
English
Chain-of-thought prompting has demonstrated great success in facilitating the
reasoning abilities of large language models. In this work, we explore how
these enhanced reasoning abilities can be exploited to improve the robustness
of large language models in tasks that are not necessarily reasoning-focused.
In particular, we show how a wide range of large language models exhibit
significantly improved robustness against reference corruption using a simple
method called chain-of-defensive-thought, where only a few exemplars with
structured and defensive reasoning are provided as demonstrations. Empirically,
the improvements can be astounding, especially given the simplicity and
applicability of the method. For example, in the Natural Questions task, the
accuracy of GPT-4o degrades from 60% to as low as 3% with standard prompting
when 1 out of 10 references provided is corrupted with prompt injection
attacks. In contrast, GPT-4o using chain-of-defensive-thought prompting
maintains an accuracy of 50%.