防禦性思維鏈:結構化推理提升大型語言模型對參考資料損壞的魯棒性
Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption
April 29, 2025
作者: Wenxiao Wang, Parsa Hosseini, Soheil Feizi
cs.AI
摘要
思維鏈提示法在提升大型語言模型的推理能力方面展現了顯著成效。本研究探討如何利用這些增強後的推理能力,來提高大型語言模型在非純推理任務中的穩健性。具體而言,我們展示了一種名為防禦性思維鏈的簡單方法,僅需提供少量具有結構化防禦性推理的範例作為示範,就能使多種大型語言模型在面對參考資料損壞時表現出顯著提升的穩健性。從實證結果來看,這種方法帶來的改進令人驚嘆,尤其考慮到其簡易性和廣泛適用性。例如,在自然問答任務中,當提供的10個參考資料中有1個受到提示注入攻擊而損壞時,使用標準提示法的GPT-4o準確率從60%驟降至3%。相比之下,採用防禦性思維鏈提示法的GPT-4o則能保持50%的準確率。
English
Chain-of-thought prompting has demonstrated great success in facilitating the
reasoning abilities of large language models. In this work, we explore how
these enhanced reasoning abilities can be exploited to improve the robustness
of large language models in tasks that are not necessarily reasoning-focused.
In particular, we show how a wide range of large language models exhibit
significantly improved robustness against reference corruption using a simple
method called chain-of-defensive-thought, where only a few exemplars with
structured and defensive reasoning are provided as demonstrations. Empirically,
the improvements can be astounding, especially given the simplicity and
applicability of the method. For example, in the Natural Questions task, the
accuracy of GPT-4o degrades from 60% to as low as 3% with standard prompting
when 1 out of 10 references provided is corrupted with prompt injection
attacks. In contrast, GPT-4o using chain-of-defensive-thought prompting
maintains an accuracy of 50%.