Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption

Abstract

Chain-of-thought prompting has demonstrated great success in facilitating the reasoning abilities of large language models. In this work, we explore how these enhanced reasoning abilities can be exploited to improve the robustness of large language models in tasks that are not necessarily reasoning-focused. In particular, we show how a wide range of large language models exhibit significantly improved robustness against reference corruption using a simple method called chain-of-defensive-thought, where only a few exemplars with structured and defensive reasoning are provided as demonstrations. Empirically, the improvements can be astounding, especially given the simplicity and applicability of the method. For example, in the Natural Questions task, the accuracy of GPT-4o degrades from 60% to as low as 3% with standard prompting when 1 of the 10 provided references is corrupted with prompt injection attacks. In contrast, GPT-4o using chain-of-defensive-thought prompting maintains an accuracy of 50%.
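
The abstract describes the method only at a high level: a few exemplars containing structured, defensive reasoning are placed before the actual query. As a rough illustration of how such a few-shot prompt might be assembled, the following Python sketch numbers the references and ends the prompt at the point where the model would write its own reasoning. The `Exemplar` class, the `build_prompt` function, and the prompt layout (`References:`, `Question:`, `Reasoning:`, `Answer:`) are all hypothetical choices made for this sketch; the abstract does not specify the exemplar format the authors used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exemplar:
    """One few-shot demonstration (hypothetical structure, not from the paper)."""
    references: List[str]
    question: str
    reasoning: str  # structured, defensive reasoning written by the prompt author
    answer: str


def format_references(references: List[str]) -> str:
    # Number the references so the reasoning can refer to them by index.
    return "\n".join(f"[{i}] {ref}" for i, ref in enumerate(references, start=1))


def build_prompt(exemplars: List[Exemplar], references: List[str], question: str) -> str:
    """Assemble a few-shot prompt: exemplars first, then the actual query."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"References:\n{format_references(ex.references)}\n"
            f"Question: {ex.question}\n"
            f"Reasoning: {ex.reasoning}\n"
            f"Answer: {ex.answer}\n"
        )
    # The final block stops at "Reasoning:" so the model continues from there.
    parts.append(
        f"References:\n{format_references(references)}\n"
        f"Question: {question}\n"
        "Reasoning:"
    )
    return "\n".join(parts)
```

In this sketch, the defensive behavior comes entirely from the wording of each exemplar's `reasoning` field, for instance a passage that first states which numbered references are relevant and trustworthy before deriving the answer. That wording is an assumption about what "defensive" could look like, not a reproduction of the paper's exemplars.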

