Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation

Abstract

Retrieval-Augmented Generation (RAG) enables Large Language Models (LLMs) to generate grounded responses by leveraging external knowledge databases without altering model parameters. Although the absence of weight tuning prevents leakage via model parameters, it introduces the risk of inference adversaries exploiting the retrieved documents placed in the model's context. Existing methods for membership inference and data extraction often rely on jailbreaking or carefully crafted unnatural queries, which can be easily detected or thwarted by the query-rewriting techniques common in RAG systems. In this work, we present the Interrogation Attack (IA), a membership inference technique targeting documents in the RAG datastore. By crafting natural-text queries that can be answered only if the target document is present, our approach achieves successful inference with just 30 queries while remaining stealthy: straightforward detectors flag adversarial prompts from existing methods up to ~76x more frequently than those generated by our attack. We observe a 2x improvement in true positive rate at 1% false positive rate (TPR@1%FPR) over prior inference attacks across diverse RAG configurations, all at a cost of less than $0.02 per document inference.
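To make the evaluation concrete, the following minimal Python sketch shows one plausible way to turn a batch of probe-question responses into a membership score and how TPR@1%FPR is computed from member and non-member scores. The abstract does not specify how IA aggregates responses, so the exact-match scoring rule and the function names `membership_score` and `tpr_at_fpr` are hypothetical illustrations, not the paper's implementation. The metric itself follows the standard definition: choose the most permissive threshold whose false positive rate does not exceed the target, then report the true positive rate at that threshold.

```python
from typing import Sequence


def membership_score(rag_answers: Sequence[str], expected_answers: Sequence[str]) -> float:
    """Fraction of probe questions whose RAG answer matches the answer
    implied by the target document.

    HYPOTHETICAL scoring rule: case-insensitive exact match. The actual
    aggregation used by IA is not described in the abstract.
    """
    if not rag_answers or len(rag_answers) != len(expected_answers):
        raise ValueError("need equally sized, non-empty answer lists")
    matches = sum(
        a.strip().lower() == e.strip().lower()
        for a, e in zip(rag_answers, expected_answers)
    )
    return matches / len(rag_answers)


def tpr_at_fpr(member_scores: Sequence[float],
               nonmember_scores: Sequence[float],
               max_fpr: float = 0.01) -> float:
    """True positive rate at the lowest threshold whose FPR is <= max_fpr.

    A document is predicted to be a member when its score is strictly
    greater than the threshold.
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("need non-empty score lists")
    if max_fpr >= 1.0:
        return 1.0  # a threshold below every score flags all documents
    n_neg = len(nonmember_scores)
    # FPR only changes at non-member score values, and both TPR and FPR
    # shrink as the threshold grows, so the first (smallest) admissible
    # candidate maximizes TPR.
    for t in sorted(set(nonmember_scores)):
        fpr = sum(s > t for s in nonmember_scores) / n_neg
        if fpr <= max_fpr:
            return sum(s > t for s in member_scores) / len(member_scores)
    return 0.0  # unreachable: at t = max(nonmember_scores) the FPR is 0
```

With 30 probe questions per document, scores are multiples of 1/30 and ties are common; the strict `>` comparison is a conservative choice that never counts a tie as a positive prediction.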
