Stealing User Prompts from Mixture of Experts

Abstract

Mixture-of-Experts (MoE) models improve the efficiency and scalability of dense language models by routing each token to a small number of experts in each layer. In this paper, we show how an adversary that can arrange for their queries to appear in the same batch of examples as a victim's queries can exploit Expert-Choice-Routing to fully disclose a victim's prompt. We successfully demonstrate the effectiveness of this attack on a two-layer Mixtral model, exploiting the tie-handling behavior of the torch.topk CUDA implementation. Our results show that we can extract the entire prompt using $O({VM}^2)$ queries (with vocabulary size $V$ and prompt length $M$) or 100 queries on average per token in the setting we consider. This is the first attack to exploit architectural flaws for the purpose of extracting user prompts, introducing a new class of LLM vulnerabilities.
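The cross-batch dependence that the abstract describes can be illustrated with a toy model. The sketch below is not the paper's attack. It assumes a single expert with a fixed capacity, a hypothetical table of router scores (`ROUTER_SCORE`), and a tie rule in which the earlier batch position wins (a stand-in for deterministic top-k tie handling, not a statement about what `torch.topk` guarantees). It also assumes the attacker can tell whether its probe token was kept, whereas a real attacker would have to infer this from changes in its own outputs. Under those assumptions, moving a probe token from before the victim's token to after it changes the routing only when the two scores tie, that is, when the guess equals the victim's token.

```python
def expert_choice_select(scores, capacity):
    """Return the set of batch positions that one expert keeps.

    The expert takes its `capacity` highest-scoring tokens from the whole
    batch. Ties go to the earlier batch position; this tie rule is an
    assumption of the sketch, standing in for a deterministic top-k kernel.
    """
    order = sorted(range(len(scores)), key=lambda pos: (-scores[pos], pos))
    return set(order[:capacity])


# Hypothetical router affinities for a single expert, keyed by token.
ROUTER_SCORE = {"filler": 0.95, "cat": 0.40, "dog": 0.70, "sun": 0.55}


def guess_matches(victim_token, guess, capacity=2):
    """True if the probe's routing flips when it is moved across the victim."""
    filler = ROUTER_SCORE["filler"]  # attacker token that occupies one slot
    v = ROUTER_SCORE[victim_token]
    g = ROUTER_SCORE[guess]
    # Batch order A: probe (position 0) sits before the victim.
    kept_before = 0 in expert_choice_select([g, v, filler], capacity)
    # Batch order B: probe (position 2) sits after the victim.
    kept_after = 2 in expert_choice_select([filler, v, g], capacity)
    # A strictly higher or lower score gives the same outcome in both orders;
    # only an exact tie makes the outcome depend on position.
    return kept_before != kept_after


candidates = ["cat", "dog", "sun"]
recovered = [g for g in candidates if guess_matches("dog", g)]
print(recovered)
```

In this toy setting each candidate token costs two batched queries, so scanning a vocabulary for one position is linear in the vocabulary size. The query counts reported in the abstract refer to the full attack, not to this simplification.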
