Auditing Prompt Caching in Language Model APIs

Abstract

Prompt caching in large language models (LLMs) results in data-dependent timing variations: cached prompts are processed faster than non-cached prompts. These timing differences introduce the risk of side-channel timing attacks. For example, if the cache is shared across users, an attacker could identify cached prompts from fast API response times and thereby learn information about other users' prompts. Because prompt caching may cause privacy leakage, transparency around the caching policies of API providers is important. To this end, we develop and conduct statistical audits to detect prompt caching in real-world LLM API providers. We detect global cache sharing across users in seven API providers, including OpenAI, resulting in potential privacy leakage about users' prompts. Timing variations due to prompt caching can also leak information about model architecture. Specifically, we find evidence that OpenAI's embedding model is a decoder-only Transformer, which was not previously publicly known.
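A small sketch can illustrate the idea of a statistical timing audit. It assumes a one-sided permutation test on mean time-to-first-token, which is one reasonable choice but not necessarily the test used in this work. A cache-hit procedure sends a prompt once and then times a repeat of the same prompt. A cache-miss procedure times a fresh prompt. If the repeats are significantly faster, the API is likely caching prompts. `SimulatedAPI`, its made-up latencies (0.30 s vs. 0.50 s), and all function names are hypothetical. A real audit would replace `time_to_first_token` with timed calls to a provider's API. To test for cross-user sharing, the warm-up request would come from a different account than the timed request.

```python
import random
import statistics


def permutation_p_value(candidate_times, baseline_times, n_permutations=10_000, seed=0):
    """One-sided permutation test on the difference in mean latency.

    H0: both samples are drawn from the same timing distribution.
    H1: the candidate samples (possible cache hits) are faster on average.
    """
    observed = statistics.mean(baseline_times) - statistics.mean(candidate_times)
    pooled = list(candidate_times) + list(baseline_times)
    n_candidate = len(candidate_times)
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the "hit" / "miss" labels
        diff = statistics.mean(pooled[n_candidate:]) - statistics.mean(pooled[:n_candidate])
        if diff >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed labelling itself, so the p-value is never 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


class SimulatedAPI:
    """Hypothetical stand-in for an LLM API; latencies are invented, not measured."""

    def __init__(self, cache_enabled=True, seed=1):
        self.cache = set()
        self.cache_enabled = cache_enabled
        self.rng = random.Random(seed)

    def time_to_first_token(self, prompt):
        hit = self.cache_enabled and prompt in self.cache
        self.cache.add(prompt)
        base = 0.30 if hit else 0.50  # assumed seconds; real values vary by provider
        return base + self.rng.gauss(0.0, 0.05)  # add simulated timing noise


def audit(api, n_trials=25):
    """Compare a cache-hit procedure against a cache-miss procedure."""
    hit_times, miss_times = [], []
    for i in range(n_trials):
        warm = f"hit-prompt-{i}"
        # First send may populate the cache. For a cross-user audit, this
        # warm-up would be issued from a second account.
        api.time_to_first_token(warm)
        hit_times.append(api.time_to_first_token(warm))  # repeat: candidate cache hit
        miss_times.append(api.time_to_first_token(f"miss-prompt-{i}"))  # fresh prompt
    return permutation_p_value(hit_times, miss_times)


if __name__ == "__main__":
    print("p-value with caching:", audit(SimulatedAPI(cache_enabled=True)))
    print("p-value without caching:", audit(SimulatedAPI(cache_enabled=False)))
```

The simulated cache produces a small p-value by construction. With a real provider, network jitter and server load make latencies much noisier, so more trials may be needed before the test can separate cache hits from misses.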
