Auditing Prompt Caching in Language Model APIs
February 11, 2025
Authors: Chenchen Gu, Xiang Lisa Li, Rohith Kuditipudi, Percy Liang, Tatsunori Hashimoto
cs.AI
Abstract
Prompt caching in large language models (LLMs) results in data-dependent
timing variations: cached prompts are processed faster than non-cached prompts.
These timing differences introduce the risk of side-channel timing attacks. For
example, if the cache is shared across users, an attacker could identify cached
prompts from fast API response times to learn information about other users'
prompts. Because prompt caching may cause privacy leakage, transparency around
the caching policies of API providers is important. To this end, we develop and
conduct statistical audits to detect prompt caching in real-world LLM API
providers. We detect global cache sharing across users in seven API providers,
including OpenAI, resulting in potential privacy leakage about users' prompts.
Timing variations due to prompt caching can also result in leakage of
information about model architecture. Namely, we find evidence that OpenAI's
embedding model is a decoder-only Transformer, which was previously not
publicly known.
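At its core, the audit compares API response times for prompts that should hit a shared cache against prompts that cannot have been cached, and tests whether the "hit" latencies are significantly lower. The following is a minimal Python sketch of that idea only, not the paper's actual statistical procedure; the `send_prompt` helper, its placeholder latency model, and the sample sizes are hypothetical and would need to be replaced with a real client for the provider under audit.

```python
"""Minimal sketch of a timing-based prompt-caching audit (illustrative only)."""

import random
import statistics
import string
import time


def random_prompt(length: int = 512) -> str:
    """Generate a long random prompt that is very unlikely to be cached already."""
    return "".join(random.choices(string.ascii_letters + " ", k=length))


def send_prompt(prompt: str) -> float:
    """Hypothetical API call: send `prompt` and return the response latency in seconds.

    Replace the body with a timed request to the real provider under audit;
    the sleep below is only a placeholder so the script runs standalone.
    """
    start = time.perf_counter()
    # response = client.chat.completions.create(...)  # provider-specific call
    time.sleep(0.05 + random.random() * 0.01)         # placeholder latency
    return time.perf_counter() - start


def collect_samples(n: int = 25) -> tuple[list[float], list[float]]:
    """Collect latencies for likely cache hits and guaranteed cache misses."""
    hits, misses = [], []
    for _ in range(n):
        prompt = random_prompt()
        send_prompt(prompt)                            # first request warms the cache, if any
        hits.append(send_prompt(prompt))               # identical prompt: potential cache hit
        misses.append(send_prompt(random_prompt()))    # fresh prompt: guaranteed cache miss
    return hits, misses


def permutation_p_value(hits: list[float], misses: list[float], iters: int = 10_000) -> float:
    """One-sided permutation test: are 'hit' latencies systematically lower?"""
    observed = statistics.mean(misses) - statistics.mean(hits)
    pooled = hits + misses
    count = 0
    for _ in range(iters):
        random.shuffle(pooled)
        perm_hits, perm_misses = pooled[: len(hits)], pooled[len(hits):]
        if statistics.mean(perm_misses) - statistics.mean(perm_hits) >= observed:
            count += 1
    return (count + 1) / (iters + 1)


if __name__ == "__main__":
    hits, misses = collect_samples()
    print(f"mean hit latency:  {statistics.mean(hits):.4f} s")
    print(f"mean miss latency: {statistics.mean(misses):.4f} s")
    print(f"one-sided permutation p-value: {permutation_p_value(hits, misses):.4f}")
    # A small p-value indicates that repeated prompts are served faster,
    # i.e. evidence that the provider performs prompt caching.
```

To probe cross-user (global) cache sharing rather than per-account caching, the same comparison would be run with the warming request and the timed "hit" request issued from different accounts; the hypothesis test itself is unchanged.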