PUMA: Secure Inference of LLaMA-7B in Five Minutes
July 24, 2023
Authors: Ye Dong, Wen-jie Lu, Yancheng Zheng, Haoqi Wu, Derun Zhao, Jin Tan, Zhicong Huang, Cheng Hong, Tao Wei, Wenguang Chen
cs.AI
Abstract
With ChatGPT as a representative example, many companies have begun to provide
services based on large Transformer models. However, using such a service
inevitably leaks users' prompts to the model provider. Prior work has explored
secure inference for Transformer models using secure multi-party computation
(MPC), where both the model parameters and the clients' prompts are kept
secret. Despite this, existing frameworks remain limited in terms of model
performance, efficiency, and deployment. To address these limitations, we
propose PUMA, a framework for fast and secure Transformer model inference.
Our framework designs high-quality approximations for expensive functions,
such as GeLU and Softmax, which significantly reduce the cost of secure
inference while preserving model performance. Additionally, we design secure
Embedding and LayerNorm procedures that faithfully implement the desired
functionality without undermining the Transformer architecture. PUMA is about
2x faster than the state-of-the-art MPC framework MPCFormer (ICLR 2023) and
achieves accuracy similar to plaintext models without fine-tuning (which
previous works failed to achieve).
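
To make the GeLU point concrete, here is a minimal NumPy sketch of a piecewise-polynomial GeLU approximation in the spirit the abstract describes. The segment boundaries, polynomial degree, and least-squares fit below are illustrative assumptions, not PUMA's published segments or coefficients.

```python
import numpy as np
from math import erf, sqrt

def gelu(x):
    # Exact GeLU: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + np.vectorize(erf)(x / sqrt(2.0)))

# Fit a low-degree polynomial only on the middle range, where GeLU is
# genuinely nonlinear. Far to the left GeLU is ~0 and far to the right it
# is ~x, so the tails are handled by trivial pieces.
LO, HI, DEG = -4.0, 3.0, 6  # illustrative choices, not PUMA's parameters
xs = np.linspace(LO, HI, 4096)
coeffs = np.polyfit(xs, gelu(xs), DEG)

def gelu_piecewise(x):
    x = np.asarray(x, dtype=np.float64)
    mid = np.polyval(coeffs, np.clip(x, LO, HI))
    return np.where(x < LO, 0.0, np.where(x > HI, x, mid))

# The polynomial needs only additions and multiplications, which are the
# cheap operations under MPC; evaluating erf or tanh obliviously would be
# far more expensive.
print("max |error| on [LO, HI]:", np.max(np.abs(gelu_piecewise(xs) - gelu(xs))))
```

The design intuition is that accuracy is only needed where the function is nonlinear; spending the polynomial budget there is what keeps the approximation both cheap under MPC and faithful enough to leave model accuracy intact.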
Moreover, PUMA can evaluate LLaMA-7B in around 5 minutes to generate one
token. To the best of our knowledge, this is the first time that a model of
this parameter size has been evaluated under MPC. PUMA has been open-sourced
in the GitHub repository of SecretFlow-SPU.
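
The Softmax approximation can be illustrated in the same hedged spirit. After subtracting the row maximum, every input is non-positive, so exp can be replaced by repeated squaring of (1 + x/2^t), a well-known MPC-friendly identity that uses only multiplications. The iteration count and clipping bound below are illustrative assumptions, not PUMA's published parameters.

```python
import numpy as np

def exp_neg_approx(x, t=6):
    # Approximates exp(x) for x <= 0 via (1 + x/2**t) ** (2**t), computed
    # with t squarings. Very negative inputs are clipped, since exp(x) is
    # effectively 0 for softmax purposes there anyway.
    x = np.clip(x, -16.0, 0.0)
    y = 1.0 + x / (2.0 ** t)
    for _ in range(t):
        y = y * y  # t squarings replace a full exp evaluation
    return y

def softmax_approx(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # shift so z <= 0
    e = exp_neg_approx(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])
print(softmax_approx(logits))  # close to the exact softmax of these logits
```

The max-subtraction step doubles as numerical stabilization and as the guarantee that the approximation only ever sees the non-positive domain on which it is accurate.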