Locket: Robust Feature-Locking Technique for Language Models
October 14, 2025
Authors: Lipeng He, Vasisht Duddu, N. Asokan
cs.AI
Abstract
Chatbot providers (e.g., OpenAI) rely on tiered subscription schemes to
generate revenue, offering basic models for free users, and advanced models for
paying subscribers. However, a finer-grained pay-to-unlock scheme for premium
features (e.g., math, coding) is thought to be more economically viable for the
providers. Such a scheme requires a feature-locking technique (FLoTE) which is
(i) effective in refusing locked features, (ii) utility-preserving for unlocked
features, (iii) robust against evasion or unauthorized credential sharing, and
(iv) scalable to multiple features and users. However, existing FLoTEs (e.g.,
password-locked models) are not robust or scalable. We present Locket, the
first robust and scalable FLoTE to enable pay-to-unlock schemes. Locket uses a
novel merging approach to attach adapters to an LLM for refusing unauthorized
features. Our comprehensive evaluation shows that Locket is effective (100%
refusal on locked features), utility-preserving (leq 7% utility degradation
in unlocked features), robust (leq 5% attack success rate), and scales to
multiple features and clients.
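
To make the adapter-based locking idea concrete, the sketch below shows one way a provider could attach a per-feature refusal adapter to a base LLM and only leave the features a client has paid for untouched. This is a minimal illustration using the PEFT library's standard LoRA loading and merging API; it is not Locket's own merging procedure from the paper, and the model name, adapter paths, and feature names are hypothetical placeholders.

```python
# Hypothetical sketch of adapter-based feature locking (NOT Locket's algorithm).
# For every feature the client has not unlocked, a pre-trained "refusal" adapter
# is merged into the base weights; unlocked features keep the base behavior.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-2-7b-hf"              # placeholder base LLM
REFUSAL_ADAPTERS = {                            # hypothetical per-feature refusal adapters
    "math": "./adapters/refuse-math",
    "coding": "./adapters/refuse-coding",
}

def lock_features(unlocked: set[str]):
    """Return a model that refuses every feature not in `unlocked`."""
    model = AutoModelForCausalLM.from_pretrained(BASE)
    for feature, adapter_path in REFUSAL_ADAPTERS.items():
        if feature in unlocked:
            continue                            # paid feature: leave base behavior intact
        peft_model = PeftModel.from_pretrained(model, adapter_path, adapter_name=feature)
        model = peft_model.merge_and_unload()   # bake the refusal behavior into the weights
    return model

tokenizer = AutoTokenizer.from_pretrained(BASE)
client_model = lock_features(unlocked={"coding"})  # "math" stays locked and is refused
```

A merged-weights approach like this keeps inference identical to serving a single model, but it does not by itself provide the robustness against evasion or credential sharing that the paper claims for Locket's merging scheme.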