Locket: Robust Feature-Locking Technique for Language Models
October 14, 2025
Authors: Lipeng He, Vasisht Duddu, N. Asokan
cs.AI
Abstract
Chatbot providers (e.g., OpenAI) rely on tiered subscription schemes to
generate revenue, offering basic models for free users, and advanced models for
paying subscribers. However, a finer-grained pay-to-unlock scheme for premium
features (e.g., math, coding) is thought to be more economically viable for the
providers. Such a scheme requires a feature-locking technique (FLoTE) which is
(i) effective in refusing locked features, (ii) utility-preserving for unlocked
features, (iii) robust against evasion or unauthorized credential sharing, and
(iv) scalable to multiple features and users. However, existing FLoTEs (e.g.,
password-locked models) are not robust or scalable. We present Locket, the
first robust and scalable FLoTE to enable pay-to-unlock schemes. Locket uses a
novel merging approach to attach adapters to an LLM for refusing unauthorized
features. Our comprehensive evaluation shows that Locket is effective (100%
refusal on locked features), utility-preserving (leq 7% utility degradation
in unlocked features), robust (leq 5% attack success rate), and scales to
multiple features and clients.
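
To make the adapter-based locking idea concrete, the sketch below shows one way a provider could attach a per-feature refusal adapter to a base LLM and only leave the features a client has paid for untouched. This is a minimal illustration using the PEFT library's standard LoRA loading and merging API; it is not Locket's own merging procedure from the paper, and the model name, adapter paths, and feature names are hypothetical placeholders.

```python
# Hypothetical sketch of adapter-based feature locking (NOT Locket's algorithm).
# For every feature the client has not unlocked, a pre-trained "refusal" adapter
# is merged into the base weights; unlocked features keep the base behavior.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-2-7b-hf"              # placeholder base LLM
REFUSAL_ADAPTERS = {                            # hypothetical per-feature refusal adapters
    "math": "./adapters/refuse-math",
    "coding": "./adapters/refuse-coding",
}

def lock_features(unlocked: set[str]):
    """Return a model that refuses every feature not in `unlocked`."""
    model = AutoModelForCausalLM.from_pretrained(BASE)
    for feature, adapter_path in REFUSAL_ADAPTERS.items():
        if feature in unlocked:
            continue                            # paid feature: leave base behavior intact
        peft_model = PeftModel.from_pretrained(model, adapter_path, adapter_name=feature)
        model = peft_model.merge_and_unload()   # bake the refusal behavior into the weights
    return model

tokenizer = AutoTokenizer.from_pretrained(BASE)
client_model = lock_features(unlocked={"coding"})  # "math" stays locked and is refused
```

A merged-weights approach like this keeps inference identical to serving a single model, but it does not by itself provide the robustness against evasion or credential sharing that the paper claims for Locket's merging scheme.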