Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning
July 30, 2025
Authors: Afshin Khadangi, Amir Sartipi, Igor Tchappi, Ramin Bahmani, Gilbert Fridgen
cs.AI
Abstract
The tension between data privacy and model utility has become the defining
bottleneck for the practical deployment of large language models (LLMs) trained
on sensitive corpora, including healthcare data. Differentially private stochastic
gradient descent (DP-SGD) guarantees formal privacy, yet it does so at a
pronounced cost: gradients are forcibly clipped and perturbed with noise,
degrading sample efficiency and final accuracy. Numerous variants have been
proposed to soften this trade-off, but they all share a handicap: their control
knobs are hard-coded, global, and oblivious to the evolving optimization
landscape. Consequently, practitioners are forced either to over-spend privacy
budget in pursuit of utility, or to accept mediocre models in order to stay
within privacy constraints. We present RLDP, the first framework to cast DP
optimization itself as a closed-loop control problem amenable to modern deep
reinforcement learning (RL). RLDP continuously senses rich statistics of the
learning dynamics and acts by selecting fine-grained per-parameter
gradient-clipping thresholds as well as the magnitude of injected Gaussian
noise. A soft actor-critic (SAC) hyper-policy is trained online during language
model fine-tuning; it learns, from scratch, how to allocate the privacy budget
where it matters and when it matters. Across more than 1,600 ablation
experiments on GPT2-small, Llama-1B, Llama-3B, and Mistral-7B, RLDP delivers
perplexity reductions of 1.3-30.5% (mean 5.4%) and an average 5.6% downstream
utility gain. RLDP reaches each baseline's final utility after only 13-43% of
the gradient-update budget (mean speed-up 71%), all while honoring the same
(epsilon, delta)-DP contract and exhibiting equal or lower susceptibility
to membership-inference and canary-extraction attacks.
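
For intuition, below is a minimal sketch (not the authors' implementation) of the mechanism the abstract describes: a DP-SGD-style update whose per-parameter clipping thresholds and Gaussian noise scale are supplied by an external controller, the role RLDP assigns to its SAC hyper-policy. The function name dp_sgd_step, the stub heuristic controller, and all hyperparameters are hypothetical, and the sketch clips batch-averaged rather than per-example gradients to stay short.

```python
import torch

def dp_sgd_step(params, per_param_clips, noise_multiplier, lr=1e-3):
    """Clip each parameter's gradient to its own threshold, add calibrated
    Gaussian noise, and apply a plain SGD update. For brevity this clips the
    batch-averaged gradient; a faithful DP-SGD step clips per-example gradients."""
    with torch.no_grad():
        for p, clip in zip(params, per_param_clips):
            if p.grad is None:
                continue
            grad_norm = p.grad.norm()
            # Rescale so the gradient norm never exceeds this parameter's clip threshold.
            scale = torch.clamp(clip / (grad_norm + 1e-12), max=1.0)
            p.grad.mul_(scale)
            # Gaussian noise with std = noise_multiplier * clip, as in DP-SGD.
            p.grad.add_(torch.randn_like(p.grad) * noise_multiplier * clip)
            p.add_(p.grad, alpha=-lr)

# Toy usage on a tiny linear model.
model = torch.nn.Linear(16, 4)
x, y = torch.randn(8, 16), torch.randn(8, 4)
torch.nn.functional.mse_loss(model(x), y).backward()

params = list(model.parameters())
grad_norms = [float(p.grad.norm()) for p in params]

# Stub "hyper-policy": in RLDP these actions would come from the SAC agent
# observing statistics of the learning dynamics; here a fixed heuristic
# stands in purely for illustration.
per_param_clips = [max(0.5 * g, 1e-3) for g in grad_norms]
noise_multiplier = 1.0

dp_sgd_step(params, per_param_clips, noise_multiplier)
```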