ChatPaper.aiChatPaper

当对开源大语言模型进行微调时需谨慎:您的微调数据可能被悄无声息地窃取!

Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!

May 21, 2025
作者: Zhexin Zhang, Yuhao Sun, Junxiao Yang, Shiyao Cui, Hongning Wang, Minlie Huang
cs.AI

摘要

利用专有数据对开源大语言模型(LLMs)进行微调,已成为下游开发者获取任务特定LLMs的标准做法。然而,令人惊讶的是,我们揭示了一种伴随此实践而来的新风险:开源LLMs的创建者随后可通过简单的后门训练,仅需对微调后的下游模型进行黑盒访问,便能提取私有的下游微调数据。我们在一系列广泛实验中,针对4个广泛使用的开源模型(参数规模从3B到32B不等)和2个下游数据集进行了测试,结果表明,数据提取效果惊人地高:在实际场景下,从总计5000个样本中,高达76.3%的下游微调数据(查询)可被完美提取,而在更理想条件下,成功率可提升至94.9%。我们还探索了一种基于检测的防御策略,但发现其可被改进后的攻击所绕过。总体而言,我们强调了这一新发现的微调数据泄露风险的紧迫性,并期待更多后续研究能推动解决这一令人担忧的风险。实验所用代码与数据已发布于https://github.com/thu-coai/Backdoor-Data-Extraction。
English
Fine-tuning on open-source Large Language Models (LLMs) with proprietary data is now a standard practice for downstream developers to obtain task-specific LLMs. Surprisingly, we reveal a new and concerning risk along with the practice: the creator of the open-source LLMs can later extract the private downstream fine-tuning data through simple backdoor training, only requiring black-box access to the fine-tuned downstream model. Our comprehensive experiments, across 4 popularly used open-source models with 3B to 32B parameters and 2 downstream datasets, suggest that the extraction performance can be strikingly high: in practical settings, as much as 76.3% downstream fine-tuning data (queries) out of a total 5,000 samples can be perfectly extracted, and the success rate can increase to 94.9% in more ideal settings. We also explore a detection-based defense strategy but find it can be bypassed with improved attack. Overall, we highlight the emergency of this newly identified data breaching risk in fine-tuning, and we hope that more follow-up research could push the progress of addressing this concerning risk. The code and data used in our experiments are released at https://github.com/thu-coai/Backdoor-Data-Extraction.

Summary

AI-Generated Summary

PDF112May 22, 2025