Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!

Abstract

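Fine-tuning open-source Large Language Models (LLMs) on proprietary data is now standard practice for downstream developers who want task-specific LLMs. Surprisingly, we reveal a new and concerning risk that accompanies this practice: the creator of an open-source LLM can later extract the private downstream fine-tuning data through simple backdoor training, requiring only black-box access to the fine-tuned downstream model. Our comprehensive experiments, across 4 widely used open-source models with 3B to 32B parameters and 2 downstream datasets, suggest that extraction performance can be strikingly high: in practical settings, as much as 76.3% of the downstream fine-tuning data (queries), out of 5,000 samples in total, can be perfectly extracted, and the success rate rises to 94.9% in more ideal settings. We also explore a detection-based defense strategy but find that it can be bypassed with an improved attack. Overall, we highlight the urgency of this newly identified data-breach risk in fine-tuning, and we hope that follow-up research will advance progress in addressing it. The code and data used in our experiments are released at https://github.com/thu-coai/Backdoor-Data-Extraction.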

