Training Data Protection with Compositional Diffusion Models

Abstract

We introduce Compartmentalized Diffusion Models (CDMs), a method for training separate diffusion models (or prompts) on distinct data sources and composing them arbitrarily at inference time. The individual models can be trained in isolation, at different times, and on different distributions and domains, and can later be composed to achieve performance comparable to that of a paragon model trained on all data simultaneously. Furthermore, each model contains information only about the subset of the data it was exposed to during training, which enables several forms of training data protection. In particular, CDMs are the first method to enable both selective forgetting and continual learning for large-scale diffusion models, and they also make it possible to serve customized models based on each user's access rights. Finally, CDMs allow one to determine how important a given subset of the data is in generating particular samples.
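The abstract does not say how the compartments are combined at inference time. One natural way to compose separately trained score models, offered here only as an illustrative assumption and not as the authors' implementation, is to treat the full data distribution as a mixture of the per-shard distributions. The score of a mixture sum_i pi_i p_i(x) equals sum_i w_i(x) * grad log p_i(x), with responsibilities w_i(x) = pi_i p_i(x) / sum_j pi_j p_j(x). Because adding noise to a mixture yields a mixture of the noised components, the same identity holds at every noise level of a diffusion process. The sketch below checks this on toy 1-D Gaussians standing in for trained diffusion models. The Gaussian shards, the priors, and the helper names `shard_weights`, `composed_score`, and `forget` are all hypothetical.

```python
import math

# Hypothetical toy stand-ins: each "compartment" is a 1-D Gaussian N(mu, sigma^2)
# playing the role of a diffusion model trained on one data shard. Its score
# d/dx log p(x) is known in closed form, so no training is required here.
Shard = tuple[float, float]  # (mu, sigma)


def gaussian_logpdf(x: float, mu: float, sigma: float) -> float:
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def gaussian_score(x: float, mu: float, sigma: float) -> float:
    # Closed-form score of N(mu, sigma^2): d/dx log p(x) = -(x - mu) / sigma^2.
    return -(x - mu) / sigma ** 2


def _log_joint(x: float, shards: list[Shard], priors: list[float]) -> list[float]:
    # log(pi_i) + log p_i(x) for every compartment i.
    return [math.log(p) + gaussian_logpdf(x, mu, s) for (mu, s), p in zip(shards, priors)]


def mixture_logpdf(x: float, shards: list[Shard], priors: list[float]) -> float:
    # log sum_i pi_i p_i(x), computed with log-sum-exp for numerical stability.
    logs = _log_joint(x, shards, priors)
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))


def shard_weights(x: float, shards: list[Shard], priors: list[float]) -> list[float]:
    # Responsibilities w_i(x) = pi_i p_i(x) / sum_j pi_j p_j(x).
    logs = _log_joint(x, shards, priors)
    m = max(logs)
    exps = [math.exp(v - m) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def composed_score(x: float, shards: list[Shard], priors: list[float]) -> float:
    # Mixture score as a responsibility-weighted sum of the per-shard scores.
    w = shard_weights(x, shards, priors)
    return sum(wi * gaussian_score(x, mu, s) for wi, (mu, s) in zip(w, shards))


def forget(shards: list[Shard], priors: list[float], idx: int) -> tuple[list[Shard], list[float]]:
    # "Forget" compartment idx by dropping it and renormalizing the remaining priors.
    kept = [(sh, p) for i, (sh, p) in enumerate(zip(shards, priors)) if i != idx]
    z = sum(p for _, p in kept)
    return [sh for sh, _ in kept], [p / z for _, p in kept]


if __name__ == "__main__":
    shards = [(-2.0, 1.0), (3.0, 1.0)]
    priors = [0.5, 0.5]
    x, h = 0.7, 1e-5
    # Central finite difference of the mixture log-density, for comparison.
    fd = (mixture_logpdf(x + h, shards, priors) - mixture_logpdf(x - h, shards, priors)) / (2 * h)
    print("finite difference:", fd, "composed:", composed_score(x, shards, priors))
    kept_shards, kept_priors = forget(shards, priors, idx=1)
    print("after forgetting shard 1:", composed_score(x, kept_shards, kept_priors))
```

With two shards centred at -2 and 3, the composed score should agree with the finite-difference derivative of the mixture log-density up to numerical error. After `forget` drops the second shard, the composed score reduces exactly to the score of the remaining shard, which mirrors the abstract's point that removing a compartment removes the information contributed by its data.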
