Training Data Protection with Compositional Diffusion Models

Abstract

We introduce Compartmentalized Diffusion Models (CDM), a method to train different diffusion models (or prompts) on distinct data sources and arbitrarily compose them at inference time. The individual models can be trained in isolation, at different times, and on different distributions and domains, and can later be composed to achieve performance comparable to a paragon model trained on all data simultaneously. Furthermore, each model contains information only about the subset of the data it was exposed to during training, which enables several forms of training data protection. In particular, CDMs are the first method to enable both selective forgetting and continual learning for large-scale diffusion models, and they allow serving customized models based on the user's access rights. CDMs also make it possible to determine the importance of a subset of the data in generating particular samples.
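The idea of composing separately trained models at inference time can be illustrated with a minimal sketch. The abstract does not state the composition rule, so the normalized weighted average of per-model predictions used here is an assumption made purely for illustration. The names `compose`, `Denoiser`, and the toy models are hypothetical and do not come from the paper.

```python
from typing import Callable, Dict, Iterable, List

Vector = List[float]
# Hypothetical interface: a per-source model maps (noisy sample, timestep)
# to a prediction of the same length (e.g. a noise or score estimate).
Denoiser = Callable[[Vector, int], Vector]


def compose(
    models: Dict[str, Denoiser],
    weights: Dict[str, float],
    allowed: Iterable[str],
) -> Denoiser:
    """Build one denoiser from the compartments a user may access.

    ASSUMPTION: composition is a normalized weighted average of the
    per-model predictions; the abstract does not specify the rule.
    """
    allowed_set = set(allowed)
    active = [name for name in models if name in allowed_set]
    if not active:
        raise ValueError("no accessible compartments")
    total = sum(weights[name] for name in active)
    if total <= 0:
        raise ValueError("weights of accessible compartments must sum to > 0")

    def composed(x: Vector, t: int) -> Vector:
        out = [0.0] * len(x)
        for name in active:
            w = weights[name] / total
            pred = models[name](x, t)
            for i, p in enumerate(pred):
                out[i] += w * p
        return out

    return composed


# Toy stand-ins for models trained on two separate data sources.
toy_models: Dict[str, Denoiser] = {
    "source_a": lambda x, t: [1.0 * v for v in x],
    "source_b": lambda x, t: [3.0 * v for v in x],
}
toy_weights = {"source_a": 1.0, "source_b": 1.0}

# A user with access to both sources gets the full composition.
full = compose(toy_models, toy_weights, ["source_a", "source_b"])
# A user without access to source_b (or after source_b is forgotten)
# gets a model that never touches that compartment.
restricted = compose(toy_models, toy_weights, ["source_a"])

full_out = full([1.0, 2.0], 0)
restricted_out = restricted([1.0, 2.0], 0)
```

Under this assumption, forgetting a data source amounts to dropping its entry from `models`, and continual learning amounts to adding a new entry, with no retraining of the remaining compartments.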
