Training Data Protection with Compositional Diffusion Models

Abstract

We introduce Compartmentalized Diffusion Models (CDM), a method to train different diffusion models (or prompts) on distinct data sources and arbitrarily compose them at inference time. The individual models can be trained in isolation, at different times, and on different distributions and domains, and can later be composed to achieve performance comparable to a paragon model trained on all data simultaneously. Furthermore, each model only contains information about the subset of the data it was exposed to during training, enabling several forms of training data protection. In particular, CDMs are the first method to enable both selective forgetting and continual learning for large-scale diffusion models, and they also allow serving customized models based on the user's access rights. CDMs further allow determining the importance of a subset of the data in generating particular samples.
