Collaborative Score Distillation for Consistent Visual Synthesis

Abstract

Generative priors of large-scale text-to-image diffusion models enable a wide range of new generation and editing applications across diverse visual modalities. However, when these priors are adapted to complex visual modalities that are often represented as multiple images (e.g., video), achieving consistency across the set of images is challenging. In this paper, we address this challenge with a novel method, Collaborative Score Distillation (CSD). CSD is based on Stein Variational Gradient Descent (SVGD). Specifically, we propose to treat multiple samples as "particles" in the SVGD update and to combine their score functions so that generative priors are distilled over a set of images synchronously. CSD thus integrates information across 2D images, leading to consistent visual synthesis across multiple samples. We show the effectiveness of CSD in a variety of tasks, encompassing the visual editing of panorama images, videos, and 3D scenes. Our results underline the competency of CSD as a versatile method for enhancing inter-sample consistency, thereby broadening the applicability of text-to-image diffusion models.
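To make the "particles" idea concrete, the following is a minimal sketch of a generic SVGD update in NumPy. It is not the CSD implementation: the abstract does not specify the kernel, step size, or how diffusion-model scores enter the update, so the RBF kernel, the bandwidth `h`, the step size, and the toy `gaussian_score` function (the score of a standard Gaussian, standing in for a learned score function) are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(x, h=1.0):
    """RBF kernel matrix and summed kernel gradients for particles x of shape (N, D).

    Assumed kernel: k(x_j, x_i) = exp(-||x_j - x_i||^2 / h).
    """
    diff = x[:, None, :] - x[None, :, :]      # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)      # (N, N) squared distances
    k = np.exp(-sq_dist / h)                  # k[i, j] = k(x_j, x_i), symmetric
    # sum_j grad_{x_j} k(x_j, x_i) = sum_j (2 / h) * (x_i - x_j) * k[i, j]
    grad_k = (2.0 / h) * np.sum(k[:, :, None] * diff, axis=1)
    return k, grad_k


def svgd_step(x, score_fn, step=0.1, h=1.0):
    """One SVGD update: every particle moves using the scores of all particles."""
    n = x.shape[0]
    scores = score_fn(x)                      # (N, D), grad of log p at each particle
    k, grad_k = rbf_kernel(x, h)
    # Kernel-weighted average of scores (attraction) plus kernel gradient (repulsion).
    phi = (k @ scores + grad_k) / n
    return x + step * phi


def gaussian_score(x):
    """Hypothetical toy score: grad log p(x) = -x for a standard Gaussian target."""
    return -x


rng = np.random.default_rng(0)
initial = rng.normal(loc=3.0, scale=1.0, size=(20, 2))  # particles start off-target
particles = initial.copy()
for _ in range(500):
    particles = svgd_step(particles, gaussian_score)
```

Because each particle's update mixes the scores of all other particles through the kernel, the samples are updated jointly rather than independently. This is the mechanism the abstract refers to when it describes combining score functions across a set of images.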
