Collaborative Score Distillation for Consistent Visual Synthesis
July 4, 2023
Authors: Subin Kim, Kyungmin Lee, June Suk Choi, Jongheon Jeong, Kihyuk Sohn, Jinwoo Shin
cs.AI
Abstract
Generative priors of large-scale text-to-image diffusion models enable a wide
range of new generation and editing applications on diverse visual modalities.
However, when adapting these priors to complex visual modalities, often
represented as multiple images (e.g., video), achieving consistency across a
set of images is challenging. In this paper, we address this challenge with a
novel method, Collaborative Score Distillation (CSD). CSD is based on Stein
Variational Gradient Descent (SVGD). Specifically, we propose to treat
multiple samples as "particles" in the SVGD update and combine their score
functions to distill generative priors over a set of images synchronously.
Thus, CSD facilitates seamless integration of information across 2D images,
leading to consistent visual synthesis across multiple samples. We show the
effectiveness of CSD in a variety of tasks, encompassing the visual editing of
panorama images, videos, and 3D scenes. Our results underline the competency of
CSD as a versatile method for enhancing inter-sample consistency, thereby
broadening the applicability of text-to-image diffusion models.
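
To make the "particles" idea concrete, the following is a minimal sketch of a generic SVGD update in NumPy. It is not the authors' implementation: the RBF kernel, the fixed bandwidth `h`, the step size, and the toy Gaussian score function are all assumptions chosen for illustration. In CSD the score would presumably come from the text-to-image diffusion model rather than from a closed-form density. The sketch only shows how each particle's update mixes the scores of all particles through a kernel, plus a repulsive term that keeps particles from collapsing onto one another.

```python
import numpy as np


def rbf_kernel(x, h):
    """RBF kernel matrix and its gradients for particles x of shape (N, D).

    Returns k with k[i, j] = exp(-||x_i - x_j||^2 / h), and grad_k with
    grad_k[i, j] = gradient of k(x_j, x_i) with respect to x_j.
    """
    diff = x[:, None, :] - x[None, :, :]          # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)          # (N, N)
    k = np.exp(-sq_dist / h)                      # symmetric kernel matrix
    grad_k = (2.0 / h) * diff * k[:, :, None]     # (N, N, D)
    return k, grad_k


def svgd_step(x, score_fn, step_size=0.1, h=1.0):
    """One SVGD update.

    phi(x_i) = (1/N) * sum_j [ k(x_j, x_i) * score(x_j) + grad_{x_j} k(x_j, x_i) ]
    The first term shares score information across particles; the second
    term pushes particles apart.
    """
    n = x.shape[0]
    scores = score_fn(x)                          # (N, D), one score per particle
    k, grad_k = rbf_kernel(x, h)
    phi = (k @ scores + grad_k.sum(axis=1)) / n   # (N, D)
    return x + step_size * phi


if __name__ == "__main__":
    # Toy target (an assumption for this demo): standard Gaussian centred at mu,
    # whose score is simply -(x - mu).
    mu = np.array([1.0, -2.0])
    rng = np.random.default_rng(0)
    particles = rng.normal(size=(8, 2))
    for _ in range(200):
        particles = svgd_step(particles, lambda p: -(p - mu))
    print(particles.mean(axis=0))
```

With a single particle the kernel term reduces to plain gradient ascent on the log-density, which is the sense in which SVGD generalizes per-sample score distillation to a set of samples updated jointly.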