TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
October 8, 2024
Authors: Gihyun Kwon, Jong Chul Ye
cs.AI
Abstract
Despite significant advancements in customizing text-to-image and video generation models, generating images and videos that effectively integrate multiple personalized concepts remains a challenging task. To address this, we present TweedieMix, a novel method for composing customized diffusion models during the inference phase. By analyzing the properties of reverse diffusion sampling, our approach divides the sampling process into two stages. During the initial steps, we apply a multiple-object-aware sampling technique to ensure the inclusion of the desired target objects. In the later steps, we blend the appearances of the custom concepts in the denoised image space using Tweedie's formula. Our results demonstrate that TweedieMix can generate multiple personalized concepts with higher fidelity than existing methods. Moreover, our framework can be effortlessly extended to image-to-video diffusion models, enabling the generation of videos that feature multiple personalized concepts. Results and source code are available on our anonymous project page.