TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation

October 8, 2024
作者: Gihyun Kwon, Jong Chul Ye
cs.AI

Abstract

Despite significant advancements in customizing text-to-image and video generation models, generating images and videos that effectively integrate multiple personalized concepts remains a challenging task. To address this, we present TweedieMix, a novel method for composing customized diffusion models during the inference phase. By analyzing the properties of reverse diffusion sampling, our approach divides the sampling process into two stages. During the initial steps, we apply a multiple object-aware sampling technique to ensure the inclusion of the desired target objects. In the later steps, we blend the appearances of the custom concepts in the denoised image space using Tweedie's formula. Our results demonstrate that TweedieMix can generate multiple personalized concepts with higher fidelity than existing methods. Moreover, our framework can be effortlessly extended to image-to-video diffusion models, enabling the generation of videos that feature multiple personalized concepts. Results and source code are available on our anonymous project page.
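The blending step the abstract describes relies on Tweedie's formula, which maps a noisy diffusion sample back to a posterior-mean estimate of the clean image, so that concept appearances can be mixed in that denoised space. The sketch below illustrates the idea in DDPM notation with NumPy; the function names, the per-concept noise predictions, and the mask-weighted blending are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def tweedie_x0_estimate(x_t, eps_pred, alpha_bar_t):
    """Tweedie's formula (DDPM notation): posterior-mean estimate of the
    clean image x0 from the noisy sample x_t and the predicted noise.
    x0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t)."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

def blend_concepts_in_x0_space(x_t, eps_preds, masks, alpha_bar_t):
    """Illustrative blend: compute one Tweedie estimate per custom concept,
    then combine them with per-region weight masks (assumed to sum to 1)."""
    x0_hats = [tweedie_x0_estimate(x_t, eps, alpha_bar_t) for eps in eps_preds]
    return sum(m * x0 for m, x0 in zip(masks, x0_hats))

# Toy example with two "concepts" sharing the image equally.
x_t = np.full((2, 2), 1.0)
eps_a = np.zeros((2, 2))          # stand-ins for per-concept model outputs
eps_b = np.zeros((2, 2))
half = np.full((2, 2), 0.5)
x0_mix = blend_concepts_in_x0_space(x_t, [eps_a, eps_b], [half, half], 0.25)
```

Blending in the denoised `x0` space rather than in the noisy latent is what lets each concept's appearance stay sharp: the estimates being averaged are already images, not noise-corrupted samples.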
