DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation
January 28, 2025
Authors: Chenguo Lin, Panwang Pan, Bangbang Yang, Zeming Li, Yadong Mu
cs.AI
Abstract
Recent advancements in 3D content generation from text or a single image
struggle with limited high-quality 3D datasets and inconsistency from 2D
multi-view generation. We introduce DiffSplat, a novel 3D generative framework
that natively generates 3D Gaussian splats by taming large-scale text-to-image
diffusion models. It differs from previous 3D generative models by effectively
utilizing web-scale 2D priors while maintaining 3D consistency in a unified
model. To bootstrap the training, a lightweight reconstruction model is
proposed to instantly produce multi-view Gaussian splat grids for scalable
dataset curation. In conjunction with the regular diffusion loss on these
grids, a 3D rendering loss is introduced to facilitate 3D coherence across
arbitrary views. The compatibility with image diffusion models enables seamless
adaptations of numerous techniques for image generation to the 3D realm.
Extensive experiments reveal the superiority of DiffSplat in text- and
image-conditioned generation tasks and downstream applications. Thorough
ablation studies validate the efficacy of each critical design choice and
provide insights into the underlying mechanism.Summary
AI-Generated Summary
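The abstract names two training signals — a regular diffusion loss on the multi-view Gaussian splat grids and a 3D rendering loss for cross-view consistency — without giving their form. The sketch below shows one plausible way such a combined objective could look. All names (`unet`, `render_splats`, `splat_grid`, `lambda_render`, the camera handling) are illustrative assumptions, not the paper's actual API; the epsilon-prediction DDPM formulation is standard but likewise assumed here.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of a DiffSplat-style combined objective, assuming:
# - `unet` is a pretrained text-to-image diffusion backbone repurposed to
#   denoise multi-view Gaussian splat grids (all names are illustrative),
# - `splat_grid` packs per-pixel Gaussian parameters (position, scale,
#   rotation, opacity, color) into an image-like tensor [B, C, H, W],
# - `render_splats` is a differentiable Gaussian splatting renderer.
def diffsplat_loss(unet, splat_grid, text_emb, render_splats, gt_views,
                   cams, alphas_cumprod, lambda_render=1.0):
    """Regular diffusion loss on splat grids plus a 3D rendering loss."""
    b = splat_grid.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=splat_grid.device)
    noise = torch.randn_like(splat_grid)
    a = alphas_cumprod[t].view(b, 1, 1, 1)
    noisy = a.sqrt() * splat_grid + (1 - a).sqrt() * noise

    # Standard epsilon-prediction diffusion loss on the splat grid.
    eps_pred = unet(noisy, t, text_emb)
    loss_diff = F.mse_loss(eps_pred, noise)

    # Recover a clean-grid estimate from the noise prediction, render it
    # from sampled viewpoints, and supervise against ground-truth views
    # to encourage 3D coherence across arbitrary viewpoints.
    x0_pred = (noisy - (1 - a).sqrt() * eps_pred) / a.sqrt()
    rendered = render_splats(x0_pred, cams)
    loss_render = F.mse_loss(rendered, gt_views)

    return loss_diff + lambda_render * loss_render
```

The design point this sketch reflects is that the rendering loss supervises views rendered from the denoised grid estimate, which is how 3D consistency can be enforced inside an otherwise 2D image-diffusion pipeline.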