Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
December 21, 2023
Authors: Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, Karsten Kreis
cs.AI
Abstract
Text-guided diffusion models have revolutionized image and video generation
and have also been successfully used for optimization-based 3D object
synthesis. Here, we instead focus on the underexplored text-to-4D setting and
synthesize dynamic, animated 3D objects using score distillation methods with
an additional temporal dimension. Compared to previous work, we pursue a novel
compositional generation-based approach, and combine text-to-image,
text-to-video, and 3D-aware multiview diffusion models to provide feedback
during 4D object optimization, thereby simultaneously enforcing temporal
consistency, high-quality visual appearance and realistic geometry. Our method,
called Align Your Gaussians (AYG), leverages dynamic 3D Gaussian Splatting with
deformation fields as its 4D representation. Crucial to AYG is a novel method to
regularize the distribution of the moving 3D Gaussians and thereby stabilize
the optimization and induce motion. We also propose a motion amplification
mechanism as well as a new autoregressive synthesis scheme to generate and
combine multiple 4D sequences for longer generation. These techniques allow us
to synthesize vivid dynamic scenes, outperform previous work qualitatively and
quantitatively and achieve state-of-the-art text-to-4D performance. Due to the
Gaussian 4D representation, different 4D animations can be seamlessly combined,
as we demonstrate. AYG opens up promising avenues for animation, simulation and
digital content creation as well as synthetic data generation.
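
To make the idea of a dynamic Gaussian representation with a deformation field more concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the abstract does not specify the form of the deformation field or of the regularizer, so the `deform` function (a hand-written sway standing in for a learned network), the `distribution_drift` penalty (squared change in the per-axis mean and variance of the Gaussian centers), and all names and constants below are assumptions made purely for illustration.

```python
import math
from statistics import fmean, pvariance

Point = tuple[float, float, float]

# Hypothetical canonical (time-independent) centers of four 3D Gaussians.
canonical: list[Point] = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 1.0),
    (0.0, 1.0, 2.0),
    (-1.0, 0.0, 3.0),
]


def deform(point: Point, t: float) -> Point:
    """Toy deformation field: offset of a Gaussian center at time t.

    Assumption: a sway along x that grows with height z. In a real
    system this would be a learned function of position and time.
    """
    _, _, z = point
    return (0.1 * z * math.sin(2.0 * math.pi * t), 0.0, 0.0)


def positions_at(points: list[Point], t: float) -> list[Point]:
    """Centers at time t: canonical position plus deformation offset."""
    return [
        tuple(c + d for c, d in zip(p, deform(p, t)))
        for p in points
    ]


def moments(points: list[Point]) -> tuple[list[float], list[float]]:
    """Per-axis mean and (population) variance of a set of centers."""
    axes = list(zip(*points))
    return [fmean(a) for a in axes], [pvariance(a) for a in axes]


def distribution_drift(points: list[Point], t: float) -> float:
    """Assumed regularizer: squared change of the per-axis mean and
    variance of the centers at time t relative to the canonical state."""
    mean0, var0 = moments(points)
    mean_t, var_t = moments(positions_at(points, t))
    return sum((a - b) ** 2 for a, b in zip(mean0 + var0, mean_t + var_t))
```

Calling `distribution_drift(canonical, 0.25)` returns a positive value because the sway shifts the cloud of centers, while `distribution_drift(canonical, 0.0)` is zero because the toy field vanishes at `t = 0`. In an optimization loop, a penalty of this kind would be added to the diffusion-based losses to discourage the set of Gaussians from drifting or collapsing as a whole while still allowing local motion.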