Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis
October 9, 2024
Authors: Bohan Zeng, Ling Yang, Siyu Li, Jiaming Liu, Zixiang Zhang, Juanxi Tian, Kaixin Zhu, Yongzhen Guo, Fu-Yun Wang, Minkai Xu, Stefano Ermon, Wentao Zhang
cs.AI
Abstract
Recent advances in diffusion models have demonstrated exceptional capabilities in image and video generation, further improving the effectiveness of 4D synthesis. Existing 4D generation methods can generate high-quality 4D objects or scenes based on user-friendly conditions, benefiting the gaming and video industries. However, these methods struggle to synthesize the significant object deformations and in-scene interactions involved in complex 4D transitions. To address this challenge, we propose Trans4D, a novel text-to-4D synthesis framework that enables realistic complex scene transitions. Specifically, we first use multi-modal large language models (MLLMs) to produce a physics-aware scene description for 4D scene initialization and effective transition timing planning. We then propose a geometry-aware 4D transition network to realize complex scene-level 4D transitions based on this plan, involving expressive geometric object deformation. Extensive experiments demonstrate that Trans4D consistently outperforms existing state-of-the-art methods in generating 4D scenes with accurate and high-quality transitions, validating its effectiveness. Code: https://github.com/YangLing0818/Trans4D
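The abstract describes a two-stage pipeline: an MLLM first produces a physics-aware scene description and a transition timing plan, and a geometry-aware transition network then drives object deformation and appearance/disappearance over time. The sketch below is a minimal, hypothetical illustration of that structure, not the authors' implementation; the TransitionNet class, its input/output shapes, and the plan dictionary are all assumptions made for illustration (see the linked repository for the actual code).

```python
# Minimal, hypothetical sketch of the two-stage pipeline described above.
# Names, shapes, and the plan format are illustrative assumptions,
# not the authors' actual API.
import torch
import torch.nn as nn

class TransitionNet(nn.Module):
    """Toy geometry-aware transition network: given 3D points and a
    normalized timestep, predict a displacement and a visibility score."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (dx, dy, dz, visibility logit)
        )

    def forward(self, points: torch.Tensor, t: torch.Tensor):
        # points: (N, 3); t: scalar timestep broadcast to every point
        t_col = t.expand(points.shape[0], 1)
        out = self.mlp(torch.cat([points, t_col], dim=-1))
        displacement, vis_logit = out[:, :3], out[:, 3]
        return points + displacement, torch.sigmoid(vis_logit)

# Stage 1 (assumed): an MLLM-produced plan specifying when the scene
# transition should start and end, e.g. when objects appear or disappear.
plan = {"transition_start": 0.3, "transition_end": 0.7}

# Stage 2 (assumed): the transition network deforms object geometry and
# gates per-point visibility across the planned transition window.
net = TransitionNet()
points = torch.rand(1024, 3)  # placeholder object point cloud
for step in range(5):
    t = torch.tensor([step / 4.0])
    moved, visibility = net(points, t)
    active = visibility > 0.5  # points kept at this timestep
    print(f"t={t.item():.2f}: {int(active.sum())} visible points")
```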