Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis
October 9, 2024
Authors: Bohan Zeng, Ling Yang, Siyu Li, Jiaming Liu, Zixiang Zhang, Juanxi Tian, Kaixin Zhu, Yongzhen Guo, Fu-Yun Wang, Minkai Xu, Stefano Ermon, Wentao Zhang
cs.AI
Abstract
Recent advances in diffusion models have demonstrated exceptional capabilities in image and video generation, further improving the effectiveness of 4D synthesis. Existing 4D generation methods can generate high-quality 4D objects or scenes based on user-friendly conditions, benefiting the gaming and video industries. However, these methods struggle to synthesize the significant object deformations and in-scene interactions involved in complex 4D transitions. To address this challenge, we propose Trans4D, a novel text-to-4D synthesis framework that enables realistic complex scene transitions. Specifically, we first use multi-modal large language models (MLLMs) to produce a physics-aware scene description for 4D scene initialization and effective transition timing planning. We then propose a geometry-aware 4D transition network to realize complex scene-level 4D transitions based on this plan, involving expressive geometric object deformation. Extensive experiments demonstrate that Trans4D consistently outperforms existing state-of-the-art methods in generating 4D scenes with accurate and high-quality transitions, validating its effectiveness. Code: https://github.com/YangLing0818/Trans4D
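The abstract describes a two-stage pipeline: an MLLM first produces a physics-aware scene description and a transition timing plan, and a geometry-aware transition network then drives object deformation and appearance/disappearance over time. The sketch below is a minimal, hypothetical illustration of that structure, not the authors' implementation; the TransitionNet class, its input/output shapes, and the plan dictionary are all assumptions made for illustration (see the linked repository for the actual code).

```python
# Minimal, hypothetical sketch of the two-stage pipeline described above.
# Names, shapes, and the plan format are illustrative assumptions,
# not the authors' actual API.
import torch
import torch.nn as nn

class TransitionNet(nn.Module):
    """Toy geometry-aware transition network: given 3D points and a
    normalized timestep, predict a displacement and a visibility score."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (dx, dy, dz, visibility logit)
        )

    def forward(self, points: torch.Tensor, t: torch.Tensor):
        # points: (N, 3); t: scalar timestep broadcast to every point
        t_col = t.expand(points.shape[0], 1)
        out = self.mlp(torch.cat([points, t_col], dim=-1))
        displacement, vis_logit = out[:, :3], out[:, 3]
        return points + displacement, torch.sigmoid(vis_logit)

# Stage 1 (assumed): an MLLM-produced plan specifying when the scene
# transition should start and end, e.g. when objects appear or disappear.
plan = {"transition_start": 0.3, "transition_end": 0.7}

# Stage 2 (assumed): the transition network deforms object geometry and
# gates per-point visibility across the planned transition window.
net = TransitionNet()
points = torch.rand(1024, 3)  # placeholder object point cloud
for step in range(5):
    t = torch.tensor([step / 4.0])
    moved, visibility = net(points, t)
    active = visibility > 0.5  # points kept at this timestep
    print(f"t={t.item():.2f}: {int(active.sum())} visible points")
```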