Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis
October 9, 2024
Authors: Bohan Zeng, Ling Yang, Siyu Li, Jiaming Liu, Zixiang Zhang, Juanxi Tian, Kaixin Zhu, Yongzhen Guo, Fu-Yun Wang, Minkai Xu, Stefano Ermon, Wentao Zhang
cs.AI
Abstract
Recent advances in diffusion models have demonstrated exceptional
capabilities in image and video generation, further improving the effectiveness
of 4D synthesis. Existing 4D generation methods can generate high-quality 4D
objects or scenes based on user-friendly conditions, benefiting the gaming and
video industries. However, these methods struggle to synthesize the significant
object deformations and in-scene interactions that complex 4D transitions involve. To
address this challenge, we propose Trans4D, a novel text-to-4D synthesis
framework that enables realistic complex scene transitions. Specifically, we
first use multi-modal large language models (MLLMs) to produce a physics-aware
scene description for 4D scene initialization and effective transition timing
planning. Then we propose a geometry-aware 4D transition network that realizes
complex scene-level 4D transitions with expressive geometric object deformation
according to the plan. Extensive experiments demonstrate that Trans4D
consistently outperforms existing state-of-the-art methods in generating 4D
scenes with accurate and high-quality transitions, validating its
effectiveness. Code: https://github.com/YangLing0818/Trans4D