TC4D: Trajectory-Conditioned Text-to-4D Generation
March 26, 2024
Authors: Sherwin Bahmani, Xian Liu, Yifan Wang, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, Andrea Tagliasacchi, David B. Lindell
cs.AI
Abstract
Recent techniques for text-to-4D generation synthesize dynamic 3D scenes
using supervision from pre-trained text-to-video models. However, existing
representations for motion, such as deformation models or time-dependent neural
representations, are limited in the amount of motion they can generate; they
cannot synthesize motion extending far beyond the bounding box used for volume
rendering. The lack of a more flexible motion model contributes to the gap in
realism between 4D generation methods and recent, near-photorealistic video
generation models. Here, we propose TC4D: trajectory-conditioned text-to-4D
generation, which factors motion into global and local components. We represent
the global motion of a scene's bounding box using rigid transformation along a
trajectory parameterized by a spline. We learn local deformations that conform
to the global trajectory using supervision from a text-to-video model. Our
approach enables the synthesis of scenes animated along arbitrary trajectories,
compositional scene generation, and significant improvements to the realism and
amount of generated motion, which we evaluate qualitatively and through a user
study. Video results can be viewed on our website:
https://sherwinbahmani.github.io/tc4d
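
To make the motion factorization described above concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): a cubic spline parameterizes the bounding-box trajectory, a rigid transformation carries points from the canonical box frame along that trajectory, and a placeholder function stands in for the learned local deformation field that TC4D supervises with a text-to-video model. All function and variable names here are illustrative assumptions.

```python
# Sketch of the global/local motion factorization: global motion is a rigid
# transformation along a spline trajectory; local motion is a deformation
# applied in the canonical bounding-box frame. Assumed names throughout.
import numpy as np
from scipy.interpolate import CubicSpline

# Spline trajectory for the bounding-box center, parameterized by time t in [0, 1].
control_times = np.linspace(0.0, 1.0, 5)
control_points = np.array([
    [0.0, 0.0,  0.0],
    [0.5, 0.0,  0.2],
    [1.0, 0.3,  0.0],
    [1.5, 0.3, -0.2],
    [2.0, 0.0,  0.0],
])
trajectory = CubicSpline(control_times, control_points, axis=0)

def rigid_transform_along_trajectory(points, t):
    """Translate canonical-frame points to the spline position at time t and
    rotate them so the box's forward axis follows the trajectory tangent."""
    center = trajectory(t)
    tangent = trajectory.derivative()(t)
    forward = tangent / (np.linalg.norm(tangent) + 1e-8)
    up = np.array([0.0, 1.0, 0.0])
    right = np.cross(up, forward)
    right /= np.linalg.norm(right) + 1e-8
    up = np.cross(forward, right)
    R = np.stack([right, up, forward], axis=1)  # columns: box axes in world frame
    return points @ R.T + center

def local_deformation(points, t):
    """Hypothetical stand-in for the learned deformation field; the actual model
    is optimized with text-to-video supervision, not a fixed sine wave."""
    return points + 0.05 * np.sin(2 * np.pi * (t + points[:, :1]))

# Composite motion: deform in the canonical frame, then apply the global rigid motion.
canonical_points = np.random.rand(1024, 3) - 0.5
t = 0.3
world_points = rigid_transform_along_trajectory(local_deformation(canonical_points, t), t)
```

In this sketch the deformation is evaluated in the canonical frame and only then mapped along the trajectory, mirroring the abstract's description of local deformations that conform to the global trajectory rather than being learned in world space.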