

TC4D: Trajectory-Conditioned Text-to-4D Generation

March 26, 2024
Authors: Sherwin Bahmani, Xian Liu, Yifan Wang, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, Andrea Tagliasacchi, David B. Lindell
cs.AI

Abstract

Recent techniques for text-to-4D generation synthesize dynamic 3D scenes using supervision from pre-trained text-to-video models. However, existing representations for motion, such as deformation models or time-dependent neural representations, are limited in the amount of motion they can generate; they cannot synthesize motion extending far beyond the bounding box used for volume rendering. The lack of a more flexible motion model contributes to the gap in realism between 4D generation methods and recent, near-photorealistic video generation models. Here, we propose TC4D: trajectory-conditioned text-to-4D generation, which factors motion into global and local components. We represent the global motion of a scene's bounding box using rigid transformation along a trajectory parameterized by a spline. We learn local deformations that conform to the global trajectory using supervision from a text-to-video model. Our approach enables the synthesis of scenes animated along arbitrary trajectories, compositional scene generation, and significant improvements to the realism and amount of generated motion, which we evaluate qualitatively and through a user study. Video results can be viewed on our website: https://sherwinbahmani.github.io/tc4d.
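
The abstract describes factoring motion into a global rigid transform along a spline trajectory and a learned local deformation. Below is a minimal, illustrative sketch of how such a factorization might be composed; it is not the authors' implementation. The Catmull-Rom spline, the yaw-only rotation that tracks the trajectory tangent, and the placeholder deformation function are all assumptions made for illustration.

```python
import numpy as np

def catmull_rom(points, t):
    """Evaluate a Catmull-Rom spline through control points at parameter t in [0, 1]."""
    n = len(points) - 3                       # number of usable segments
    seg = min(int(t * n), n - 1)              # which segment t falls in
    u = t * n - seg                           # local parameter within that segment
    p0, p1, p2, p3 = points[seg:seg + 4]
    return 0.5 * (2 * p1 + (-p0 + p2) * u
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * u**2
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * u**3)

def global_transform(points, t):
    """Rigid transform of the scene bounding box at time t: translation from the
    spline, rotation (yaw only, an assumption) aligning the box with the tangent."""
    pos = catmull_rom(points, t)
    eps = 1e-3
    tangent = catmull_rom(points, min(t + eps, 1.0)) - catmull_rom(points, max(t - eps, 0.0))
    yaw = np.arctan2(tangent[1], tangent[0])
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return R, pos

def deform_local(x_canonical, t):
    """Placeholder for the learned local deformation field (e.g., an MLP
    supervised by a text-to-video model); identity here so the sketch runs."""
    return x_canonical

def world_point(x_canonical, points, t):
    """Compose the local deformation with the global trajectory transform."""
    R, pos = global_transform(points, t)
    return R @ deform_local(x_canonical, t) + pos

# Example: carry a point in the canonical bounding box along a 3D trajectory at t = 0.5.
control_points = np.array([[0, 0, 0], [0, 0, 0], [1, 0, 0],
                           [2, 1, 0], [3, 1, 0], [3, 1, 0]], dtype=float)
print(world_point(np.array([0.1, 0.0, 0.2]), control_points, 0.5))
```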
