Choreographing a World of Dynamic Objects
January 7, 2026
Authors: Yanzhe Lyu, Chen Geng, Karthik Dharmarajan, Yunzhi Zhang, Hadi Alzayer, Shangzhe Wu, Jiajun Wu
cs.AI
Abstract
Dynamic objects in our physical 4D (3D + time) world are constantly evolving, deforming, and interacting with other objects, leading to diverse 4D scene dynamics. In this paper, we present CHORD, a universal generative pipeline for CHOReographing Dynamic objects and scenes and synthesizing such phenomena. Traditional rule-based graphics pipelines for creating these dynamics rely on category-specific heuristics and are labor-intensive and hard to scale. Recent learning-based methods typically demand large-scale datasets, which may not cover every object category of interest. Our approach instead inherits the universality of video generative models through a distillation-based pipeline that extracts the rich Lagrangian motion information hidden in the Eulerian representations of 2D videos. Our method is universal, versatile, and category-agnostic. We demonstrate its effectiveness through experiments generating a diverse range of multi-body 4D dynamics, show its advantages over existing methods, and demonstrate its applicability to generating robotic manipulation policies. Project page: https://yanzhelyu.github.io/chord
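The abstract's central technical move is converting Eulerian motion (velocities defined at fixed pixel locations, as in optical flow) into Lagrangian motion (trajectories attached to material points). As a rough illustration of that distinction only, and not of the paper's distillation pipeline, whose details the abstract does not give, the minimal sketch below advects seed points through a stack of dense 2D flow fields; the function name, array layout, and bilinear-sampling scheme are all illustrative assumptions.

```python
import numpy as np

def eulerian_to_lagrangian(flows, seed_points):
    """Integrate dense per-frame flow fields (Eulerian: motion at fixed
    grid locations) into per-point trajectories (Lagrangian: motion
    attached to material points). Hypothetical helper, not CHORD's API.

    flows:       (T, H, W, 2) array; flows[t] maps frame t -> t+1
    seed_points: (N, 2) array of (x, y) pixel coordinates in frame 0
    returns:     (T + 1, N, 2) array of tracked positions
    """
    T, H, W, _ = flows.shape
    pts = seed_points.astype(np.float64).copy()
    traj = [pts.copy()]
    for t in range(T):
        # Sample the flow field at the current (continuous) point
        # locations with bilinear interpolation, then advect the points.
        x = np.clip(pts[:, 0], 0, W - 1)
        y = np.clip(pts[:, 1], 0, H - 1)
        x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
        x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
        wx, wy = x - x0, y - y0
        f = (flows[t][y0, x0] * ((1 - wx) * (1 - wy))[:, None]
             + flows[t][y0, x1] * (wx * (1 - wy))[:, None]
             + flows[t][y1, x0] * ((1 - wx) * wy)[:, None]
             + flows[t][y1, x1] * (wx * wy)[:, None])
        pts = pts + f
        traj.append(pts.copy())
    return np.stack(traj)
```

Naive flow integration like this drifts and breaks under occlusion, which is presumably part of why the paper distills motion from video generative models rather than chaining raw optical flow; the sketch only makes the Eulerian-to-Lagrangian conversion concrete.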