Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation
April 21, 2025
Authors: Chenjie Cao, Jingkai Zhou, Shikai Li, Jingyun Liang, Chaohui Yu, Fan Wang, Xiangyang Xue, Yanwei Fu
cs.AI
Abstract
Camera and human motion controls have been extensively studied for video
generation, but existing approaches typically address them separately,
suffering from limited data with high-quality annotations for both aspects. To
overcome this, we present Uni3C, a unified 3D-enhanced framework for precise
control of both camera and human motion in video generation. Uni3C includes two
key contributions. First, we propose PCDController, a plug-and-play control
module trained with a frozen video generative backbone, which uses point
clouds unprojected from monocular depth to achieve accurate camera control.
By leveraging the strong 3D priors of point clouds and the powerful
capacities of video foundation models, PCDController shows impressive
generalization, performing well regardless of whether the inference backbone is
frozen or fine-tuned. This flexibility enables different modules of Uni3C to be
trained in specific domains, i.e., either camera control or human motion
control, reducing the dependency on jointly annotated data. Second, we propose
a jointly aligned 3D world guidance for the inference phase that seamlessly
integrates scene point clouds and SMPL-X characters to unify the control
signals for camera and human motion, respectively. Extensive experiments
confirm that PCDController is robust in driving camera motion for fine-tuned
video generation backbones. Uni3C substantially outperforms
competitors in both camera controllability and human motion quality.
Additionally, we collect tailored validation sets featuring challenging camera
movements and human actions to validate the effectiveness of our method.
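The abstract's idea of unprojecting monocular depth into a point cloud, then viewing that cloud from a moved camera, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple pinhole camera model, the function names `unproject_depth` and `project_points` are hypothetical, and the camera move is restricted to a translation (rotation is omitted for brevity).

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def unproject_depth(depth: List[List[float]],
                    fx: float, fy: float,
                    cx: float, cy: float) -> List[Point3D]:
    """Lift each pixel (u, v) with depth z to a 3D point in the camera frame.

    Assumes a pinhole model with focal lengths (fx, fy) and principal
    point (cx, cy). Pixels with non-positive depth are treated as invalid.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip invalid depth values
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def project_points(points: List[Point3D],
                   fx: float, fy: float,
                   cx: float, cy: float,
                   translation: Point3D = (0.0, 0.0, 0.0)) -> List[Pixel]:
    """Project 3D points into a camera moved by `translation`.

    The camera keeps its orientation (hypothetical simplification), so a
    point p in the original frame becomes p - translation in the new frame.
    Points behind the new camera are dropped.
    """
    tx, ty, tz = translation
    pixels = []
    for x, y, z in points:
        xc, yc, zc = x - tx, y - ty, z - tz
        if zc <= 0:
            continue
        pixels.append((fx * xc / zc + cx, fy * yc / zc + cy))
    return pixels
```

With a zero translation, projecting the unprojected points returns the original pixel coordinates. A non-zero translation shifts them according to depth, which is the kind of geometric cue a point-cloud rendering can supply as a camera control signal.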