Boximator: Generating Rich and Controllable Motions for Video Synthesis
February 2, 2024
Authors: Jiawei Wang, Yuchen Zhang, Jiaxin Zou, Yan Zeng, Guoqiang Wei, Liping Yuan, Hang Li
cs.AI
Abstract
Generating rich and controllable motion is a pivotal challenge in video
synthesis. We propose Boximator, a new approach for fine-grained motion
control. Boximator introduces two constraint types: hard box and soft box.
Users select objects in the conditional frame using hard boxes and then use
either type of box to roughly or rigorously define the object's position,
shape, or motion path in future frames. Boximator functions as a plug-in for
existing video diffusion models. Its training process preserves the base
model's knowledge by freezing the original weights and training only the
control module. To address training challenges, we introduce a novel
self-tracking technique that greatly simplifies the learning of box-object
correlations. Empirically, Boximator achieves state-of-the-art video quality
(FVD) scores, improving on two base models, with further gains after
incorporating box constraints. Its robust motion controllability is validated
by drastic increases in the bounding box alignment metric. Human evaluation
also shows that users favor Boximator's results over those of the base model.
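The training recipe the abstract describes, freezing the base video diffusion model and training only a box-conditioned control module, can be illustrated with a short sketch. This is a hypothetical PyTorch wiring under stated assumptions, not the authors' implementation: the class name BoxControlModule, the 5-number box encoding (coordinates plus a hard/soft flag), and the hidden dimension are all illustrative choices not taken from the paper.

```python
# Minimal sketch (assumption, not the authors' code) of the plug-in pattern:
# freeze the base diffusion backbone, train only the box-control module.
import torch
import torch.nn as nn


class BoxControlModule(nn.Module):
    """Hypothetical control module that maps box constraints to features
    the frozen base model could condition on."""

    def __init__(self, hidden_dim: int = 320):
        super().__init__()
        # Each box is encoded as (x_min, y_min, x_max, y_max, hard_or_soft_flag).
        self.box_encoder = nn.Sequential(
            nn.Linear(5, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, boxes: torch.Tensor) -> torch.Tensor:
        # boxes: (batch, num_frames, num_objects, 5) -> per-box control features
        return self.box_encoder(boxes)


def trainable_params(base_model: nn.Module, control: BoxControlModule):
    """Freeze the base model to preserve its knowledge; train only the control module."""
    for p in base_model.parameters():
        p.requires_grad_(False)
    return [p for p in control.parameters() if p.requires_grad]


# Usage: in practice base_model would be an existing video diffusion backbone;
# a tiny linear layer stands in for it here so the sketch runs on its own.
base_model = nn.Linear(4, 4)
control = BoxControlModule()
optimizer = torch.optim.AdamW(trainable_params(base_model, control), lr=1e-4)
```

Only the control module's parameters reach the optimizer, which mirrors the stated design goal: the base model's weights, and therefore its generation quality, are left untouched while box controllability is learned on top.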