DisCo: Disentangled Control for Referring Human Dance Generation in Real World
June 30, 2023
Authors: Tan Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
cs.AI
Abstract
Generative AI has made significant strides in computer vision, particularly
in image/video synthesis conditioned on text descriptions. Despite these
advancements, generating human-centric content such as dance synthesis
remains challenging. Existing dance synthesis methods
struggle with the gap between synthesized content and real-world dance
scenarios. In this paper, we define a new problem setting: Referring Human
Dance Generation, which focuses on real-world dance scenarios with three
important properties: (i) Faithfulness: the synthesis should retain the
appearance of both human subject foreground and background from the reference
image, and precisely follow the target pose; (ii) Generalizability: the model
should generalize to unseen human subjects, backgrounds, and poses; (iii)
Compositionality: it should allow for composition of seen/unseen subjects,
backgrounds, and poses from different sources. To address these challenges, we
introduce a novel approach, DISCO, which includes a novel model architecture
with disentangled control to improve the faithfulness and compositionality of
dance synthesis, and an effective human attribute pre-training for better
generalizability to unseen humans. Extensive qualitative and quantitative
results demonstrate that DISCO can generate high-quality human dance images and
videos with diverse appearances and flexible motions. Code, demo, video and
visualization are available at: https://disco-dance.github.io/.