DisCo: Disentangled Control for Referring Human Dance Generation in Real World

Abstract

Generative AI has made significant strides in computer vision, particularly in image and video synthesis conditioned on text descriptions. Despite these advances, generating human-centric content such as dance remains challenging: existing dance synthesis methods struggle to bridge the gap between synthesized content and real-world dance scenarios. In this paper, we define a new problem setting, Referring Human Dance Generation, which focuses on real-world dance scenarios and has three important properties: (i) Faithfulness: the synthesis should retain the appearance of both the human subject in the foreground and the background from the reference image, and should precisely follow the target pose; (ii) Generalizability: the model should generalize to unseen human subjects, backgrounds, and poses; (iii) Compositionality: the model should allow seen and unseen subjects, backgrounds, and poses from different sources to be composed. To address these challenges, we introduce DISCO. DISCO comprises a novel model architecture with disentangled control, which improves the faithfulness and compositionality of dance synthesis, and an effective human attribute pre-training scheme, which improves generalizability to unseen humans. Extensive qualitative and quantitative results demonstrate that DISCO can generate high-quality human dance images and videos with diverse appearances and flexible motions. Code, demos, videos, and visualizations are available at: https://disco-dance.github.io/.
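
The following Python sketch illustrates the problem setting only. It does not describe DISCO's model. It treats a referring-dance request as a triple of subject, background, and target pose, each of which may come from a different source. Under this framing, compositionality corresponds to the Cartesian product of the three sets. Generalizability corresponds to requests in which at least one element was not seen in training.

The class `DanceCondition` and the helpers `compose_conditions` and `is_unseen` are hypothetical names introduced for illustration. Plain string identifiers stand in for real reference images and pose maps.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class DanceCondition:
    """One referring-dance request (hypothetical structure, not DISCO's API).

    subject:    id of the reference image supplying the human foreground
    background: id of the image supplying the background
    pose:       id of the target pose (e.g. one skeleton frame)
    """
    subject: str
    background: str
    pose: str


def compose_conditions(subjects, backgrounds, poses):
    """Enumerate every subject/background/pose combination.

    Because each slot may come from a different source, the space of
    requests is the Cartesian product of the three sets.
    """
    return [DanceCondition(s, b, p) for s, b, p in product(subjects, backgrounds, poses)]


def is_unseen(cond, seen_subjects, seen_backgrounds, seen_poses):
    """Return True if any element of the request lies outside the training sets."""
    return (cond.subject not in seen_subjects
            or cond.background not in seen_backgrounds
            or cond.pose not in seen_poses)


if __name__ == "__main__":
    conds = compose_conditions(["subj_A", "subj_B"], ["bg_park"], ["pose_001", "pose_002"])
    print(len(conds))  # 4 combinations: 2 subjects x 1 background x 2 poses
    unseen = [c for c in conds if is_unseen(c, {"subj_A"}, {"bg_park"}, {"pose_001"})]
    print(len(unseen))  # 3: only (subj_A, bg_park, pose_001) is fully seen
```

Frozen dataclasses make each request hashable, so the same triples can be deduplicated or used as keys when organizing evaluation splits.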
