
TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models

July 12, 2024
Authors: Jeongho Kim, Min-Jung Kim, Junsoo Lee, Jaegul Choo
cs.AI

Abstract

Pose-driven human-image animation diffusion models have shown remarkable capabilities in realistic human video synthesis. Despite the promising results achieved by previous approaches, challenges persist in achieving temporally consistent animation and ensuring robustness with off-the-shelf pose detectors. In this paper, we present TCAN, a pose-driven human image animation method that is robust to erroneous poses and consistent over time. In contrast to previous methods, we utilize the pre-trained ControlNet without fine-tuning to leverage its extensive pre-acquired knowledge from numerous pose-image-caption triplets. To keep the ControlNet frozen, we adapt LoRA to the UNet layers, enabling the network to align the latent space between the pose and appearance features. Additionally, by introducing an additional temporal layer into the ControlNet, we enhance robustness against outliers of the pose detector. Through an analysis of attention maps over the temporal axis, we also design a novel temperature map leveraging pose information, allowing for a more static background. Extensive experiments demonstrate that the proposed method achieves promising results in video synthesis tasks encompassing various poses, such as chibi. Project Page: https://eccv2024tcan.github.io/
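
To make the frozen-ControlNet-plus-LoRA idea concrete, here is a minimal PyTorch sketch. The names (`LoRALinear`, `add_lora_to_attention`), the rank and alpha values, and the choice of which attention projections to wrap are illustrative assumptions, not the paper's released code; only the general scheme (pre-trained weights frozen, a zero-initialized low-rank update trained on top) follows the abstract.

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear projection with a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # pre-trained weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)           # update starts at zero (identity behavior)
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

def add_lora_to_attention(unet: nn.Module, rank: int = 4):
    """Wrap the q/k/v projections of every attention block with LoRA."""
    blocks = [m for _, m in unet.named_modules()
              if hasattr(m, "to_q") and isinstance(m.to_q, nn.Linear)]
    for block in blocks:                         # collect first, then patch, to avoid
        for proj in ("to_q", "to_k", "to_v"):    # mutating the module tree mid-iteration
            setattr(block, proj, LoRALinear(getattr(block, proj), rank))

# The ControlNet itself is kept entirely frozen, as the abstract describes:
# for p in controlnet.parameters():
#     p.requires_grad = False
```

Only the `down`/`up` matrices receive gradients, so the appearance UNet can adapt its latent space to the frozen pose ControlNet without disturbing either network's pre-trained knowledge.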
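The temperature map can be read as a per-location rescaling of the temporal-attention softmax. The sketch below shows that mechanism under stated assumptions: the function name, the tensor layout, and the mask-based construction of the map are ours for illustration; the paper derives the map from pose information, and the exact formula is not given in the abstract.

```python
import torch
import torch.nn.functional as F

def temporal_attention_with_temperature(q, k, v, temperature):
    """
    q, k, v:      (batch*H*W, frames, dim) -- temporal attention operates per
                  spatial location, attending across frames.
    temperature:  (batch*H*W, 1, 1) -- values > 1 flatten the softmax at that
                  location, so its output averages over frames and stays static.
    """
    scale = q.shape[-1] ** -0.5
    logits = torch.einsum("bfd,bgd->bfg", q, k) * scale
    attn = F.softmax(logits / temperature, dim=-1)
    return torch.einsum("bfg,bgd->bfd", attn, v)

# One plausible way to build the map from a soft pose mask in [0, 1]
# (purely illustrative): background positions get a higher temperature,
# encouraging a more static background.
# temperature = 1.0 + tau * (1.0 - pose_mask).reshape(-1, 1, 1)
```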
