Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
March 20, 2025
Authors: Zhou Zhenglin, Ma Fan, Fan Hehe, Chua Tat-Seng
cs.AI
Abstract
Animatable head avatar generation typically requires extensive data for
training. To reduce the data requirements, a natural solution is to leverage
existing data-free static avatar generation methods, such as pre-trained
diffusion models with score distillation sampling (SDS), which align avatars
with pseudo ground-truth outputs from the diffusion model. However, directly
distilling 4D avatars from video diffusion often leads to over-smooth results
due to spatial and temporal inconsistencies in the generated video. To address
this issue, we propose Zero-1-to-A, a robust method that uses the video diffusion
model to synthesize a spatially and temporally consistent dataset for 4D avatar
reconstruction. Specifically, Zero-1-to-A iteratively constructs video
datasets and optimizes animatable avatars in a progressive manner, ensuring
that avatar quality increases smoothly and consistently throughout the learning
process. This progressive learning involves two stages: (1) Spatial Consistency
Learning fixes expressions and learns from front-to-side views, and (2)
Temporal Consistency Learning fixes views and learns from relaxed to
exaggerated expressions, generating 4D avatars in a simple-to-complex manner.
Extensive experiments demonstrate that Zero-1-to-A improves fidelity, animation
quality, and rendering speed compared to existing diffusion-based methods,
providing a solution for lifelike avatar creation. Code is publicly available
at: https://github.com/ZhenglinZhou/Zero-1-to-A.
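The two-stage, simple-to-complex curriculum described in the abstract can be sketched as a schedule of rendering conditions, with dataset growth and avatar refitting interleaved one condition at a time. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Condition` type, the functions `progressive_schedule` and `run_progressive`, the stub callables `synthesize_video` and `optimize_avatar`, the four steps per stage, the 90-degree maximum yaw, and the [0, 1] expression-intensity scale are all hypothetical. The abstract specifies none of them, and the released code in the repository may organize training differently.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Condition:
    """Hypothetical rendering condition: camera yaw (degrees) and expression intensity in [0, 1]."""
    yaw_deg: float
    expr_scale: float


def linspace(start: float, stop: float, num: int) -> List[float]:
    """Evenly spaced values from start to stop, both endpoints included."""
    if num < 2:
        raise ValueError("num must be at least 2")
    return [start + (stop - start) * i / (num - 1) for i in range(num)]


def progressive_schedule(num_view_steps: int = 4, max_yaw_deg: float = 90.0,
                         num_expr_steps: int = 4, fixed_expr: float = 0.0,
                         fixed_yaw_deg: float = 0.0) -> List[Tuple[str, Condition]]:
    """Hypothetical simple-to-complex curriculum (step counts and ranges are assumptions)."""
    schedule: List[Tuple[str, Condition]] = []
    # Stage 1, spatial consistency: expression held fixed, view moves from front to side.
    for yaw in linspace(0.0, max_yaw_deg, num_view_steps):
        schedule.append(("spatial", Condition(yaw, fixed_expr)))
    # Stage 2, temporal consistency: view held fixed, expression goes from relaxed to exaggerated.
    for expr in linspace(0.0, 1.0, num_expr_steps):
        schedule.append(("temporal", Condition(fixed_yaw_deg, expr)))
    return schedule


def run_progressive(schedule: List[Tuple[str, Condition]],
                    synthesize_video: Callable[[Condition], str],
                    optimize_avatar: Callable[[list], None]) -> list:
    """Alternate dataset construction and avatar optimization, one condition at a time.

    synthesize_video and optimize_avatar are hypothetical stand-ins for the
    video-diffusion step and the avatar-fitting step.
    """
    dataset: list = []
    for stage, cond in schedule:
        # Grow the pseudo ground-truth dataset with a video for the new condition.
        dataset.append((stage, cond, synthesize_video(cond)))
        # Refit the avatar on everything collected so far.
        optimize_avatar(dataset)
    return dataset


# Usage with trivial stubs: record the dataset size seen at each optimization call.
sizes: List[int] = []
data = run_progressive(
    progressive_schedule(),
    synthesize_video=lambda c: f"video(yaw={c.yaw_deg:.0f}, expr={c.expr_scale:.2f})",
    optimize_avatar=lambda d: sizes.append(len(d)),
)
print(sizes)  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Refitting after every new condition keeps each step close to what the current avatar can already render, which is one plausible reading of "progressive manner" in the abstract rather than a confirmed detail. With these defaults the first temporal condition (front view, relaxed expression) coincides with the first spatial one; a real schedule might skip the repeat.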