Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion

Abstract

Animatable head avatar generation typically requires extensive training data. To reduce this requirement, a natural solution is to leverage existing data-free static avatar generation methods, such as pre-trained diffusion models with score distillation sampling (SDS), which align avatars with pseudo ground-truth outputs from the diffusion model. However, directly distilling 4D avatars from video diffusion often yields over-smoothed results, because the generated videos are spatially and temporally inconsistent. To address this issue, we propose Zero-1-to-A, a robust method that uses a video diffusion model to synthesize a spatially and temporally consistent dataset for 4D avatar reconstruction. Specifically, Zero-1-to-A iteratively constructs the video dataset and optimizes the animatable avatar in a progressive manner, so that avatar quality improves smoothly and consistently throughout learning. This progressive learning has two stages: (1) Spatial Consistency Learning keeps the expression fixed and progresses from frontal to side views, and (2) Temporal Consistency Learning keeps the view fixed and progresses from relaxed to exaggerated expressions, yielding 4D avatars in a simple-to-complex manner. Extensive experiments demonstrate that Zero-1-to-A improves fidelity, animation quality, and rendering speed over existing diffusion-based methods, providing a solution for lifelike avatar creation. Code is publicly available at: https://github.com/ZhenglinZhou/Zero-1-to-A.
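The two-stage, simple-to-complex schedule and the alternation between dataset construction and avatar optimization can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Condition` fields (`yaw_deg`, `expr_scale`), the linear sweeps, the single fixed view (`fixed_yaw_deg`) used in the temporal stage, and the `synthesize`/`optimize` callables are all hypothetical placeholders for the video diffusion sampler and the avatar optimizer.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Condition:
    """Hypothetical sampling condition for one synthesized video clip."""
    stage: str         # "spatial" or "temporal"
    yaw_deg: float     # camera yaw in degrees; 0 = frontal view
    expr_scale: float  # expression intensity; 0 = relaxed, 1 = most exaggerated


def linspace(start: float, stop: float, n: int) -> List[float]:
    """Return n evenly spaced values from start to stop (inclusive)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    if n == 1:
        return [start]
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def build_curriculum(n_spatial: int, n_temporal: int,
                     max_yaw_deg: float = 90.0,
                     fixed_expr: float = 0.0,
                     fixed_yaw_deg: float = 0.0) -> List[Condition]:
    """Simple-to-complex schedule (assumed linear sweeps, hypothetical defaults)."""
    # Stage 1, spatial consistency: expression held fixed, views go frontal -> side.
    spatial = [Condition("spatial", yaw, fixed_expr)
               for yaw in linspace(0.0, max_yaw_deg, n_spatial)]
    # Stage 2, temporal consistency: view held fixed, expressions go relaxed -> exaggerated.
    temporal = [Condition("temporal", fixed_yaw_deg, s)
                for s in linspace(0.0, 1.0, n_temporal)]
    return spatial + temporal


def progressive_training(curriculum: List[Condition],
                         synthesize: Callable[[Any, Condition], Any],
                         optimize: Callable[[Any, List[Any]], Any],
                         avatar: Any) -> Any:
    """Alternate dataset construction and avatar optimization along the schedule."""
    dataset: List[Any] = []
    for cond in curriculum:
        # Hypothetical: the video diffusion model produces a pseudo ground-truth
        # clip for this condition, starting from the current avatar.
        dataset.append(synthesize(avatar, cond))
        # Re-fit the avatar on all clips collected so far before moving on.
        avatar = optimize(avatar, dataset)
    return avatar


if __name__ == "__main__":
    schedule = build_curriculum(n_spatial=4, n_temporal=3)
    for c in schedule:
        print(c.stage, c.yaw_deg, c.expr_scale)
    # Toy stand-ins: the "avatar" is just a counter of optimization rounds.
    rounds = progressive_training(schedule, lambda a, c: c, lambda a, ds: a + 1, 0)
    print("optimization rounds:", rounds)
```

In this sketch each clip is synthesized from the current avatar (an assumption about how the iteration couples the two steps), so the dataset and the avatar improve together, and ordering the conditions from easy (frontal, relaxed) to hard (side, exaggerated) mirrors the simple-to-complex progression the abstract describes.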
