RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models

July 9, 2024
Authors: Bowen Zhang, Yiji Cheng, Chunyu Wang, Ting Zhang, Jiaolong Yang, Yansong Tang, Feng Zhao, Dong Chen, Baining Guo
cs.AI

Abstract

We present RodinHD, which can generate high-fidelity 3D avatars from a single portrait image. Existing methods fail to capture intricate details such as hairstyles, a limitation we address in this paper. We first identify an overlooked problem of catastrophic forgetting that arises when fitting triplanes sequentially on many avatars, caused by the MLP decoder sharing scheme. To overcome this issue, we propose a novel data scheduling strategy and a weight consolidation regularization term, which together improve the decoder's capability of rendering sharper details. Additionally, we optimize the guiding effect of the portrait image by computing a finer-grained hierarchical representation that captures rich 2D texture cues, and injecting them into the 3D diffusion model at multiple layers via cross-attention. When trained on 46K avatars with a noise schedule optimized for triplanes, the resulting model can generate 3D avatars with notably better details than previous methods and can generalize to in-the-wild portrait input.
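
The abstract names a weight consolidation regularization term but does not give its form. As a rough illustration only, the sketch below shows a generic quadratic penalty in the style of elastic weight consolidation, which discourages shared decoder weights from drifting away from a previously fitted anchor. The function name, the per-weight importance values, and the strength parameter are all assumptions made for this example and are not taken from the paper.

```python
def consolidation_penalty(weights, anchor_weights, importance, strength=1.0):
    """Generic quadratic consolidation penalty (hypothetical form, not the paper's).

    weights:        current decoder weights, as a flat list of floats
    anchor_weights: weights saved after fitting earlier avatars
    importance:     per-weight importance (assumed given, e.g. Fisher-style estimates)
    strength:       scalar multiplier on the penalty
    """
    if not (len(weights) == len(anchor_weights) == len(importance)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for w, a, f in zip(weights, anchor_weights, importance):
        # Weights marked as important are penalized more for moving away from the anchor.
        total += f * (w - a) ** 2
    return 0.5 * strength * total


if __name__ == "__main__":
    current = [1.0, 2.0]
    anchor = [1.0, 1.0]
    importance = [1.0, 2.0]
    print(consolidation_penalty(current, anchor, importance))
```

In a training loop, a term like this would typically be added to the rendering loss so that fitting a new avatar's triplane does not overwrite what the shared decoder learned on earlier ones.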
