RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models
July 9, 2024
Authors: Bowen Zhang, Yiji Cheng, Chunyu Wang, Ting Zhang, Jiaolong Yang, Yansong Tang, Feng Zhao, Dong Chen, Baining Guo
cs.AI
Abstract
We present RodinHD, which can generate high-fidelity 3D avatars from a portrait image. Existing methods fail to capture intricate details such as hairstyles, which we tackle in this paper. We first identify an overlooked problem of catastrophic forgetting that arises when fitting triplanes sequentially on many avatars, caused by the MLP decoder sharing scheme. To overcome this issue, we propose a novel data scheduling strategy and a weight consolidation regularization term, which improves the decoder's capability of rendering sharper details. Additionally, we optimize the guiding effect of the portrait image by computing a finer-grained hierarchical representation that captures rich 2D texture cues and injecting it into the 3D diffusion model at multiple layers via cross-attention. When trained on 46K avatars with a noise schedule optimized for triplanes, the resulting model can generate 3D avatars with notably better details than previous methods and can generalize to in-the-wild portrait input.
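
The abstract does not include an implementation of the weight consolidation term; the sketch below shows one plausible form, assuming an L2 penalty (in the spirit of elastic weight consolidation) that anchors the shared MLP decoder to a snapshot of its weights taken after earlier avatars were fitted. The function names, `anchor_params`, and `reg_weight` are hypothetical, not the paper's actual formulation.

```python
import torch

def snapshot(decoder):
    # Freeze a copy of the decoder weights after fitting earlier avatars.
    return {n: p.detach().clone() for n, p in decoder.named_parameters()}

def weight_consolidation_loss(decoder, anchor_params, reg_weight=1e-2):
    """Hypothetical L2 penalty pulling the shared MLP decoder toward the
    anchored snapshot, discouraging catastrophic forgetting when triplanes
    are fitted sequentially on many avatars."""
    loss = 0.0
    for name, param in decoder.named_parameters():
        if name in anchor_params:
            loss = loss + ((param - anchor_params[name]) ** 2).sum()
    return reg_weight * loss
```

Added to the per-avatar fitting objective, a penalty of this shape lets the decoder keep adapting while staying close to weights that already render earlier avatars well.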
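To make the multi-layer cross-attention injection concrete, here is a minimal PyTorch sketch of one injection block, assuming the triplane features at a given diffusion-model layer act as queries and the hierarchical portrait tokens act as keys and values. The class, tensor shapes, and residual wiring are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class PortraitCrossAttention(nn.Module):
    """Inject 2D portrait texture cues into one layer of a 3D diffusion
    backbone; one such block would be inserted at multiple layers."""

    def __init__(self, dim, context_dim, num_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(
            dim, num_heads, kdim=context_dim, vdim=context_dim,
            batch_first=True)

    def forward(self, triplane_tokens, portrait_tokens):
        # triplane_tokens: (B, N, dim) features of the noisy triplane.
        # portrait_tokens: (B, M, context_dim) hierarchical portrait cues.
        attended, _ = self.attn(self.norm(triplane_tokens),
                                portrait_tokens, portrait_tokens)
        return triplane_tokens + attended  # residual injection
```

Feeding portrait tokens from several levels of a hierarchical image encoder, rather than a single global embedding, is what allows fine 2D texture cues (e.g., hair strands) to reach the 3D generator.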
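The abstract states only that the noise schedule is "optimized for triplanes" without giving its form. One common adjustment for data dominated by low-frequency structure is to shift a cosine schedule toward higher noise levels; the sketch below shows that generic technique purely as an assumption, with `shift` as a hypothetical tuning parameter.

```python
import numpy as np

def shifted_cosine_alpha_bar(num_steps=1000, shift=0.5):
    """Cosine schedule whose signal-to-noise ratio is scaled by shift**2,
    pushing more training steps toward noisier levels. Illustrative only;
    the paper's exact schedule is not specified in the abstract."""
    t = np.linspace(0.0, 1.0, num_steps)
    alpha_bar = np.cos((t + 0.008) / 1.008 * np.pi / 2) ** 2
    snr = alpha_bar / (1.0 - alpha_bar)        # signal-to-noise ratio
    snr_shifted = snr * shift ** 2             # shift toward more noise
    return snr_shifted / (1.0 + snr_shifted)   # back to alpha_bar form
```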