ZeroAvatar: Zero-shot 3D Avatar Generation from a Single Image

Abstract

Recent advances in text-to-image generation have enabled significant progress in zero-shot 3D shape generation. This is achieved by score distillation, a method that uses pre-trained text-to-image diffusion models to optimize the parameters of a 3D neural representation, such as a Neural Radiance Field (NeRF). While showing promising results, existing methods often fail to preserve the geometry of complex shapes, such as human bodies. To address this challenge, we present ZeroAvatar, a method that introduces an explicit 3D human body prior into the optimization process. Specifically, we first estimate and refine the parameters of a parametric human body from a single image. Then, during optimization, we use the posed parametric body as an additional geometric constraint to regularize the diffusion model as well as the underlying density field. Lastly, we propose a UV-guided texture regularization term to further guide the completion of texture on invisible body parts. We show that ZeroAvatar significantly enhances the robustness and 3D consistency of optimization-based image-to-3D avatar generation, outperforming existing zero-shot image-to-3D methods.
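To make the score distillation idea concrete, the following is a minimal toy sketch and not the ZeroAvatar implementation. It assumes an identity "renderer" (the optimized parameters are the image itself), a hypothetical closed-form noise predictor standing in for a pre-trained diffusion model, a toy noise schedule, and the weighting w(t) = 1 - alpha_bar; all function names are illustrative. The body prior and UV-guided texture terms described above are not modeled here.

```python
import numpy as np


def toy_noise_predictor(x_t, alpha_bar, target):
    """Hypothetical stand-in for a pre-trained diffusion model.

    For a data distribution concentrated on one image `target`,
    the ideal noise estimate has this closed form.
    """
    return (x_t - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)


def sds_gradient(image, alpha_bar, target, rng):
    """One-sample score distillation gradient w.r.t. the rendered image."""
    eps = rng.standard_normal(image.shape)
    # Forward diffusion: noise the current rendering.
    x_t = np.sqrt(alpha_bar) * image + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = toy_noise_predictor(x_t, alpha_bar, target)
    weight = 1.0 - alpha_bar  # w(t), an assumed weighting choice
    return weight * (eps_hat - eps)


def optimize(theta, target, steps=200, lr=0.5, seed=0):
    """Gradient descent on theta using the score distillation gradient."""
    rng = np.random.default_rng(seed)
    theta = theta.copy()
    for _ in range(steps):
        t = rng.uniform(0.02, 0.98)
        alpha_bar = 1.0 - t  # toy noise schedule (assumption)
        # The renderer is the identity here, so d(image)/d(theta) = I.
        theta -= lr * sds_gradient(theta, alpha_bar, target, rng)
    return theta


target = np.linspace(0.0, 1.0, 16).reshape(4, 4)
result = optimize(np.zeros((4, 4)), target)
```

In this toy setting the update pulls the parameters toward the single image that the stand-in model "believes in". In a real pipeline, the image would come from a differentiable renderer of the 3D representation, and the gradient would be backpropagated through that renderer to its parameters.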
