SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance

Abstract

Powered by large-scale text-to-image generation models, text-to-3D avatar generation has made promising progress. However, most methods fail to produce photorealistic results because of imprecise geometry and low-quality appearance. Towards more practical avatar generation, we present SEEAvatar, a method for generating photorealistic 3D avatars from text with SElf-Evolving constraints for decoupled geometry and appearance. For geometry, we propose to constrain the optimized avatar to a decent global shape with a template avatar. The template avatar is initialized with a human prior and can be periodically updated by the optimized avatar as an evolving template, which enables more flexible shape generation. In addition, the geometry is constrained by the static human prior in local parts such as the face and hands to maintain delicate structures. For appearance generation, we use a diffusion model enhanced by prompt engineering to guide a physically based rendering pipeline to generate realistic textures. A lightness constraint is applied to the albedo texture to suppress incorrect lighting effects. Experiments show that our method outperforms previous methods on both global and local geometry and appearance quality by a large margin. Since our method can produce high-quality meshes and textures, such assets can be directly applied in a classic graphics pipeline for realistic rendering under any lighting condition. Project page: https://seeavatar3d.github.io.
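To make the evolving-template idea more concrete, here is a toy sketch in Python. It is not the authors' implementation: the quadratic losses, the function name `optimize_with_evolving_template`, and every parameter are assumptions chosen for illustration. The `target` list stands in for whatever shape the text-driven guidance would pull the avatar towards, and the shape itself is reduced to a flat list of numbers.

```python
def optimize_with_evolving_template(template, target, steps=300,
                                    update_every=100, lr=0.1, weight=1.0):
    """Toy gradient descent on |x - target|^2 + weight * |x - template|^2.

    Hypothetical illustration only. `template` plays the role of the
    template avatar, `target` the shape favoured by the guidance signal.
    If `update_every` is None the template stays fixed (static constraint);
    otherwise it is replaced by the current shape every `update_every` steps.
    """
    avatar = list(template)      # start from the prior shape
    template = list(template)    # local copy so the caller's list is untouched
    for step in range(1, steps + 1):
        for i in range(len(avatar)):
            # Gradient of the guidance term plus the template constraint.
            grad = (2.0 * (avatar[i] - target[i])
                    + 2.0 * weight * (avatar[i] - template[i]))
            avatar[i] -= lr * grad
        # Periodic update: the optimized shape becomes the new template.
        if update_every and step % update_every == 0:
            template = list(avatar)
    return avatar


if __name__ == "__main__":
    static = optimize_with_evolving_template([0.0], [1.0], update_every=None)
    evolving = optimize_with_evolving_template([0.0], [1.0], update_every=100)
    print("static template:  ", static)
    print("evolving template:", evolving)
```

In this toy setting a fixed template holds the result halfway between the prior and the target, while the periodically refreshed template lets the shape move further towards the target across successive periods. This mirrors, in a very simplified form, the flexibility the abstract attributes to the evolving template.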
