Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Abstract

Recently, 3D content creation from text prompts has demonstrated remarkable progress by utilizing 2D and 3D diffusion models. While 3D diffusion models ensure strong multi-view consistency, their ability to generate high-quality and diverse 3D assets is hindered by limited 3D data. In contrast, 2D diffusion models rely on a distillation approach that achieves excellent generalization and rich details without any 3D data. However, 2D lifting methods suffer from an inherent view-agnostic ambiguity, which leads to serious multi-face Janus issues: text prompts alone fail to provide sufficient guidance to learn coherent 3D results. Instead of retraining a costly viewpoint-aware model, we study how to fully exploit easily accessible coarse 3D knowledge to enhance the prompts and guide 2D lifting optimization for refinement. In this paper, we propose Sherpa3D, a new text-to-3D framework that achieves high fidelity, generalizability, and geometric consistency simultaneously. Specifically, we design a pair of guiding strategies derived from the coarse 3D prior generated by the 3D diffusion model: a structural guidance for geometric fidelity and a semantic guidance for 3D coherence. Employing the two types of guidance, the 2D diffusion model enriches the 3D content with diversified and high-quality results. Extensive experiments show the superiority of Sherpa3D over state-of-the-art text-to-3D methods in terms of quality and 3D consistency.
