VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation

Abstract

Recent innovations in text-to-3D generation have featured Score Distillation Sampling (SDS), which enables the zero-shot learning of implicit 3D models (NeRF) by directly distilling prior knowledge from 2D diffusion models. However, current SDS-based models still struggle with intricate text prompts and commonly produce distorted 3D models with unrealistic textures or cross-view inconsistencies. In this work, we introduce a novel Visual Prompt-guided text-to-3D diffusion model (VP3D) that explicitly unleashes the visual appearance knowledge in a 2D visual prompt to boost text-to-3D generation. Instead of supervising SDS with the text prompt alone, VP3D first uses a 2D diffusion model to generate a high-quality image from the input text; this image then acts as a visual prompt that strengthens SDS optimization with explicit visual appearance. Meanwhile, we couple the SDS optimization with an additional differentiable reward function that encourages rendered images of the 3D model to align visually with the 2D visual prompt and to match the text prompt semantically. Through extensive experiments, we show that the 2D visual prompt in VP3D significantly eases the learning of the visual appearance of 3D models and thus leads to higher visual fidelity with more detailed textures. Appealingly, when the self-generated visual prompt is replaced with a given reference image, VP3D can also trigger a new task of stylized text-to-3D generation. Our project page is available at https://vp3d-cvpr24.github.io.
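To make the coupling of SDS with a differentiable reward more concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes the standard SDS gradient form, w(t)(ε̂ − ε)·∂x/∂θ, with w(t) = 1 and an identity "renderer" (x = θ). The pretrained 2D diffusion model is replaced by an analytic noise predictor for a Gaussian centred on a vector that stands in for the visual prompt, and the reward is a simple quadratic. The names `toy_noise_pred`, `toy_reward_grad`, `optimize`, `prompt_mean`, and `reward_weight` are all hypothetical.

```python
import numpy as np


def toy_noise_pred(x_t, alpha, sigma, prompt_mean, prompt_std):
    """Analytic noise prediction for a Gaussian prior N(prompt_mean, prompt_std^2).

    Assumption: this stands in for a pretrained, prompt-conditioned 2D
    diffusion model; a real model would be a neural network.
    """
    var_t = alpha**2 * prompt_std**2 + sigma**2
    return sigma * (x_t - alpha * prompt_mean) / var_t


def toy_reward_grad(x, prompt_mean):
    """Gradient of a toy reward r(x) = -0.5 * ||x - prompt_mean||^2."""
    return -(x - prompt_mean)


def optimize(prompt_mean, steps=2000, lr=0.05, reward_weight=0.1, seed=0):
    """Gradient descent on theta using an SDS-style gradient plus a reward term."""
    rng = np.random.default_rng(seed)
    theta = np.zeros_like(prompt_mean)  # toy "3D parameters"; render is x = theta
    for _ in range(steps):
        t = rng.uniform(0.02, 0.98)                  # random noise level
        alpha, sigma = np.sqrt(1.0 - t), np.sqrt(t)  # toy noise schedule
        eps = rng.standard_normal(theta.shape)
        x_t = alpha * theta + sigma * eps            # noised "rendering"
        eps_hat = toy_noise_pred(x_t, alpha, sigma, prompt_mean, prompt_std=0.1)
        sds_grad = eps_hat - eps                     # w(t) = 1, dx/dtheta = I
        # Minimise the SDS objective while maximising the reward.
        grad = sds_grad - reward_weight * toy_reward_grad(theta, prompt_mean)
        theta = theta - lr * grad
    return theta


if __name__ == "__main__":
    target = np.array([1.0, -2.0, 0.5])  # stand-in for visual-prompt appearance
    print(optimize(target))
```

In this toy setting both terms pull θ toward the same target, so the parameters settle close to `prompt_mean`. In a real pipeline the renderer, the diffusion prior, and the reward model would each be learned networks, and how VP3D conditions the diffusion model on the visual prompt is not specified in the abstract.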
