VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation

March 25, 2024
Authors: Yang Chen, Yingwei Pan, Haibo Yang, Ting Yao, Tao Mei
cs.AI

Abstract

Recent innovations in text-to-3D generation have featured Score Distillation Sampling (SDS), which enables zero-shot learning of implicit 3D models (NeRF) by directly distilling prior knowledge from 2D diffusion models. However, current SDS-based models still struggle with intricate text prompts and commonly produce distorted 3D models with unrealistic textures or cross-view inconsistencies. In this work, we introduce a novel Visual Prompt-guided text-to-3D diffusion model (VP3D) that explicitly unleashes the visual appearance knowledge in a 2D visual prompt to boost text-to-3D generation. Instead of supervising SDS solely with a text prompt, VP3D first uses a 2D diffusion model to generate a high-quality image from the input text; this image then acts as a visual prompt that strengthens SDS optimization with explicit visual appearance. Meanwhile, we couple the SDS optimization with an additional differentiable reward function that encourages rendered images of the 3D model to better align visually with the 2D visual prompt and to match the text prompt semantically. Through extensive experiments, we show that the 2D visual prompt in VP3D significantly eases the learning of the visual appearance of 3D models and thus leads to higher visual fidelity with more detailed textures. Appealingly, when the self-generated visual prompt is replaced with a given reference image, VP3D enables a new task: stylized text-to-3D generation. Our project page is available at https://vp3d-cvpr24.github.io.
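
The optimization loop described in the abstract (an SDS gradient guided by a visual prompt, coupled with a differentiable reward) can be illustrated with a toy NumPy sketch. This is not the authors' implementation, and every name in it is hypothetical. It optimizes an image directly instead of a NeRF, replaces the diffusion model with a closed-form noise predictor whose only "data" is the visual prompt, uses the common weighting w(t) = 1 - alpha_bar_t as an assumption, and substitutes a negative mean-squared error for the learned reward functions.

```python
import numpy as np


def sds_gradient(image, predict_noise, a_bar, rng):
    """SDS gradient w.r.t. the rendered image: w(t) * (eps_hat - eps)."""
    eps = rng.standard_normal(image.shape)
    # Forward diffusion: noise the rendered image at level a_bar.
    noisy = np.sqrt(a_bar) * image + np.sqrt(1.0 - a_bar) * eps
    eps_hat = predict_noise(noisy, a_bar)
    # Assumed weighting w(t) = 1 - a_bar; other schedules are possible.
    return (1.0 - a_bar) * (eps_hat - eps)


def make_toy_predictor(visual_prompt):
    """Hypothetical stand-in for a prompt-conditioned diffusion model.

    It is the exact noise predictor for a data distribution that contains
    only `visual_prompt`, so its guidance pulls renders toward the prompt.
    """
    def predict_noise(noisy, a_bar):
        return (noisy - np.sqrt(a_bar) * visual_prompt) / np.sqrt(1.0 - a_bar)
    return predict_noise


def reward_gradient(image, visual_prompt):
    """Gradient of the toy reward r = -mean((image - prompt)^2)."""
    return -2.0 * (image - visual_prompt) / image.size


def optimize(visual_prompt, steps=200, lr=0.5, reward_weight=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Random initial image stands in for a render of an untrained 3D model.
    image = rng.standard_normal(visual_prompt.shape)
    predict_noise = make_toy_predictor(visual_prompt)
    for _ in range(steps):
        a_bar = rng.uniform(0.02, 0.98)  # random noise level per step
        grad = sds_gradient(image, predict_noise, a_bar, rng)
        # Descend the SDS loss while ascending the reward.
        grad -= reward_weight * reward_gradient(image, visual_prompt)
        image -= lr * grad
    return image


if __name__ == "__main__":
    prompt = np.linspace(0.0, 1.0, 64).reshape(8, 8)
    result = optimize(prompt)
    print("max abs error:", np.abs(result - prompt).max())
```

In a real pipeline the gradient would be backpropagated through a differentiable renderer into the 3D representation's parameters, and the predictor and rewards would be pretrained networks; the toy keeps only the structure of the update.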