
Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation

July 26, 2023
Authors: Chaohui Yu, Qiang Zhou, Jingliang Li, Zhe Zhang, Zhibin Wang, Fan Wang
cs.AI

Abstract

Text-to-3D generation has recently garnered significant attention, fueled by 2D diffusion models trained on billions of image-text pairs. Existing methods primarily rely on score distillation to leverage 2D diffusion priors to supervise the generation of 3D models, e.g., NeRF. However, score distillation is prone to the view-inconsistency problem, and implicit NeRF modeling can also produce arbitrary shapes, leading to less realistic and uncontrollable 3D generation. In this work, we propose Points-to-3D, a flexible framework that bridges the gap between sparse yet freely available 3D points and realistic, shape-controllable 3D generation by distilling knowledge from both 2D and 3D diffusion models. The core idea of Points-to-3D is to introduce controllable sparse 3D points to guide text-to-3D generation. Specifically, we use the sparse point cloud generated by the 3D diffusion model Point-E as the geometric prior, conditioned on a single reference image. To better utilize the sparse 3D points, we propose an efficient point cloud guidance loss that adaptively drives the NeRF's geometry to align with the shape of the sparse 3D points. Beyond controlling the geometry, we also optimize the NeRF for a more view-consistent appearance. To be specific, we perform score distillation against the publicly available 2D image diffusion model ControlNet, conditioned on text as well as the depth map of the learned compact geometry. Qualitative and quantitative comparisons demonstrate that Points-to-3D improves view consistency and achieves good shape controllability for text-to-3D generation. Points-to-3D provides users with a new way to improve and control text-to-3D generation.
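The point cloud guidance loss described above can be illustrated with a minimal sketch. This is not the paper's exact formulation; it is a hypothetical variant in which each sampled 3D location receives a pseudo occupancy label (1 if it lies within a radius of some sparse guidance point, 0 otherwise), and the NeRF's squashed density is pulled toward that label with a binary cross-entropy term. All names (`point_cloud_guidance_loss`, `radius`) and the specific distance/BCE choices are illustrative assumptions:

```python
import numpy as np

def point_cloud_guidance_loss(query_xyz, sigma, points, radius=0.05):
    """Hypothetical sketch of a point-cloud guidance loss.

    Encourages the predicted NeRF density `sigma` to be high at query
    locations near the sparse guidance points (e.g. from Point-E) and
    low elsewhere.

    query_xyz: (N, 3) sampled 3D positions
    sigma:     (N,) predicted densities squashed into (0, 1)
    points:    (M, 3) sparse guidance point cloud
    radius:    distance threshold defining "occupied" space (assumed)
    """
    # Distance from each query location to its nearest guidance point.
    d = np.linalg.norm(query_xyz[:, None, :] - points[None, :, :], axis=-1)
    nearest = d.min(axis=1)                       # (N,)
    # Pseudo ground-truth occupancy derived from the sparse points.
    occupied = (nearest < radius).astype(float)   # (N,)
    # Binary cross-entropy between predicted density and pseudo occupancy.
    eps = 1e-6
    sigma = np.clip(sigma, eps, 1.0 - eps)
    bce = -(occupied * np.log(sigma) + (1.0 - occupied) * np.log(1.0 - sigma))
    return bce.mean()
```

In an actual pipeline this term would be added to the score distillation objective; here it serves only to show how sparse points can supply an adaptive geometric supervision signal for the NeRF's density field.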