

MV-RAG: Retrieval Augmented Multiview Diffusion

August 22, 2025
作者: Yosef Dayani, Omer Benishu, Sagie Benaim
cs.AI

Abstract

Text-to-3D generation approaches have advanced significantly by leveraging pretrained 2D diffusion priors, producing high-quality and 3D-consistent outputs. However, they often fail on out-of-domain (OOD) or rare concepts, yielding inconsistent or inaccurate results. To address this, we propose MV-RAG, a novel text-to-3D pipeline that first retrieves relevant 2D images from a large in-the-wild 2D database and then conditions a multiview diffusion model on these images to synthesize consistent and accurate multiview outputs. Training such a retrieval-conditioned model is achieved via a novel hybrid strategy bridging structured multiview data and diverse 2D image collections. This involves training on multiview data using augmented conditioning views that simulate retrieval variance for view-specific reconstruction, alongside training on sets of retrieved real-world 2D images using a distinctive held-out view prediction objective: the model predicts the held-out view from the other views to infer 3D consistency from 2D data. To facilitate a rigorous OOD evaluation, we introduce a new collection of challenging OOD prompts. Experiments against state-of-the-art text-to-3D, image-to-3D, and personalization baselines show that our approach significantly improves 3D consistency, photorealism, and text adherence for OOD/rare concepts, while maintaining competitive performance on standard benchmarks.
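To make the hybrid training strategy concrete, below is a minimal PyTorch sketch of the two branches described in the abstract: (1) multiview batches with augmented conditioning views that mimic retrieval variance, and (2) retrieved real-world 2D image sets trained with a held-out view prediction objective. All names here (`MultiviewDenoiser`, `hybrid_training_step`, `augment`, the toy noise schedule) are hypothetical placeholders for illustration, not the paper's actual architecture or API.

```python
import torch
import torch.nn as nn


class MultiviewDenoiser(nn.Module):
    """Hypothetical stand-in for a retrieval-conditioned multiview diffusion model.

    Takes noisy target views, a diffusion timestep, and a set of conditioning
    (retrieved or augmented) views, and predicts the noise added to the targets.
    """

    def __init__(self, channels: int = 3, hidden: int = 64):
        super().__init__()
        self.target_enc = nn.Conv2d(channels, hidden, 3, padding=1)
        self.cond_enc = nn.Conv2d(channels, hidden, 3, padding=1)
        self.out = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, noisy_targets, t, cond_views):
        # noisy_targets: (B, V_t, C, H, W); cond_views: (B, V_c, C, H, W)
        b, v_t, c, h, w = noisy_targets.shape
        tgt = self.target_enc(noisy_targets.flatten(0, 1))       # (B*V_t, hidden, H, W)
        cond = self.cond_enc(cond_views.flatten(0, 1))           # (B*V_c, hidden, H, W)
        cond = cond.view(b, -1, *cond.shape[1:]).mean(dim=1)     # pool over conditioning views
        cond = cond.repeat_interleave(v_t, dim=0)                # broadcast to each target view
        t_emb = t.float().view(b, 1, 1, 1).repeat_interleave(v_t, dim=0) / 1000.0
        return self.out(tgt + cond + t_emb).view(b, v_t, c, h, w)


def augment(views):
    """Toy perturbation standing in for the paper's (unspecified) augmentations."""
    return views + 0.05 * torch.randn_like(views)


def diffusion_loss(model, targets, cond_views):
    """Standard epsilon-prediction objective on the target views (toy noise schedule)."""
    noise = torch.randn_like(targets)
    t = torch.randint(0, 1000, (targets.shape[0],))
    alpha = (1.0 - t.float() / 1000.0).view(-1, 1, 1, 1, 1)
    noisy = alpha.sqrt() * targets + (1.0 - alpha).sqrt() * noise
    return nn.functional.mse_loss(model(noisy, t, cond_views), noise)


def hybrid_training_step(model, batch):
    """One step of the hybrid strategy described in the abstract (sketch only).

    - '3d' batches: structured multiview data; conditioning views are augmented
      to simulate retrieval variance, and the model reconstructs the target views.
    - '2d' batches: a set of retrieved real-world images of the same concept;
      one view is held out and predicted from the remaining views.
    """
    if batch["kind"] == "3d":
        cond = augment(batch["cond_views"])           # simulate retrieval variance
        return diffusion_loss(model, batch["target_views"], cond)
    else:
        views = batch["retrieved_views"]              # (B, V, C, H, W)
        held_out, remaining = views[:, :1], views[:, 1:]
        return diffusion_loss(model, held_out, remaining)
```

The design intuition behind the second branch is that predicting a withheld image of an object from its other retrieved images forces the model to reason about the object's appearance from unseen viewpoints, which acts as a proxy for 3D consistency even though the retrieved 2D images carry no explicit camera or geometry supervision.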