

GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction

December 31, 2025
Authors: Yi-Chuan Huang, Hao-Jen Chien, Chin-Yang Lin, Ying-Huan Chen, Yu-Lun Liu
cs.AI

Abstract

Recent advances in 3D reconstruction have achieved remarkable progress in high-quality scene capture from dense multi-view imagery, yet these methods struggle when input views are limited. Various approaches, including regularization techniques, semantic priors, and geometric constraints, have been proposed to address this challenge. The latest diffusion-based methods have demonstrated substantial improvements by generating novel views from new camera poses to augment training data, surpassing earlier regularization- and prior-based techniques. Despite this progress, we identify three critical limitations in these state-of-the-art approaches: inadequate coverage beyond the periphery of known views, geometric inconsistencies across generated views, and computationally expensive pipelines. We introduce GaMO (Geometry-aware Multi-view Outpainter), a framework that reformulates sparse-view reconstruction through multi-view outpainting. Instead of generating new viewpoints, GaMO expands the field of view from existing camera poses, which inherently preserves geometric consistency while providing broader scene coverage. Our approach employs multi-view conditioning and geometry-aware denoising strategies in a zero-shot manner, without any training. Extensive experiments on Replica and ScanNet++ demonstrate state-of-the-art reconstruction quality across 3, 6, and 9 input views, outperforming prior methods in PSNR and LPIPS, while achieving a 25× speedup over SOTA diffusion-based methods with processing time under 10 minutes. Project page: https://yichuanh.github.io/GaMO/
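As a rough illustration of why expanding the field of view from an existing camera pose can preserve geometric consistency, consider a simple pinhole camera model. This is our own sketch, not the authors' implementation. If the image canvas is enlarged around the original frame while the pose and focal length stay fixed, only the principal point shifts. Every original pixel therefore keeps exactly the same viewing ray, and the new border pixels cover directions the camera could not see before. The `Intrinsics` class and the helpers `outpaint_intrinsics`, `horizontal_fov_deg`, and `ray_direction` are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics (assumed model): focal lengths, principal point, image size."""
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int


def outpaint_intrinsics(k: Intrinsics, scale: float) -> Intrinsics:
    """Enlarge the canvas by `scale`, padding evenly on all sides.

    Focal lengths are unchanged; the principal point moves by the padding,
    so the original pixels sit in the center of the larger canvas.
    """
    new_w = round(k.width * scale)
    new_h = round(k.height * scale)
    pad_x = (new_w - k.width) / 2
    pad_y = (new_h - k.height) / 2
    return Intrinsics(k.fx, k.fy, k.cx + pad_x, k.cy + pad_y, new_w, new_h)


def horizontal_fov_deg(k: Intrinsics) -> float:
    """Horizontal field of view, also valid for an off-center principal point."""
    left = math.atan(k.cx / k.fx)
    right = math.atan((k.width - k.cx) / k.fx)
    return math.degrees(left + right)


def ray_direction(k: Intrinsics, u: float, v: float) -> tuple[float, float, float]:
    """Unnormalized camera-space ray through pixel (u, v)."""
    return ((u - k.cx) / k.fx, (v - k.cy) / k.fy, 1.0)


if __name__ == "__main__":
    k = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0, width=640, height=480)
    wide = outpaint_intrinsics(k, scale=1.5)
    print(wide)
    print(f"FOV: {horizontal_fov_deg(k):.1f} deg -> {horizontal_fov_deg(wide):.1f} deg")
    # The same scene point: pixel (100, 50) in the original lands at (260, 170) after padding.
    print(ray_direction(k, 100, 50) == ray_direction(wide, 260, 170))
```

Holding the focal length fixed means no resampling is needed: the outpainted view is a strict superset of the original one, so the known pixels act as a geometric anchor while only the border must be generated.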
January 2, 2026