C-GenReg: Training-Free 3D Point Cloud Registration by Multi-View-Consistent Geometry-to-Image Generation with Probabilistic Modalities Fusion

Abstract

We introduce C-GenReg, a training-free framework for 3D point cloud registration that leverages the complementary strengths of world-scale generative priors and registration-oriented Vision Foundation Models (VFMs). Current learning-based 3D point cloud registration methods struggle to generalize across sensing modalities, sampling differences, and environments. To address this, C-GenReg augments the geometric point cloud registration branch by transferring the matching problem into an auxiliary image domain, where VFMs excel: a World Foundation Model synthesizes multi-view-consistent RGB representations from the input geometry. This generative transfer preserves spatial coherence across source and target views without any fine-tuning. From these generated views, a VFM pretrained for dense correspondence estimation extracts matches, and the resulting pixel correspondences are lifted back to 3D via the original depth maps. To further enhance robustness, we introduce a "Match-then-Fuse" probabilistic cold-fusion scheme that combines two independent correspondence posteriors, one from the generated-RGB branch and one from the raw geometric branch. This principled fusion preserves each modality's inductive bias and provides calibrated confidence without any additional learning. C-GenReg is zero-shot and plug-and-play: all modules are pretrained and operate without fine-tuning. Extensive experiments on indoor (3DMatch, ScanNet) and outdoor (Waymo) benchmarks demonstrate strong zero-shot performance and superior cross-domain generalization. For the first time, we demonstrate a generative registration framework that operates successfully on real outdoor LiDAR data, where no imagery data is available.
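
Two steps named in the abstract, lifting pixel matches to 3D through a depth map and fusing two independent correspondence posteriors, can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption: a pinhole camera with hypothetical intrinsics `fx, fy, cx, cy`, and a product rule (the standard way to combine independent posteriors under a uniform prior) standing in for the paper's cold-fusion scheme. The function names `lift_pixel` and `fuse_posteriors` are illustrative and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def lift_pixel(u: float, v: float, depth: float,
               fx: float, fy: float, cx: float, cy: float
               ) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) with metric depth into camera-frame 3D.

    Assumes an ideal pinhole camera (no lens distortion); the intrinsics
    are hypothetical parameters, not values from the paper.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def fuse_posteriors(p_rgb: Sequence[float], p_geo: Sequence[float],
                    eps: float = 1e-12) -> List[float]:
    """Fuse two posteriors over the SAME candidate matches for one source point.

    Assumption: the two branches are conditionally independent with a uniform
    prior, so the fused posterior is proportional to the elementwise product.
    """
    if len(p_rgb) != len(p_geo):
        raise ValueError("both branches must score the same candidate set")
    joint = [a * b for a, b in zip(p_rgb, p_geo)]
    total = sum(joint)
    if total < eps:
        # The branches fully disagree; falling back to the plain average
        # is an illustrative choice made here, not something the paper states.
        return [(a + b) / 2.0 for a, b in zip(p_rgb, p_geo)]
    return [j / total for j in joint]


if __name__ == "__main__":
    # A matched pixel 100 px right of the principal point at 2 m depth.
    print(lift_pixel(420.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0))
    # Both branches favor candidate 0, so the fused posterior sharpens on it.
    print(fuse_posteriors([0.6, 0.3, 0.1], [0.5, 0.1, 0.4]))
```

Under the product rule, a candidate must be plausible in both branches to keep a high fused probability, which is one way to read the claim that the fusion preserves each modality's inductive bias while needing no additional learning.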
