C-GenReg: Training-Free 3D Point Cloud Registration by Multi-View-Consistent Geometry-to-Image Generation with Probabilistic Modalities Fusion

Abstract

We introduce C-GenReg, a training-free framework for 3D point cloud registration that leverages the complementary strengths of world-scale generative priors and registration-oriented Vision Foundation Models (VFMs). Current learning-based 3D point cloud registration methods struggle to generalize across sensing modalities, sampling differences, and environments. C-GenReg therefore augments the geometric point cloud registration branch by transferring the matching problem into an auxiliary image domain, where VFMs excel. Specifically, it uses a World Foundation Model to synthesize multi-view-consistent RGB representations from the input geometry. This generative transfer preserves spatial coherence across source and target views without any fine-tuning. From these generated views, a VFM pretrained for dense correspondence estimation extracts matches. The resulting pixel correspondences are lifted back to 3D via the original depth maps. To further enhance robustness, we introduce a "Match-then-Fuse" probabilistic cold-fusion scheme that combines two independent correspondence posteriors: one from the generated-RGB branch and one from the raw geometric branch. This principled fusion preserves each modality's inductive bias and provides calibrated confidence without any additional learning. C-GenReg is zero-shot and plug-and-play: all modules are pretrained and operate without fine-tuning. Extensive experiments on indoor (3DMatch, ScanNet) and outdoor (Waymo) benchmarks demonstrate strong zero-shot performance and superior cross-domain generalization. For the first time, we demonstrate a generative registration framework that operates successfully on real outdoor LiDAR data, where no imagery is available.
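As a rough illustration of the two steps described at the end of the pipeline, the sketch below lifts a matched pixel to 3D with its depth value and fuses two correspondence posteriors. It is not the authors' implementation: the abstract does not give the fusion formula, so the normalized product used here is an assumption that follows from treating the two branches as independent with a uniform prior. The function names and the pinhole intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Lift pixel (u, v) with metric depth to a camera-frame 3D point.

    Assumes a standard pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy); these intrinsics are illustrative.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def fuse_posteriors(p_rgb: Sequence[float], p_geo: Sequence[float],
                    eps: float = 1e-12) -> List[float]:
    """Fuse two posteriors over the same candidate matches.

    Assumption (not from the source): the branches are conditionally
    independent with a uniform prior, so the joint posterior is
    proportional to the element-wise product.
    """
    if len(p_rgb) != len(p_geo) or len(p_rgb) == 0:
        raise ValueError("posteriors must be non-empty and the same length")
    joint = [a * b for a, b in zip(p_rgb, p_geo)]
    total = sum(joint)
    if total < eps:
        # The branches assign no shared mass: fall back to a uniform posterior.
        return [1.0 / len(joint)] * len(joint)
    return [j / total for j in joint]


# Usage: one source point with three candidate target matches.
point = backproject(420.0, 240.0, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
fused = fuse_posteriors([0.6, 0.3, 0.1], [0.5, 0.4, 0.1])
```

Because the fusion happens after each branch has produced its own matches, neither model needs retraining; a candidate that both branches favor gains confidence, while one that only a single branch supports is down-weighted.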
