SceneAligner: 3D-Grounded Floorplan Localization in the Wild

May 21, 2026
Authors: Junhyeong Cho, Ruojin Cai, Hadar Averbuch-Elor
cs.AI

Abstract

Many public buildings provide floorplans with a "you are here" indicator to help visitors orient themselves. Floorplan localization seeks to computationally replicate this capability by determining where visual observations were captured within a floorplan. However, existing methods typically assume controlled, small-scale environments and precise vectorized floorplans, which limits their ability to handle large-scale buildings and rasterized floorplans. In this work, we present an approach for performing floorplan localization in the wild by grounding the task in a reconstructed 3D representation of the scene. Given an unconstrained image collection, our method reconstructs a gravity-aligned 3D scene and projects it into a 2D density map that serves as a floorplan proxy. Floorplan localization is then formulated as aligning this proxy with the input floorplan via a 2D similarity transform. To bridge the appearance gap between density maps and architectural floorplans, we adapt a 2D foundation model to learn cross-modal correspondences, introducing a fine-tuning scheme that encourages semantically aligned matches while preserving structural consistency. Extensive experiments demonstrate substantial improvements over prior methods, including in extremely sparse settings with as few as a single input image. Our code and data will be publicly available.
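
The two geometric steps named in the abstract, projecting a gravity-aligned 3D scene to a 2D density map and aligning it to the floorplan with a 2D similarity transform, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `density_map` and `fit_similarity_2d`, the `cell` parameter, and the assumption that z is the gravity axis are all hypothetical. The learned cross-modal matching is not shown; the sketch assumes point correspondences are already available and fits the transform with the standard Umeyama least-squares closed form as a stand-in.

```python
import numpy as np


def density_map(points, cell=0.1):
    """Project gravity-aligned 3D points (N, 3), z assumed up, to a 2D density map.

    Returns the histogram normalized to [0, 1] and the (x, y) world
    coordinate of the corner of cell [0, 0]. `cell` is a hypothetical
    grid resolution in scene units.
    """
    xy = points[:, :2]                      # drop the gravity axis
    origin = xy.min(axis=0)
    idx = np.floor((xy - origin) / cell).astype(int)
    grid = np.zeros(idx.max(axis=0) + 1, dtype=float)
    np.add.at(grid, (idx[:, 0], idx[:, 1]), 1.0)  # count points per cell
    return grid / grid.max(), origin


def fit_similarity_2d(src, dst):
    """Least-squares 2D similarity transform (Umeyama closed form).

    Finds scale s, rotation R (2x2), translation t (2,) such that
    dst[i] is approximately s * R @ src[i] + t. Correspondences
    between src and dst are assumed to be given.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)              # cross-covariance of the two sets
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(2)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[1, 1] = -1.0                      # force a proper rotation, no reflection
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / len(src)      # variance of the source points
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    proxy_pts = rng.uniform(-5, 5, size=(50, 2))   # points in density-map frame
    theta = 0.3
    R_true = np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta),  np.cos(theta)]])
    plan_pts = 2.5 * proxy_pts @ R_true.T + np.array([4.0, -1.0])
    s, R, t = fit_similarity_2d(proxy_pts, plan_pts)
    print("scale:", s, "translation:", t)
```

With the recovered `s`, `R`, and `t`, a camera position expressed in the density-map frame could be mapped into floorplan coordinates as `s * R @ p + t`. A practical system would likely wrap the fit in a robust estimator such as RANSAC, since matches between a density map and a drawn floorplan are noisy.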