TUN3D: Towards Real-World Scene Understanding from Unposed Images

September 23, 2025
Authors: Anton Konushin, Nikita Drozdov, Bulat Gabdullin, Alexey Zakharov, Anna Vorontsova, Danila Rukhovich, Maksim Kolodiazhnyi
cs.AI

Abstract

Layout estimation and 3D object detection are two fundamental tasks in indoor scene understanding. When combined, they enable the creation of a compact yet semantically rich spatial representation of a scene. Existing approaches typically rely on point cloud input, which poses a major limitation since most consumer cameras lack depth sensors and visual-only data remains far more common. We address this issue with TUN3D, the first method that tackles joint layout estimation and 3D object detection in real scans, given multi-view images as input, and does not require ground-truth camera poses or depth supervision. Our approach builds on a lightweight sparse-convolutional backbone and employs two dedicated heads: one for 3D object detection and one for layout estimation, leveraging a novel and effective parametric wall representation. Extensive experiments show that TUN3D achieves state-of-the-art performance across three challenging scene understanding benchmarks: (i) using ground-truth point clouds, (ii) using posed images, and (iii) using unposed images. While performing on par with specialized 3D object detection methods, TUN3D significantly advances layout estimation, setting a new benchmark in holistic indoor scene understanding. Code is available at https://github.com/col14m/tun3d.
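
To make the shared-backbone, two-head design described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' released code). It uses dense 3D convolutions as a stand-in for the sparse-convolutional backbone, an illustrative 18-class detection head predicting 7-DoF boxes, and a made-up 6-parameter wall encoding (two 2D endpoints, a height, and an existence score); the actual TUN3D parameterization may differ, so refer to the repository linked above for the real implementation.

```python
# Hypothetical sketch of a two-head indoor scene-understanding model.
# Assumptions (not from the paper): dense Conv3d backbone instead of sparse
# convolutions, 18 object classes, 7-DoF boxes, 6-parameter wall encoding.
import torch
import torch.nn as nn


class TwoHeadSceneModel(nn.Module):
    """Shared 3D backbone feeding a detection head and a layout head."""

    def __init__(self, in_channels: int = 3, feat: int = 64,
                 num_classes: int = 18, wall_params: int = 6):
        super().__init__()
        # Stand-in backbone: two strided 3D conv blocks (real model is sparse).
        self.backbone = nn.Sequential(
            nn.Conv3d(in_channels, feat, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm3d(feat), nn.ReLU(inplace=True),
            nn.Conv3d(feat, feat, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm3d(feat), nn.ReLU(inplace=True),
        )
        # Detection head: per-cell class logits + 7-DoF box (center, size, yaw).
        self.det_head = nn.Conv3d(feat, num_classes + 7, kernel_size=1)
        # Layout head: per-cell parametric wall (x1, y1, x2, y2, height, score).
        self.layout_head = nn.Conv3d(feat, wall_params, kernel_size=1)

    def forward(self, voxels: torch.Tensor):
        features = self.backbone(voxels)
        return self.det_head(features), self.layout_head(features)


if __name__ == "__main__":
    # Toy voxelized scene: batch of 1, 3 feature channels on a 64^3 grid.
    scene = torch.randn(1, 3, 64, 64, 64)
    det, layout = TwoHeadSceneModel()(scene)
    print(det.shape, layout.shape)  # [1, 25, 16, 16, 16], [1, 6, 16, 16, 16]
```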