Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion

July 8, 2025
Authors: Aleksandar Jevtić, Christoph Reich, Felix Wimbauer, Oliver Hahn, Christian Rupprecht, Stefan Roth, Daniel Cremers
cs.AI

Abstract

Semantic scene completion (SSC) aims to infer both the 3D geometry and the semantics of a scene from a single image. In contrast to prior work on SSC, which relies heavily on expensive ground-truth annotations, we approach SSC in an unsupervised setting. Our novel method, SceneDINO, adapts techniques from self-supervised representation learning and 2D unsupervised scene understanding to SSC. Our training exclusively utilizes multi-view consistency self-supervision, without any form of semantic or geometric ground truth. Given a single input image, SceneDINO infers the 3D geometry and expressive 3D DINO features in a feed-forward manner. Through a novel 3D feature distillation approach, we obtain unsupervised 3D semantics. In both 3D and 2D unsupervised scene understanding, SceneDINO reaches state-of-the-art segmentation accuracy. Linear probing our 3D features matches the segmentation accuracy of a current supervised SSC approach. Additionally, we showcase the domain generalization and multi-view consistency of SceneDINO, taking the first steps towards a strong foundation for single-image 3D scene understanding.
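For readers unfamiliar with the linear probing mentioned in the abstract, the following is a minimal, generic sketch of the idea: the feature extractor stays frozen and only a single linear layer is trained on top of its features. It is not the authors' implementation. The features here are synthetic stand-ins generated with NumPy, and the names `linear_probe` and `predict`, the hyperparameters, and the data dimensions are all illustrative assumptions.

```python
import numpy as np

def linear_probe(features, labels, num_classes, lr=0.5, steps=200):
    """Fit one linear layer (softmax regression) on frozen features.

    Hypothetical helper for illustration; the features are never updated,
    only the probe weights W and bias b.
    """
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (features.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(features, W, b):
    """Assign each feature vector to the class with the highest logit."""
    return (features @ W + b).argmax(axis=1)

# Synthetic stand-in for frozen per-point features (assumption: 3 classes, 8 dims).
rng = np.random.default_rng(0)
num_classes, dim, per_class = 3, 8, 100
centers = rng.normal(size=(num_classes, dim)) * 5.0
labels = np.repeat(np.arange(num_classes), per_class)
features = centers[labels] + rng.normal(size=(num_classes * per_class, dim))

# Standardize features so a fixed learning rate behaves predictably.
features = (features - features.mean(axis=0)) / features.std(axis=0)

W, b = linear_probe(features, labels, num_classes)
accuracy = (predict(features, W, b) == labels).mean()
print(f"probe accuracy on synthetic features: {accuracy:.3f}")
```

Because the probe has so little capacity, its accuracy mostly reflects how linearly separable the frozen features already are, which is why linear probing is commonly used to assess representation quality.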