Sonata: Self-Supervised Learning of Reliable Point Representations
March 20, 2025
Authors: Xiaoyang Wu, Daniel DeTone, Duncan Frost, Tianwei Shen, Chris Xie, Nan Yang, Jakob Engel, Richard Newcombe, Hengshuang Zhao, Julian Straub
cs.AI
Abstract
In this paper, we question whether we have a reliable self-supervised point
cloud model that can be used for diverse 3D tasks via simple linear probing,
even with limited data and minimal computation. We find that existing 3D
self-supervised learning approaches fall short when evaluated on representation
quality through linear probing. We hypothesize that this is due to what we term
the "geometric shortcut", which causes representations to collapse to low-level
spatial features. This challenge is unique to 3D and arises from the sparse
nature of point cloud data. We address it through two key strategies: obscuring
spatial information and enhancing the reliance on input features, ultimately
composing a Sonata of 140k point clouds through self-distillation. Sonata is
simple and intuitive, yet its learned representations are strong and reliable:
zero-shot visualizations demonstrate semantic grouping, alongside strong
spatial reasoning through nearest-neighbor relationships. Sonata demonstrates
exceptional parameter and data efficiency, tripling linear probing accuracy
(from 21.8% to 72.5%) on ScanNet and nearly doubling performance with only 1%
of the data compared to previous approaches. Full fine-tuning further advances
SOTA across both 3D indoor and outdoor perception tasks.
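The abstract's headline comparison rests on linear probing: the pretrained encoder is frozen, and only a single linear layer is trained on its per-point features, so the score reflects how much semantic structure the frozen representation already carries. The NumPy snippet below is a minimal sketch of that protocol, assuming the encoder's output is already available as an (N, C) feature array. The synthetic Gaussian features, the helper names (`make_synthetic_features`, `train_linear_probe`, `predict`, `mean_iou`) and the hyperparameters are hypothetical illustrations, not Sonata's code, data, or evaluation setup.

```python
import numpy as np


def make_synthetic_features(num_classes, dim, per_class, seed=0):
    """Hypothetical stand-in for frozen encoder output: one Gaussian
    cluster of per-point features per semantic class (not real data)."""
    rng = np.random.default_rng(seed)
    centroids = rng.normal(scale=3.0, size=(num_classes, dim))
    labels = np.repeat(np.arange(num_classes), per_class)
    feats = centroids[labels] + rng.normal(size=(labels.size, dim))
    return feats, labels


def train_linear_probe(features, labels, num_classes, lr=0.01, epochs=300, seed=0):
    """Fit a linear softmax head on frozen per-point features.

    Only W and b are learned; `features` is treated as fixed encoder output.
    """
    rng = np.random.default_rng(seed)
    n, c = features.shape
    w = rng.normal(scale=0.01, size=(c, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(features, w, b):
    """Assign each point the class with the highest linear score."""
    return np.argmax(features @ w + b, axis=1)


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for k in range(num_classes):
        inter = np.sum((pred == k) & (target == k))
        union = np.sum((pred == k) | (target == k))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))


# Usage: train the probe on one split, evaluate on held-out points.
feats, labels = make_synthetic_features(num_classes=4, dim=16, per_class=200, seed=42)
idx = np.random.default_rng(0).permutation(labels.size)
train_idx, val_idx = idx[:600], idx[600:]
w, b = train_linear_probe(feats[train_idx], labels[train_idx], num_classes=4)
pred = predict(feats[val_idx], w, b)
print("val accuracy:", float(np.mean(pred == labels[val_idx])))
print("val mIoU:", mean_iou(pred, labels[val_idx], num_classes=4))
```

Because the encoder is never updated, a low probe score means the frozen features are not linearly separable by semantic class. That is the failure mode the abstract attributes to representations collapsing onto low-level spatial features. Mean IoU is included because it is a common metric for 3D semantic segmentation.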