

Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views

September 1, 2025
Authors: Xiangdong Zhang, Shaofeng Zhang, Junchi Yan
cs.AI

Abstract

Point cloud learning, especially in a self-supervised way without manual labels, has gained growing attention in both the vision and learning communities due to its potential utility in a wide range of applications. Most existing generative approaches for point cloud self-supervised learning focus on recovering masked points from visible ones within a single view. We recognize that a two-view pre-training paradigm inherently introduces greater diversity and variance, and may therefore enable more challenging and informative pre-training. Inspired by this, we explore the potential of two-view learning in this domain. In this paper, we propose Point-PQAE, a cross-reconstruction generative paradigm that first generates two decoupled point clouds/views and then reconstructs one from the other. To achieve this goal, we develop a crop mechanism for point cloud view generation for the first time and further propose a novel positional encoding to represent the 3D relative position between the two decoupled views. Cross-reconstruction significantly increases the difficulty of pre-training compared to self-reconstruction, which enables our method to surpass previous single-modal self-reconstruction methods in 3D self-supervised learning. Specifically, it outperforms the self-reconstruction baseline (Point-MAE) by 6.5%, 7.0%, and 6.7% on the three variants of ScanObjectNN under the Mlp-Linear evaluation protocol. The code is available at https://github.com/aHapBean/Point-PQAE.
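To make the two-view idea above concrete, below is a minimal sketch (in NumPy) of one plausible way to crop two decoupled views from a point cloud and compute the 3D relative offset that a relative positional encoding could consume. The nearest-neighbor cropping, the function names (crop_view, make_view_pair), and the view size are illustrative assumptions, not the released Point-PQAE implementation; see the repository linked above for the actual code.

```python
# Illustrative sketch only (assumed design, not the authors' implementation):
# crop two decoupled views from one point cloud and compute the 3D relative
# offset between their centers for a relative positional encoding.
import numpy as np

def crop_view(points, num_view_points):
    """Crop a local view: take the nearest neighbors of a random anchor point,
    then re-center the view at its own centroid so absolute position is removed
    (this is what makes the view 'decoupled')."""
    anchor = points[np.random.randint(len(points))]
    dists = np.linalg.norm(points - anchor, axis=1)
    idx = np.argsort(dists)[:num_view_points]
    view = points[idx]
    center = view.mean(axis=0)
    return view - center, center

def make_view_pair(points, num_view_points=512):
    """Generate two decoupled views and their 3D relative offset.
    The offset (center_b - center_a) is the quantity a relative positional
    encoding would represent when reconstructing one view from the other."""
    view_a, center_a = crop_view(points, num_view_points)
    view_b, center_b = crop_view(points, num_view_points)
    relative_pos = center_b - center_a
    return view_a, view_b, relative_pos

if __name__ == "__main__":
    cloud = np.random.rand(2048, 3).astype(np.float32)  # stand-in point cloud
    va, vb, rel = make_view_pair(cloud)
    print(va.shape, vb.shape, rel)  # (512, 3) (512, 3) [dx dy dz]
```

Re-centering each crop at its own centroid strips shared absolute coordinates, so the model cannot trivially align the two views; the relative offset then has to be supplied explicitly, which is the role the proposed positional encoding plays during cross-reconstruction.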